Crossgen2 comparisons are failing in coreclr-outerloop runs #111972
Comments
Tagging subscribers to this area: @hoyosjs
/cc @dotnet/jit-contrib
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
@steveisok is this known to be a codegen issue? Any details? The logs are not very enlightening.
Yes, it looks like bad non-deterministic codegen from x64-hosted x86-targeting crossgen2 and other similar configuration pairs. Here are the steps to investigate the failure:
Execution failed with a Python error for me. My Python may be too old or too new; I am not sure. In any case, I ignored the error, since the run still produced the R2R binaries to look at.
Size: 304 bytes
vs. Size: 307 bytes
There are other similar diffs where the emitted instructions are doubled for some reason.
Thanks Jan, will take a look.
Interestingly enough, I think this may be the issue I fixed in #112020. We were losing assertions depending on whether we were using long or short bit vectors, and the short size depends on the host arch. Should know soon.
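To make the host dependence concrete, here is a minimal sketch. It assumes the short bit vector form holds as many bits as fit in one pointer-sized host word. The helper names are hypothetical and are not the JIT's actual API.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: capacity, in bits, of a single-word "short" bit vector
// on a host whose pointer-sized word is hostWordBytes wide (assumption).
constexpr std::size_t shortRepCapacityBits(std::size_t hostWordBytes)
{
    return hostWordBytes * CHAR_BIT;
}

// Hypothetical helper: true when bitCount bits fit in the short form.
constexpr bool usesShortRep(std::size_t bitCount, std::size_t hostWordBytes)
{
    return bitCount <= shortRepCapacityBits(hostWordBytes);
}

// With, say, 40 tracked assertions, a 64-bit host stays in the short form
// while a 32-bit host has to switch to the long form.
static_assert(usesShortRep(40, 8), "64-bit host: short form");
static_assert(!usesShortRep(40, 4), "32-bit host: long form");
```

Under that assumption, any bug that exists in only one of the two code paths shows up as a host-dependent difference in the generated code, even though the target is the same.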
That may have cut down on some of the diffs, but there are still some. Another culprit seems to be coming from this bit of code: runtime/src/coreclr/jit/importercalls.cpp Lines 2990 to 3006 in fa0f65c
We get different address values depending on the host:
and this changes our address mode formation:
;; x64 host
IN0044: 0000F3 jae SHORT G_M19140_IG17
recordRelocation: 000002975123C2C6 (rw: 000002975123C2C6) => 400000000052FFB8, type 3 (IMAGE_REL_BASED_MOFFSET), delta 0
IN0045: 0000F5 mov edx, (reloc 0x400000000052ffb8)
; byrRegs +[edx]
IN0046: 0000FA movsx ecx, word ptr [edx+2*ebx]
;; x86 host
IN0045: 0000F5 add ebx, ebx
recordRelocation: 2D1B8E48 (rw: 2D1B8E48) => 4052FF88, type 3 (IMAGE_REL_BASED_MOFFSET), delta 0
IN0046: 0000F7 mov edx, (reloc 0x4052ff88)
; byrRegs +[edx]
IN0047: 0000FC add edx, ebx
IN0048: 0000FE movsx ecx, word ptr [edx]
@EgorBo ring any bells? Seems odd we'd depend on the size of a relocatable handle to pick an address mode.
hm.. not really, that code hasn't changed since 2022
I think the issue may be here: On an x86 host the constant fits in 32 bits and is relocatable, so we bail out without forming an address mode. On an x64 host the constant doesn't fit and we skip down further in the method and see I am going to change this to always skip down if
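The host-dependent part of that check can be sketched with the two relocation targets from the dumps above. This is only an illustration of the "fits in 32 bits" test. `fitsInInt32` is a hypothetical helper, not the function the JIT actually calls.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true when the value is representable as a signed
// 32-bit immediate.
constexpr bool fitsInInt32(std::int64_t value)
{
    return value >= std::numeric_limits<std::int32_t>::min() &&
           value <= std::numeric_limits<std::int32_t>::max();
}

// Relocation targets taken from the two dumps in this thread.
constexpr std::int64_t x86HostHandle = 0x4052FF88;
constexpr std::int64_t x64HostHandle = 0x400000000052FFB8;

static_assert(fitsInInt32(x86HostHandle), "x86 host: handle fits in 32 bits");
static_assert(!fitsInInt32(x64HostHandle), "x64 host: handle does not fit");
```

Because the placeholder handle value differs per host, a decision keyed on its magnitude can differ too, which matches the two address modes seen in the dumps.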
That resolved some of the diffs, but there are more. Will keep looking.
We are seeing different PGO data for some value probes, and also some parsing or dumping issues with the PGO schema. The edge count data and class histograms are consistent, however.
Also, the x64-hosted version looks odd: we have duplicated entries in the table.
The arm to arm Linux, arm64 to arm64 OSX, and the x86 to x86 Windows comparison legs are failing. This was noticed after the change in #111881 was run to correct infrastructure issues.
Example build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=932682&view=results
arm to arm Linux