Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crossgen2 comparisons are failing in coreclr-outerloop runs #111972

Open
steveisok opened this issue Jan 29, 2025 · 12 comments
Open

Crossgen2 comparisons are failing in coreclr-outerloop runs #111972

steveisok opened this issue Jan 29, 2025 · 12 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blocking-clean-ci-optional Blocking optional rolling runs
Milestone

Comments

@steveisok
Copy link
Member

steveisok commented Jan 29, 2025

The arm to arm Linux, arm64 to arm64 OSX, and the x86 to x86 Windows comparison legs are failing. This was noticed after the change in #111881 was run to correct infrastructure issues.

Example build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=932682&view=results

arm to arm Linux

Number of omitted results: 0
Number of mismatched results: 33
Total number of files compared: 236
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jan 29, 2025
Copy link
Contributor

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

@steveisok
Copy link
Member Author

/cc @dotnet/jit-contrib

@steveisok steveisok added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-Infrastructure-coreclr labels Jan 29, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Copy link
Member

@steveisok is this known to be a codegen issue? Any details? The logs are not very enlightening.

@jkotas
Copy link
Member

jkotas commented Jan 30, 2025

Yes, it looks like bad non-deterministic codegen from x64-hosted x86-targeting crossgen2 and other similar configuration pairs.
 
This test is verifying crossgen2 determinism. It compiles the same input using x86-hosted x86-targeting crossgen2 and using x64-hosted x86-targeting crossgen2, and expects to get the same result.

Here are the steps to investigate the failure:

  1. Install runfo tool if you have not done that before: dotnet tool install -g runfo

  2. "Test crossgen2-comparison windows x86 Release to x86 windows" looks like the easiest configuration to investigate. Find job GUID at the top of the test log: Console log: 'WorkItem' from job 4beb857f-d6bd-4748-9868-af9265951d0d workitem 5cdd69cb-5d32-4c2c-9bd1-9cb24635eda9 (windows.10.amd64.open.rt) executed on machine a008MFE running Windows-10-10.0.14393-SP0. Download the helix payload for the job: runfo get-helix-payload -j 4beb857f-d6bd-4748-9868-af9265951d0d -o c:\helix_payload

  3. Downloading the helix payload should have printed the repro instructions at the end. Copy&paste them into the console window:

set HELIX_CORRELATION_ID=4beb857f-d6bd-4748-9868-af9265951d0d
set HELIX_CORRELATION_PAYLOAD=c:\helix_payload\correlation-payload
set HELIX_PYTHONPATH=echo skipping python
set HELIX_WORKITEM_FRIENDLYNAME=WorkItem
set HELIX_WORKITEM_ID =WorkItem
set HELIX_WORKITEM_PAYLOAD=c:\helix_payload\workitems\WorkItem
set HELIX_WORKITEM_ROOT=c:\helix_payload\workitems\WorkItem
set HELIX_WORKITEM_UPLOAD_ROOT=c:\helix_payload\workitems\WorkItem
set HELIX_DUMP_FOLDER=c:\helix_payload\workitems\WorkItem
set HELIX_CURRENT_LOG =c:\helix_payload\workitems\WorkItem\log.txt
pushd c:\helix_payload\workitems\WorkItem && %HELIX_CORRELATION_PAYLOAD%\scripts\1aa3249b1d64498881aa4a362b85cb44\execute.cmd && popd

Execution failed with a python error for me. I may have too old or too new python on my machine, not sure. In any case, I have ignored the error since it produced the R2R binaries to look at.
4. Build R2RDump tool or download it from a drop somewhere. I have built mine locally (my enlistment is at c:\runtime).
5. Pick a pair of the non-deterministic R2R compiled .dlls and dump them using R2RDump, with hidden offsets to make the comparison easier (hiding of offsets is not perfect):

c:\runtime\dotnet c:\runtime\artifacts\bin\coreclr\windows.x64.Checked\R2RDump\R2RDump.dll -d --naked --hide-offsets -i c:\helix_payload\workitems\WorkItem\Microsoft.CSharp.ni.dll -o 1.txt
c:\runtime\dotnet c:\runtime\artifacts\bin\coreclr\windows.x64.Checked\R2RDump\R2RDump.dll -d --naked --hide-offsets -i c:\helix_payload\workitems\WorkItem\prebuiltWork\log\Microsoft.CSharp.ni.dll -o 2.txt
  1. Compare 1.txt and 2.txt using your favorite diff tool. You should find several actual codegen diffs. For example, the code for Microsoft.CSharp.RuntimeBinder.Semantics.TypeArray Microsoft.CSharp.RuntimeBinder.SymbolTable.CreateParameterArray(System.Reflection.MemberInfo, System.Reflection.ParameterInfo[]) method differs. The problem is that movzx ebx, bl instruction is emitted twice in one case that looks wrong:

Size: 304 bytes

...
    setne   bl
    movzx   ebx, bl
    mov     dword ptr [ebp - 16], ebx
...

vs.

Size: 307 bytes

...
    setne   bl
    movzx   ebx, bl
    movzx   ebx, bl
    mov     dword ptr [ebp - 16], ebx
...

There are other similar diffs where the emitted instructions is doubled for some reason.

@jkotas jkotas added the blocking-clean-ci-optional Blocking optional rolling runs label Jan 30, 2025
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Jan 30, 2025
@JulieLeeMSFT JulieLeeMSFT added this to the 10.0.0 milestone Jan 30, 2025
@AndyAyersMS
Copy link
Member

Thanks Jan, will take a look.

@AndyAyersMS
Copy link
Member

Interestingly enough I think this may be the issue I fixed in #112020. We were losing assertions depending on whether we were using long or short bit vectors, and the short size depends on the host arch.

Should know soon.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Jan 31, 2025

That may have cut down on some of the diffs, but there are still some. Another culprit seems to be coming from this bit of code:

void* data = info.compCompHnd->getArrayInitializationData(fieldToken, totalFieldSize);
if (!data)
{
return nullptr;
}
//
// Ready to commit to the work
//
impPopStack();
// Turn count and pointer value into constants.
GenTree* lengthValue = gtNewIconNode(count, TYP_INT);
FieldSeq* fldSeq =
GetFieldSeqStore()->Create(fieldToken, (ssize_t)data, FieldSeq::FieldKind::SimpleStaticKnownAddress);
GenTree* pointerValue = gtNewIconHandleNode((size_t)data, GTF_ICON_STATIC_HDL, fldSeq);

We get different address values depending on the host:

;; x64 host
Named Intrinsic System.Runtime.CompilerServices.RuntimeHelpers.CreateSpan: Recognized

lvaGrabTemp returning 10 (V10 tmp6) called for ReadOnlySpan<T> for CreateSpan<T>.


STMT00056 ( 0x000[E-] ... ??? ) <- INLRT @ 0x081[E-]
               [000208] UA---------                         *  STORE_LCL_FLD int    V10 tmp6         [+4]
               [000205] -----------                         \--*  CNS_INT   int    14


STMT00057 ( ??? ... ??? ) <- INLRT @ 0x081[E-]
               [000207] UA---------                         *  STORE_LCL_FLD byref  V10 tmp6         [+0]
               [000206] H----------                         \--*  CNS_INT(h) int    0x52FFB8 static Fseq[374EE580A55269D9E298B7EFD9DB0DFE1798E301679B89E48D722345EB74B59B2]

;; x86 host

Named Intrinsic System.Runtime.CompilerServices.RuntimeHelpers.CreateSpan: Recognized

lvaGrabTemp returning 10 (V10 tmp6) called for ReadOnlySpan<T> for CreateSpan<T>.


STMT00056 ( 0x000[E-] ... ??? ) <- INLRT @ 0x081[E-]
               [000208] UA---------                         *  STORE_LCL_FLD int    V10 tmp6         [+4]
               [000205] -----------                         \--*  CNS_INT   int    14


STMT00057 ( ??? ... ??? ) <- INLRT @ 0x081[E-]
               [000207] UA---------                         *  STORE_LCL_FLD byref  V10 tmp6         [+0]
               [000206] H----------                         \--*  CNS_INT(h) int    0x4052FF88 static Fseq[374EE580A55269D9E298B7EFD9DB0DFE1798E301679B89E48D722345EB74B59B2]

    [ 1]  10 (0x00a) stloc.0
lvaGrabTemp returning 11 (V11 tmp7) (a long lifetime temp) called for Inline stloc first use temp.

and this changes our address mode formation

;; x64 host

IN0044: 0000F3 jae      SHORT G_M19140_IG17
recordRelocation: 000002975123C2C6 (rw: 000002975123C2C6) => 400000000052FFB8, type 3 (IMAGE_REL_BASED_MOFFSET), delta 0
IN0045: 0000F5 mov      edx, (reloc 0x400000000052ffb8)
                    ; byrRegs +[edx]
IN0046: 0000FA movsx    ecx, word  ptr [edx+2*ebx]

;; x86 host

IN0045: 0000F5 add      ebx, ebx
recordRelocation: 2D1B8E48 (rw: 2D1B8E48) => 4052FF88, type 3 (IMAGE_REL_BASED_MOFFSET), delta 0
IN0046: 0000F7 mov      edx, (reloc 0x4052ff88)
                    ; byrRegs +[edx]
IN0047: 0000FC add      edx, ebx
IN0048: 0000FE movsx    ecx, word  ptr [edx]

@EgorBo ring any bells? Seems odd we'd depend on the size of a relocatable handle to pick an address mode.

@EgorBo
Copy link
Member

EgorBo commented Jan 31, 2025

@EgorBo ring any bells? Seems odd we'd depend on the size of a relocatable handle to pick an address mode

hm.. not really, that code hasn't changed since 2022

@AndyAyersMS
Copy link
Member

I think the issue may be here:

https://github.com/dotnet/runtime/blame/ad7b8299d8d80eb27cf22838c7017c5872056b56/src/coreclr/jit/codegencommon.cpp#L1198-L1205

On an x86 host the constant fits in 32 bits and is relocatable so we bail out without forming an address mode. On an x64 host the constant doesn't fit and we skip down further in the method and see op1 is GT_LSH by 1 and form a scaled address mode out of it.

I am going to change this to always skip down if op2 is relocatable.

@AndyAyersMS
Copy link
Member

That resolved some of the diffs, but there are more. Will keep looking.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Feb 1, 2025

We are seeing different PGO data for some value probes, also some parsing or dumping issues with the PGO schema. The edge count data and class histograms are consistent however.

;; x64 hosted

Have Static PGO: 7 schema records (schema at 00000181E96E8C70, data at 00000181E96EABD0)
Unknown PGO record type 0x231 in schema entry 2 (offset 0x3b count 0x1 other 0x0)
Unknown PGO record type 0x242 in schema entry 3 (offset 0x3b count 0x1a0 other 0x0)
Profile summary: 13 runs, 0 block probes, 4 edge probes, 0 class profiles, 0 method profiles, 2 other records

....

Named Intrinsic System.SpanHelpers.SequenceEqual: Recognized
8 likely values:
  0) 10 - 83%
  1) 8 - 6%
  2) 10 - 4%
  3) 8 - 4%
  4) 2 - 2%
  5) 14 - 1%
  6) 166 - 0%
  7) 166 - 0%
Named Intrinsic System.SpanHelpers.SequenceEqual: Recognized
Named Intrinsic System.SpanHelpers.SequenceEqual: Recognized
Notify VM instruction set (SSE) must be supported.
(no duplication)

;; x86 hosted

Have Static PGO: 7 schema records (schema at 070B6F70, data at 070B8E64)
Unknown PGO record type 0x231 in schema entry 2 (offset 0x3b count 0x1 other 0x0)
Unknown PGO record type 0x242 in schema entry 3 (offset 0x3b count 0x1a0 other 0x0)
Profile summary: 13 runs, 0 block probes, 4 edge probes, 0 class profiles, 0 method profiles, 2 other records

...

Named Intrinsic System.SpanHelpers.SequenceEqual: Recognized
8 likely values:
  0) 10 - 57%
  1) 8 - 15%
  2) 2 - 10%
  3) 204 - 5%
  4) 14 - 5%
  5) 166 - 3%
  6) 182 - 3%
  7) 164 - 2%
Named Intrinsic System.SpanHelpers.SequenceEqual: Recognized
Named Intrinsic System.SpanHelpers.SequenceEqual: Recognized
Notify VM instruction set (SSE) must be supported.
Duplicating for popular value = 10

Also the x64 hosted version looks odd, we have duplicated entries in the table.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI blocking-clean-ci-optional Blocking optional rolling runs
Projects
Status: No status
Development

No branches or pull requests

5 participants