Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/8.0-staging] [Mono] Fix the race condition issue of aot_module loading #104918

Merged

Conversation

fanyang-mono
Copy link
Member

@fanyang-mono fanyang-mono commented Jul 15, 2024

Backport of #103975 to release/8.0

/cc @fanyang-mono

Customer Impact

Prior to this change, Maui Android customer's app would run into Invalid IL code error when setting AndroidStripILAfterAOT=true. The customer's app would either hit InvalidProgramException or crash completely. The customers reported the issue via #103782 and #104156

When setting AndroidStripILAfterAOT=true, the IL code for the AOT'ed methods would be removed. As a result, JIT would fail with an error like Invalid IL code, if the runtime attempts to do so. Usually, when a method was AOT compiled, Mono runtime would use it instead of attempting to JIT it. What happened to this customer reported scenario was that mono_aot_get_method raced with load_aot_module - one thread attemptted to read MonoImage:aot_module while the other thread was still loading the aot module. MonoImage:aot_module contained the information of where the AOT'ed code lives. When it wasn't fully loaded, Mono runtime thought it didn't exist. However, this race condition only happened when two threads were trying to invoke methods from the same assembly, which also happened to be not loaded yet. It is often related to the usage of reflection.

Before this PR, there wasn't a way to tell if an aot module was loaded but not found or wasn't loaded at all. This PR introduced a way to differentiate it. Now when mono_aot_get_method sees that the aot module wasn't loaded yet, it will wait a little bit and try again.

Testing

Manually validated the problematic maui Android app reported in the original issue (#103782). It works correctly now.

Risk

Moderate risk. The race condition that this PR is fixing happens very rarely. So far it only happened to a handful of Maui Android customers. This PR doesn't change the behavior when there isn't a race condition, which is the majority of the case. Additionally, we had monitored microbenchmarks results for two weeks and didn't see any performance regressions caused by this change.

@fanyang-mono fanyang-mono requested a review from steveisok July 15, 2024 20:39
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jul 15, 2024
@fanyang-mono fanyang-mono added Servicing-consider Issue for next servicing release review area-Codegen-AOT-mono and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jul 15, 2024
@carlossanlop
Copy link
Member

Friendly reminder that today Monday July 15th is Code Complete day at 4pm PT, which is the deadline to get this included in the August Release.

@fanyang-mono
Copy link
Member Author

@carlossanlop Thanks for the reminder. It is okay to ship this in the September release.

Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. we will take for consideration in 8.0.x

@jeffschwMSFT jeffschwMSFT added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Jul 23, 2024
@jeffschwMSFT jeffschwMSFT modified the milestones: 8.0.x, 8.0.9 Jul 23, 2024
@fanyang-mono
Copy link
Member Author

CI failure on runtime (Build browser-wasm linux Release WasmBuildTests) is known. See #104827

@fanyang-mono fanyang-mono merged commit 9b9f7ed into dotnet:release/8.0-staging Jul 23, 2024
107 of 115 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 23, 2024
@rbhanda rbhanda modified the milestones: 8.0.9, 8.0.10 Oct 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Codegen-AOT-mono Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants