Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky test #70

Merged
merged 3 commits into from
Feb 4, 2025
Merged

Fix flaky test #70

merged 3 commits into from
Feb 4, 2025

Conversation

gjcairo
Copy link
Collaborator

@gjcairo gjcairo commented Feb 3, 2025

HTTP2TransportTLSEnabledTests/testServerFailsClientValidation would rarely fail with a CancellationError being unexpectedly thrown.

I believe this happens because when triggering a graceful shutdown on the server, the quiescing event is translated into a cancellation error if there are connections still open. However, whether this error is surfaced seems to have to do with the timing of both the client and the server shutting down.

I have refactored the test to use nested withGRPCServer/withGRPCClient with-style methods instead, as this should eliminate timing issues.

@gjcairo gjcairo added the semver/none No version bump required. label Feb 3, 2025
@gjcairo gjcairo requested a review from glbrntt February 3, 2025 18:36
Copy link
Collaborator

@glbrntt glbrntt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactor makes sense but I'd like to understand the problem a bit more first to make sure we're not just working around it.

I believe this happens because when triggering a graceful shutdown on the server, the quiescing event is translated into a cancellation error if there are connections still open.

What function is surfacing that error?

@gjcairo
Copy link
Collaborator Author

gjcairo commented Feb 4, 2025

CommonHTTP2ServerTransport/listen does a serverChannel.executeThenClose where we iterate over the channel's inbound stream. This for-await loop throws CancellationError and we don't handle it.

@glbrntt
Copy link
Collaborator

glbrntt commented Feb 4, 2025

CommonHTTP2ServerTransport/listen does a serverChannel.executeThenClose where we iterate over the channel's inbound stream. This for-await loop throws CancellationError and we don't handle it.

What triggers the cancellation?

@glbrntt glbrntt enabled auto-merge (squash) February 4, 2025 12:55
@gjcairo
Copy link
Collaborator Author

gjcairo commented Feb 4, 2025

In the current implementation we have a discarding throwing task group with one task running the server and the other running the client requests. In this type of task group, when one task throws, the others are cancelled. In rare instances where the client throws, the server task would get cancelled before the server is gracefully shut down. This would trigger the CancellationError to be thrown from the server.

@glbrntt glbrntt merged commit 11bb867 into main Feb 4, 2025
26 of 28 checks passed
@glbrntt glbrntt deleted the fix-flaky-test branch February 4, 2025 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver/none No version bump required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants