Investigate Intermittent Test Failures in GitHub Actions #3527
Comments
Thank you, @atilsensalduz! I would like to add that ideally, any solution will NOT involve
Also note that this flakiness feels like it started around the time we switched to running all tests in parallel, but I could be mistaken.
Actually, I have tried to run these actions with the list of My actions:
I am thinking of restricting the number of test cases that run in parallel in the actions so that the failure does not recur. Ubuntu runners are running fine at … If this method is okay, I would like to open a PR for it.
Thanks for the investigation, @Abiji-2020. By default, the number of tests that go test runs in parallel within a package (the -parallel flag) is taken from GOMAXPROCS. If GOMAXPROCS is not set explicitly, it equals the number of CPUs. I'm not sure whether the proposed values are lower than the number of CPUs on the GitHub Actions runner.
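To check whether a proposed limit is actually lower than the runner's default, one option is to print the values the Go runtime reports on that runner. This is only a minimal sketch: the helper name defaultTestParallelism is hypothetical, and it relies on the documented behavior that go test's -parallel flag defaults to GOMAXPROCS.

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultTestParallelism is a hypothetical helper. It returns the current
// GOMAXPROCS setting, which is the value go test uses for -parallel when
// the flag is not passed. Calling GOMAXPROCS with 0 only reads the value.
func defaultTestParallelism() int {
	return runtime.GOMAXPROCS(0)
}

func main() {
	// NumCPU is the number of logical CPUs usable by this process.
	fmt.Println("NumCPU:", runtime.NumCPU())
	// GOMAXPROCS equals NumCPU unless it was overridden, for example
	// through the GOMAXPROCS environment variable.
	fmt.Println("GOMAXPROCS:", defaultTestParallelism())
}
```

Running this as a step in the workflow would show whether an explicit -parallel value is below what the runner already uses by default.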
- Ensure each test instance uses a dedicated HTTP transport instead of sharing the default transport.
- Prevents race conditions caused by CloseIdleConnections() closing connections in the shared pool.
- Improves test stability by avoiding intermittent connection broken errors.

References:
- Related discussion: google#3527

Signed-off-by: atilsensalduz <atil.sensalduz@gmail.com>
I'm sharing my findings and proposed solution regarding the issue.

Root Cause
The issue arises from multiple parallel tests sharing the same underlying HTTP transport, which manages connection pooling. Although each test creates its own clients and servers via the setup(t) function, they inadvertently share Go’s default transport when no explicit transport is specified.

Client Setup Without Explicit Client Settings:
The client setup does not specify any custom settings: go-github test setup

Where CloseIdleConnections() Is Likely Being Called:
When a test completes, it calls server.Close(), which triggers:
Test calls server.Close(): go-github test server close

The Race Condition Occurs When:

What's Happening Internally:
When a test server closes, it calls CloseIdleConnections() on the default transport.

Why It’s Intermittent:
The timing of these operations is critical. The error only occurs when the connection closure happens at just the wrong moment between connection reuse attempts, explaining the intermittent nature of the problem.

Proposed Solution:
Isolated Transports: Each test should create and use a unique HTTP transport to prevent unintended connection sharing.
We've encountered intermittent test failures that don't seem to be tied to a specific test case. The failures appear to be related to network issues in GitHub Actions, as rerunning the pipeline usually resolves them.
Since we've observed this issue across multiple test cases, it might be worth investigating whether it's related to how the test server is being managed or if it's an underlying issue with GitHub Actions.
Example workflow runs: