Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate Intermittent Test Failures in GitHub Actions #3527

Open
atilsensalduz opened this issue Mar 21, 2025 · 6 comments
Open

Investigate Intermittent Test Failures in GitHub Actions #3527

atilsensalduz opened this issue Mar 21, 2025 · 6 comments
Labels
bug challenging This issue is for experienced Go developers only, please.

Comments

@atilsensalduz
Copy link
Contributor

atilsensalduz commented Mar 21, 2025

We've encountered intermittent test failures that don't seem to be tied to a specific test case. The failures appear to be related to network issues in GitHub Actions, as rerunning the pipeline usually resolves them.

--- FAIL: TestUsersService_DeleteKey (0.00s)
    users_keys_test.go:170: Users.DeleteKey returned error: Delete "http://127.0.0.1:40547/api-v3/user/keys/1": net/http: HTTP/1.x transport connection broken: http: CloseIdleConnections called

Since we've observed this issue across multiple test cases, it might be worth investigating whether it's related to how the test server is being managed or if it's an underlying issue with GitHub Actions.

Example workflow runs:

@gmlewis
Copy link
Collaborator

gmlewis commented Mar 21, 2025

Thank you, @atilsensalduz! I would like to add that ideally, any solution will NOT involve time.Sleep, as experience has shown that using time.Sleep is not a great long-term solution because inevitably a system can be found that needs slightly more time than was alotted and the problem shows up again.

@gmlewis gmlewis added bug challenging This issue is for experienced Go developers only, please. labels Mar 21, 2025
@gmlewis
Copy link
Collaborator

gmlewis commented Mar 21, 2025

Also note that this flakiness feels like it started around the time we switched to running all tests in parallel, but I could be mistaken.

@Abiji-2020
Copy link
Contributor

Also note that this flakiness feels like it started around the time we switched to running all tests in parallel, but I could be mistaken.

Actually I have tried to run this actions with the list of -parallel=25 to ensure is the parallel running was problem and yes it is.

My actions :

@Abiji-2020
Copy link
Contributor

Abiji-2020 commented Mar 22, 2025

I am thinking of restricting the number of parallel test cases running in the actions so that it may not arise again.

Ubuntu runners are running fine at -parallel=15 whereas windows runner is running fine at parallel=10

If this method is okay.. I would like to open a PR regarding it.

@atilsensalduz
Copy link
Contributor Author

Thanks for the investigation, @Abiji-2020 By default, tests run in parallel based on the CPU count. If not explicitly specified, the number of parallel executions is determined by the GOMAXPROCS environment variable. If the value of GOMAXPROCS is not explicitly set, it will be equal to the number of CPUs. I'm not sure whether the proposed values are lower than the number of CPUs on the GitHub Actions runner.

atilsensalduz added a commit to atilsensalduz/go-github that referenced this issue Mar 22, 2025
- Ensure each test instance uses a dedicated HTTP transport
  instead of sharing the default transport.
- Prevents race conditions caused by CloseIdleConnections()
  closing connections in the shared pool.
- Improves test stability by avoiding intermittent connection broken errors.

References:
- Related discussion: google#3527

Signed-off-by: atilsensalduz <atil.sensalduz@gmail.com>
@atilsensalduz
Copy link
Contributor Author

atilsensalduz commented Mar 22, 2025

I'm sharing my findings and proposed solution regarding the issue.

Root Cause

The issue arises from multiple parallel tests sharing the same underlying HTTP transport, which manages connection pooling. Although each test creates its own clients and servers via the setup(t) function, they inadvertently share Go’s default transport when no explicit transport is specified.

Client Setup Without Explicit Client Settings:

The client setup does not specify any custom settings: go-github test setup
It uses the default transport layer: go-github default transport
In Go’s net/http package, the http.Client structure uses DefaultTransport if no transport is specified: net/http DefaultTransport

Where CloseIdleConnections() Is Likely Being Called:

When a test completes, it calls server.Close(), which triggers:

  • Closing the test's HTTP server
  • Closing any connections to that server that are in StateIdle or StateNew
  • Calling CloseIdleConnections() on the default transport, which affects all idle connections in the shared pool, regardless of destination

Test calls server.Close(): go-github test server close
In Go's httptest package calls CloseIdleConnections for default transport before actual specified transport: net/http DefaultTransport CloseIdleConnections

The Race Condition Occurs When:

  • Test A finishes and calls server.Close().
  • This closes idle connections in the shared default transport.
  • Test B is in the middle of one of its test phases.
  • Test B attempts to reuse a connection that was just marked as closed.
  • The request fails with a "connection broken" error.

What's Happening Internally:

When a test server closes, it calls CloseIdleConnections() on the default transport.
This default transport is shared across all test clients that haven't specified their own transport.
The CloseIdleConnections() method closes all idle connections in the transport, not just those to a specific server.
If another test is in the middle of using a connection from this shared pool, it encounters a "connection broken" error.

Why It’s Intermittent:

The timing of these operations is critical. The error only occurs when the connection closure happens at just the wrong moment between connection reuse attempts, explaining the intermittent nature of the problem.

Proposed Solution:

Isolated Transports: Each test should create and use a unique HTTP transport to prevent unintended connection sharing.

#3529

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug challenging This issue is for experienced Go developers only, please.
Projects
None yet
Development

No branches or pull requests

3 participants