e2e kind test AntreaPolicyExtendedNamespaces fails on older, less powerful, or resource-constrained laptops #7067

Open
petertran-avgo opened this issue Mar 18, 2025 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.


Describe the bug

When running the e2e kind tests on older laptops (~3+ years old), the test AntreaPolicyExtendedNamespaces fails consistently.

Notable log output includes:

I0318 09:48:25.000065   43728 k8s_util.go:230] no-tier-d5forgjy/a -> prod2-d5forgjy/b: expected Con but got Err: err - error sending request: Post "https://127.0.0.1:53595/api/v1/namespaces/no-tier-d5forgjy/pods/no-tier-d5forgjya-5b5cbd9bfb-sx4nj/exec?command=%2Fbin%2Fsh&command=-c&command=for+i+in+%24%28seq+1+3%29%3B+do+echo+-n+%22%24%7Bi%7D%3A+%22+%3E%262+%26%26++%2Fagnhost+connect+10.244.2.14%3A80+--timeout%3D1s+--protocol%3Dtcp+%26%26+echo+%22CONNECTED%22+%3E%262%3B+done%3B+echo+%22FINISHED%22+%3E%262&container=c80&stderr=true&stdout=true": read tcp 127.0.0.1:57206->127.0.0.1:53595: read: connection reset by peer /// stdout -  /// stderr -

And

antreapolicy_test.go:91: test failed: after 10 tries, HTTP servers are not ready
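
The exec request in the first log line above is URL-encoded, which makes the probe command hard to read. The following is a minimal sketch, not part of the Antrea test code, that decodes it with the Go standard library. The constant `rawQuery` is copied from that log line; the helper name `execCommand` is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// rawQuery is the query string of the exec request, copied from the log line.
const rawQuery = "command=%2Fbin%2Fsh&command=-c&command=for+i+in+%24%28seq+1+3%29%3B+do+echo+-n+%22%24%7Bi%7D%3A+%22+%3E%262+%26%26++%2Fagnhost+connect+10.244.2.14%3A80+--timeout%3D1s+--protocol%3Dtcp+%26%26+echo+%22CONNECTED%22+%3E%262%3B+done%3B+echo+%22FINISHED%22+%3E%262&container=c80&stderr=true&stdout=true"

// execCommand (hypothetical helper) returns the decoded "command"
// arguments of an exec request, in the order they appear in the query.
func execCommand(query string) ([]string, error) {
	values, err := url.ParseQuery(query)
	if err != nil {
		return nil, err
	}
	return values["command"], nil
}

func main() {
	args, err := execCommand(rawQuery)
	if err != nil {
		fmt.Println("cannot parse query:", err)
		return
	}
	for i, arg := range args {
		fmt.Printf("arg %d: %s\n", i, arg)
	}
}
```

Decoded, the probe is `/bin/sh -c` running a loop that calls `/agnhost connect 10.244.2.14:80 --timeout=1s --protocol=tcp` three times, so each pod-to-pod check is a separate exec request proxied through the api-server.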

The api-server logs show:

E0317 17:39:47.674013       1 upgradeaware.go:427] Error proxying data from client to backend: write tcp 172.18.0.4:60358->172.18.0.2:10250: write: connection reset by peer
E0317 17:39:47.894218       1 upgradeaware.go:427] Error proxying data from client to backend: write tcp 172.18.0.4:60442->172.18.0.2:10250: write: broken pipe
E0317 17:39:47.911306       1 upgradeaware.go:427] Error proxying data from client to backend: write tcp 172.18.0.4:60746->172.18.0.2:10250: write: connection reset by peer
E0317 17:39:49.398704       1 upgradeaware.go:427] Error proxying data from client to backend: write tcp 172.18.0.4:55846->172.18.0.2:10250: write: broken pipe
E0317 17:39:49.525435       1 upgradeaware.go:427] Error proxying data from client to backend: write tcp 172.18.0.4:55858->172.18.0.2:10250: write: connection reset by peer
E0317 17:39:50.031627       1 upgradeaware.go:427] Error proxying data from client to backend: write tcp 172.18.0.4:56200->172.18.0.2:10250: write: broken pipe
E0317 17:39:50.036459       1 upgradeaware.go:427] Error proxying data from client to backend: write tcp 172.18.0.4:56174->172.18.0.2:10250: write: broken pipe
E0317 17:39:50.682290       1 upgradeaware.go:427] Error proxying data from client to backend: write tcp 172.18.0.4:37190->172.18.0.3:10250: write: broken pipe

To Reproduce

./ci/kind/test-e2e-kind.sh --setup-only --run TestAntreaPolicyExtendedNamespaces

Expected

=== NAME  TestAntreaPolicyExtendedNamespaces
    fixtures.go:523: Deleting 'testantreapolicyextendednamespaces-094aw0v1' K8s Namespace
I0318 10:00:25.699557   45626 framework.go:882] Deleting Namespace testantreapolicyextendednamespaces-094aw0v1 took 4.391875ms
--- PASS: TestAntreaPolicyExtendedNamespaces (159.34s)
    --- PASS: TestAntreaPolicyExtendedNamespaces/TestGroupACNPNamespaceLabelSelections (95.68s)
        --- PASS: TestAntreaPolicyExtendedNamespaces/TestGroupACNPNamespaceLabelSelections/Case=ACNPStrictNamespacesIsolationByLabels (46.99s)
        --- PASS: TestAntreaPolicyExtendedNamespaces/TestGroupACNPNamespaceLabelSelections/Case=ACNPStrictNamespacesIsolationBySingleLabel (48.69s)
PASS

Actual behavior

antreapolicy_test.go:91: test failed: after 10 tries, HTTP servers are not ready

Versions:

  • Antrea: latest on main
  • kind: 0.27.0
  • k8s:
    • Client Version: v1.29.1
    • Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    • Server Version: v1.32.2

Additional context

  • This looks like the same issue that Antonin fixed in commit 4077f80
    • However, the issue may have reappeared as the tests scaled up and/or as laptops got slower/weaker (like mine)
petertran-avgo added the kind/bug label Mar 18, 2025
petertran-avgo added a commit to petertran-avgo/antrea that referenced this issue Mar 18, 2025
* On older machines, the test `AntreaPolicyExtendedNamespaces` would
  fail, but not on CI
  * The failure was due to the 1k+ simultaneous probes, which the
    api-server could not handle
  * For that test, the number of probes was ~1800:
    * n^2 x m
      * n = 5 namespaces x 3 pods = 15
        * squared because a nested for loop validates all pod-to-pod
          connections
      * m = 8 ports
* Using waitgroups, the probing is rate limited by running the probes in
  batches of n
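
The probe count and the batching idea in the commit message above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `probeCount` and `runInBatches` are hypothetical names, and the no-op closures stand in for real connectivity probes.

```go
package main

import (
	"fmt"
	"sync"
)

// probeCount returns n^2 x m, where n = namespaces x podsPerNamespace
// and m = ports, as in the breakdown above.
func probeCount(namespaces, podsPerNamespace, ports int) int {
	n := namespaces * podsPerNamespace
	return n * n * ports
}

// runInBatches is a hypothetical helper: it starts at most batchSize
// probes at a time and waits for the whole batch to finish before
// starting the next one. It returns the number of batches it ran.
func runInBatches(probes []func(), batchSize int) int {
	if batchSize < 1 {
		batchSize = 1
	}
	batches := 0
	for start := 0; start < len(probes); start += batchSize {
		end := start + batchSize
		if end > len(probes) {
			end = len(probes)
		}
		var wg sync.WaitGroup
		for _, p := range probes[start:end] {
			wg.Add(1)
			go func(run func()) {
				defer wg.Done()
				run()
			}(p)
		}
		wg.Wait() // Block until every probe in this batch is done.
		batches++
	}
	return batches
}

func main() {
	total := probeCount(5, 3, 8)
	fmt.Println("total probes:", total) // 15^2 x 8 = 1800

	probes := make([]func(), total)
	for i := range probes {
		probes[i] = func() {} // Stand-in for a real connectivity probe.
	}
	fmt.Println("batches of 15:", runInBatches(probes, 15)) // 1800 / 15 = 120
}
```

With a batch size of n = 15, at most 15 exec requests are in flight at once instead of 1800, at the cost of each batch waiting for its slowest probe.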
petertran-avgo added a commit to petertran-avgo/antrea that referenced this issue Mar 20, 2025
* On older machines, the test `AntreaPolicyExtendedNamespaces` would
  fail, but not on CI
  * The failure was due to the 1k+ simultaneous probes from all pods to
    a single pod, which could not handle them
  * For that test, the number of probes was ~1800:
    * n^2 x m
      * n = 5 namespaces x 3 pods = 15
        * squared because a nested for loop validates all pod-to-pod
          connections
      * m = 8 ports
* Randomizing the probe order spreads out the probes against any one
  pod and allows older laptops to complete the environment setup before
  testing

Signed-off-by: Peter Tran <peter-pt.tran@broadcom.com>
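
The randomization described in the commit message above can be sketched as follows. This is an illustration only, not the code from the commit: the `probe` type and the `allProbes` and `shuffleProbes` helpers are hypothetical, and the pod names are made up.

```go
package main

import (
	"fmt"
	"math/rand"
)

// probe is a hypothetical description of one connectivity check.
type probe struct {
	From, To string
	Port     int
}

// allProbes builds the full n^2 x m list with a nested loop. In this
// order, consecutive probes target the same destination pod.
func allProbes(pods []string, ports []int) []probe {
	var out []probe
	for _, to := range pods {
		for _, from := range pods {
			for _, port := range ports {
				out = append(out, probe{From: from, To: to, Port: port})
			}
		}
	}
	return out
}

// shuffleProbes returns a randomly ordered copy of probes, so that the
// probes against any one pod are spread across the whole run. The seed
// parameter makes the order reproducible for this example.
func shuffleProbes(probes []probe, seed int64) []probe {
	out := make([]probe, len(probes))
	copy(out, probes)
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

func main() {
	pods := []string{"x/a", "x/b", "y/a"} // Made-up namespace/pod names.
	probes := shuffleProbes(allProbes(pods, []int{80, 81}), 42)
	for _, p := range probes {
		fmt.Printf("%s -> %s:%d\n", p.From, p.To, p.Port)
	}
}
```

Shuffling keeps the total number of probes unchanged; it only changes which pod is being targeted at any given moment.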