Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add App Net based CSM Observability Test #86

Merged
merged 5 commits into from
Jun 6, 2024

Conversation

stanley-cheung
Copy link
Contributor

  • Created a new test app_net_csm_observability_test.py that's largely a copy of gamma/csm_observability_test.py with some adjustments.
  • Refactor some flags to be passed to the k8s_xds_server_runner.
  • Use ServerDeploymentArgs everywhere

This PR was based on top of #33, and a replacement of #72.

@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented May 23, 2024

@stanley-cheung stanley-cheung requested a review from sergiitk May 23, 2024 06:18
@stanley-cheung
Copy link
Contributor Author

@sergiitk sergiitk force-pushed the app-net-csm-o11y-2 branch from 8631e48 to dbfc419 Compare May 24, 2024 22:15
@sergiitk
Copy link
Member

@sergiitk
Copy link
Member

@sergiitk sergiitk force-pushed the app-net-csm-o11y-2 branch from 3389995 to de01990 Compare May 25, 2024 01:09
@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented May 31, 2024

@stanley-cheung stanley-cheung requested a review from sergiitk May 31, 2024 21:19
sergiitk added a commit that referenced this pull request Jun 5, 2024
…ends (#93)

Fixes an issue with NEG size and Backend Health waiting logic.

This bug has been in the framework from the very beginning, because we
never tested the flow with (a) Regional or multizonal cluster, AND (b) a
single server.

The issue: when GKE cluster has nodes in more than one zone, a NEG for
the test server/service is created in each zone. If there's fewer
endpoints than zones in the cluster, not every NEG will end up having a
server endpoint. Therefore:

1. `wait_for_network_endpoint_group` will fail because it waits for
every NEG to have more than 0 endpoints.
2. `wait_for_backends_healthy_status` will fail because NEGs with 0
endpoints will never be marked as healthy.

This PR changes the logic of:
1. `wait_for_network_endpoint_group` now waits for a NEG to merely exist
in GCP APIs.
2. `wait_for_backends_healthy_status` now accounts for server
replica_count and will succeeded as soon as the number of healthy
Backends to be greater or equal the number of server replicas.

Discovered when debugging #86. Splitting to its own PR.
@sergiitk sergiitk force-pushed the app-net-csm-o11y-2 branch from 442fdea to 78ed0fe Compare June 5, 2024 21:58
@sergiitk sergiitk force-pushed the app-net-csm-o11y-2 branch from 78ed0fe to 75c4231 Compare June 5, 2024 21:59
@sergiitk
Copy link
Member

sergiitk commented Jun 5, 2024

  1. @sergiitk force-pushed the app-net-csm-o11y-2 branch from 442fdea to 78ed0fe: removed my commits were split to PR framework core: fixes a case when there are fewer endpoints than backends #93 and merged to main
  2. @sergiitk force-pushed the app-net-csm-o11y-2 branch from 78ed0fe to 75c4231: rebased against latest main

Copy link
Member

@sergiitk sergiitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as long as the tests pass

@sergiitk
Copy link
Member

sergiitk commented Jun 5, 2024

@stanley-cheung stanley-cheung force-pushed the app-net-csm-o11y-2 branch 2 times, most recently from 64fcc69 to 75c4231 Compare June 5, 2024 22:43
@sergiitk sergiitk merged commit 624948a into grpc:main Jun 6, 2024
7 checks passed
@stanley-cheung stanley-cheung deleted the app-net-csm-o11y-2 branch June 6, 2024 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants