Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CSM O11y test] Ping GMP endpoint during test for debugging purpose #33

Merged
merged 29 commits into from
May 21, 2024

Conversation

stanley-cheung
Copy link
Contributor

@stanley-cheung stanley-cheung commented Feb 7, 2024

  • Ping GMP endpoint in client and server pod <pod IP>:9464/metrics during the CSM Observability test
  • This will provide insight into what GMP sees from the OTel plugin, before sending metrics to Cloud Monitoring
  • This will help debug in case we get these extra metrics time series without a valid data point, in a future test flake.

- Ping GMP endpoint in client and server pod curl localhost:9464/metrics during test
- This will provide insight into what GMP gets from the OTel plugin before sending metrics to Cloud Monitoring
@sergiitk
Copy link
Member

sergiitk commented Feb 7, 2024

I most definitely won't recommend this approach. Instead, we should consider making normal HTTP requests using requests or urllib3.

@stanley-cheung stanley-cheung marked this pull request as draft February 7, 2024 18:44
- But this doesn't work yet. Can't quite figure out what's the exact request URL to use. Tried quite a few combinations and none of them works.
@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented Feb 7, 2024

I most definitely won't recommend this approach. Instead, we should consider making normal HTTP requests using requests or urllib3.

Do you know what the request URL should I be using? I tried quite a few combinations but none of them works

  • http://{server pod name}.{server namespace}.svc.cluster.local:9464/metrics
  • http://{server pod name}.{server namespace}.pod.cluster.local:9464/metrics
  • http://{frontend service name}.{server namespace}.svc.cluster.local:9464/metrics
  • http://{server pod IP replaced dots with dashes}.{server namespace}.pod.cluster.local:9464/metrics
  • http://{server pod IP}:9464/metrics

If I supply {server pod name}, {server namespace}, {container name} into pod_exec and then run curl -s localhost:9464/metrics, it does work. So it seems like it's a matter arranging {server pod name}, {server namespace}, {container name} into the correct FQDN?

I just realized I am running the requests.get() from the main test driver (csm_observability_test.py). That is outside of the cluster isn't it? So it won't understand the pod IP address or anything .svc.cluster.local isn't it.

- `enable_csm_observability` is now a constructor parameter for the client runner and the server runner, instead of a `run()` parameter.
- If `enable_csm_observability`, we want to do a port forwarding for the `pod_monitoring_port = 9464`, such that we can issue a GET request against it in the CSM test
- Added `monitoring_port` and `monitoring_host` as class variable to the client runner and server runner.
- Override `_xds_test_server_for_pod` in the `GammaServerRunner` class
- Minimize the places where we need to hardcode the port `9464`
@stanley-cheung stanley-cheung changed the title [CSM O11y test] [DO NOT MERGE] Ping GMP endpoint regularly for debug purpose [CSM O11y test] Ping GMP endpoint regularly for debug purpose Feb 8, 2024
@stanley-cheung stanley-cheung changed the title [CSM O11y test] Ping GMP endpoint regularly for debug purpose [CSM O11y test] Ping GMP endpoint during test for debugging purpose Feb 8, 2024
@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented Feb 8, 2024

@stanley-cheung stanley-cheung marked this pull request as ready for review February 8, 2024 01:03
@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented Feb 8, 2024

- Added a PrometheusLogger class to write the prometheus endpoint log to a separate log file related to logs_subdir
- monitoring_port is now stored in XdsTestClient and XdsTestServer rather than the runner
- use rpc_host instead of a separate monitoring_host
- put the constant DEFAULT_MONITORING_PORT in the KubernetesBaseRunner
- fixed requirements.txt and requirements.lock
- removed a previous hack is_legit_time_series()
- renamed function to ping_prometheus_endpoint instead of referring it as GMP
@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented Feb 21, 2024

@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented Feb 21, 2024

@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented Mar 5, 2024

@stanley-cheung stanley-cheung requested a review from sergiitk March 27, 2024 23:21
@sergiitk sergiitk force-pushed the ping-gmp-endpoint branch 2 times, most recently from 2bb2bd3 to c71a050 Compare May 10, 2024 03:24
Copy link
Member

@sergiitk sergiitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with

  1. FIxing the log collection arg
  2. Running full csm test suite + lb test suite

@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented May 17, 2024

@stanley-cheung
Copy link
Contributor Author

@sergiitk Incorporated latest feedback. Using ClientDeploymentArgs now. Will do ServerDeploymentArgs in a separate PR.

Want to do a final check and merge? Thanks.

@stanley-cheung stanley-cheung requested a review from sergiitk May 17, 2024 02:04
Copy link
Member

@sergiitk sergiitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that a bunch grpc/core/master/linux/psm-csm test run failed with

Traceback (most recent call last):
  File "/tmp/tmp.2ObQJ9ydNu/psm-interop/tests/app_net_ssa_test.py", line 111, in test_session_affinity_policy
    test_client: _XdsTestClient = self.startTestClient(
  File "/tmp/tmp.2ObQJ9ydNu/psm-interop/framework/xds_k8s_testcase.py", line 928, in startTestClient
    return self._start_test_client(test_server.xds_uri, **kwargs)
  File "/tmp/tmp.2ObQJ9ydNu/psm-interop/framework/xds_k8s_testcase.py", line 831, in _start_test_client
    test_client = self.client_runner.run(
  File "/tmp/tmp.2ObQJ9ydNu/psm-interop/framework/test_app/runners/k8s/k8s_xds_client_runner.py", line 188, in run
    **self.deployment_args.as_dict(),
AttributeError: 'NoneType' object has no attribute 'as_dict'

@sergiitk
Copy link
Member

The lb test suite failed with the same error.

@stanley-cheung
Copy link
Contributor Author

stanley-cheung commented May 20, 2024

@sergiitk sergiitk merged commit d07a3ab into grpc:main May 21, 2024
7 checks passed
@stanley-cheung stanley-cheung deleted the ping-gmp-endpoint branch May 21, 2024 19:44
sergiitk pushed a commit that referenced this pull request Jun 6, 2024
- Created a new test `app_net_csm_observability_test.py` that's largely
a copy of `gamma/csm_observability_test.py` with some adjustments.
- Refactor some flags to be passed to the `k8s_xds_server_runner`.
- Use `ServerDeploymentArgs` everywhere

This PR was based on top of #33, and a replacement of #72.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants