
PWX-38589: Fix the CPU spike and excessive logging in telemetry pods #1652

Closed
wants to merge 2 commits into from


puremaxm

What this PR does / why we need it:

This PR contains three changes: two that together greatly reduce the log spam in the px-telemetry-phonehome pod, and one that slightly reduces the log spam in the px-telemetry-registration pod:

  1. In createDaemonSetTelemetryPhonehome, we now seed APPLIANCE_ID and APPLIANCE_NAME.
  2. In the various reconcile functions, we now seed APPLIANCE_NAME.
  3. We reduce the uploadAll time range from the last 8760 hours (365 days) to 744 hours (31 days).
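Seeding an environment variable from a reconcile function is normally done idempotently, so that repeated reconciles update the value instead of appending duplicate entries. The following is a minimal sketch under that assumption, not the operator's actual code: EnvVar is a hypothetical stand-in mirroring the Name/Value fields of the Kubernetes core/v1 EnvVar (to avoid external dependencies), and setEnv plus the example values are illustrative only.

```go
package main

import "fmt"

// EnvVar is a hypothetical stand-in mirroring the Name/Value fields of
// Kubernetes' core/v1 EnvVar, so this sketch has no external dependencies.
type EnvVar struct {
	Name  string
	Value string
}

// setEnv (hypothetical helper) returns env with name set to value: an
// existing entry is overwritten in place, otherwise a new entry is
// appended. Calling it on every reconcile never creates duplicates.
func setEnv(env []EnvVar, name, value string) []EnvVar {
	for i := range env {
		if env[i].Name == name {
			env[i].Value = value
			return env
		}
	}
	return append(env, EnvVar{Name: name, Value: value})
}

func main() {
	// Example values only; real values would come from the cluster.
	var env []EnvVar
	env = setEnv(env, "APPLIANCE_ID", "example-cluster-uuid")
	env = setEnv(env, "APPLIANCE_NAME", "example-cluster")
	// A second reconcile updates the existing entry instead of appending.
	env = setEnv(env, "APPLIANCE_NAME", "example-cluster")
	fmt.Println(len(env)) // 2
	for _, e := range env {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

The overwrite-or-append shape matters because reconcile loops run many times over an object's lifetime.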

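With the old config, we were logging 8760 hours * 11 log lines per hour = ~96k log lines.

With the new config, we log about 744 hours * 5 log lines per hour = ~3.7k log lines (the sample output below shows five lines per hour; the measured total was 3806).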
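The estimate is simply the number of hours scanned multiplied by the log lines emitted per scanned hour. A small sketch of that arithmetic, using the 11 lines per hour quoted for the old config and the five lines per hour visible in the sample output; estimateLines is a hypothetical helper name.

```go
package main

import "fmt"

const hoursPerDay = 24

// estimateLines approximates the log lines one run emits when it scans
// `hours` hours and logs `linesPerHour` lines for each of them.
func estimateLines(hours, linesPerHour int) int {
	return hours * linesPerHour
}

func main() {
	oldHours, newHours := 8760, 744
	fmt.Println(oldHours/hoursPerDay, newHours/hoursPerDay) // 365 31
	fmt.Println(estimateLines(oldHours, 11))                // 96360
	fmt.Println(estimateLines(newHours, 5))                 // 3720
}
```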

Which issue(s) this PR fixes (optional)
Closes PWX-38589

Special notes for your reviewer:

I ran this changeset on my local cluster; the (truncated) log output now looks like this:

{"level":"info","ts":1724096621.8442233,"caller":"components/log_processor.go:129","msg":" >>>>> Starting Hour 2024071922"}
{"level":"info","ts":1724096621.845684,"caller":"components/log_scanner.go:49","msg":"Starting scan time extension 2024071922"}
{"level":"info","ts":1724096621.8470185,"caller":"components/log_scanner.go:65","msg":"Finished scan time extension 2024071922"}
{"level":"info","ts":1724096621.8470461,"caller":"components/log_scanner.go:44","msg":"Scanned 0 logs with timeExtension 2024071922"}
{"level":"info","ts":1724096621.8523428,"caller":"components/log_processor.go:187","msg":"Hour 2024071922 completed. 0 records deleted from table."}
{"level":"info","ts":1724096621.8523734,"caller":"components/log_processor.go:129","msg":" >>>>> Starting Hour 2024071921"}
{"level":"info","ts":1724096621.8533957,"caller":"components/log_scanner.go:49","msg":"Starting scan time extension 2024071921"}
{"level":"info","ts":1724096621.8543782,"caller":"components/log_scanner.go:65","msg":"Finished scan time extension 2024071921"}
{"level":"info","ts":1724096621.854402,"caller":"components/log_scanner.go:44","msg":"Scanned 0 logs with timeExtension 2024071921"}
{"level":"info","ts":1724096621.8610723,"caller":"components/log_processor.go:187","msg":"Hour 2024071921 completed. 0 records deleted from table."}
{"level":"info","ts":1724096621.861129,"caller":"components/log_processor.go:129","msg":" >>>>> Starting Hour 2024071920"}
{"level":"info","ts":1724096621.8622558,"caller":"components/log_scanner.go:49","msg":"Starting scan time extension 2024071920"}
{"level":"info","ts":1724096621.8632746,"caller":"components/log_scanner.go:65","msg":"Finished scan time extension 2024071920"}
{"level":"info","ts":1724096621.8633,"caller":"components/log_scanner.go:44","msg":"Scanned 0 logs with timeExtension 2024071920"}
{"level":"info","ts":1724096621.8681047,"caller":"components/log_processor.go:187","msg":"Hour 2024071920 completed. 0 records deleted from table."}
root@ip-10-13-161-24:~/go/pkg/mod/github.com/fullstorydev/grpcurl@v1.9.1# kubectl logs -n $NS px-telemetry-phonehome-8mvsr | wc -l
Defaulted container "log-upload-service" out of: log-upload-service, envoy, init-cont (init)
3806


codecov bot commented Aug 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.61%. Comparing base (303cc37) to head (bf53561).

@@            Coverage Diff             @@
##           master    #1652      +/-   ##
==========================================
+ Coverage   75.59%   75.61%   +0.01%     
==========================================
  Files          77       77              
  Lines       20862    20877      +15     
==========================================
+ Hits        15771    15786      +15     
  Misses       3967     3967              
  Partials     1124     1124              


puremaxm requested review from jrivera-px, zoxpx and a team August 20, 2024 16:35
@@ -28,7 +28,7 @@
     "/var/cores/px_info.log",
     "/var/cores/px_patch_fs.log"
   ],
-  "phonehome_hour_range": 8760,
+  "phonehome_hour_range": 744,
puremaxm (author):

@jrivera-px / @zoxpx, is there some way to flag to the larger org that this is changing?
I have a hard time believing that extracting the last year of logs was crucial to anyone's workflow, but it's probably best to double-check.

puremaxm closed this Aug 22, 2024