[DCJ-284] Increase tools pool datarepo_v1 from 1500->2000 #293

Merged

Conversation

okotsopoulos

@okotsopoulos okotsopoulos commented Apr 30, 2024

https://broadworkbench.atlassian.net/browse/DCJ-284

Background

TDR uses the datarepo_v1 pool in RBS' tools namespace for our integration tests. We periodically hit pool exhaustion when many test runs execute concurrently. The configuration change we made to cancel earlier active test runs on a PR helps manage this load, but it isn’t enough given the high volume of developer activity on TDR these days.

RBS logs showed a spike in related errors yesterday, aligning with many test runs that ultimately failed (we didn’t yet have pool availability metrics exposed in Grafana): https://cloudlogging.app.goo.gl/Lvwz3ytmAb1Vwx3T8

Changes

  • Increase datarepo_v1 pool in RBS' tools namespace from 1500 -> 2000
  • Include a link to the Grafana panel showing READY resource ratios for tools pools, which has been populated since this morning.
    • When deciding whether to modify our pool size in the past, we've wanted these metrics to be readily available to inform our decision-making.

Here's a view of the metrics now being exposed. It shows that a few concurrent PR test runs this afternoon depleted the pool to a low point of 58%.
Screenshot 2024-04-30 at 4 49 04 PM
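
For a rough sense of what that low point means in absolute terms, here is a minimal sketch of the arithmetic. It assumes the panel's ratio is READY resources divided by the configured pool size, and that the same number of resources would be consumed after the resize; both are assumptions for illustration, and `ready_count` is a hypothetical helper, not part of RBS.

```python
def ready_count(pool_size: int, ready_ratio: float) -> int:
    """Approximate number of READY resources for a pool size and READY ratio."""
    if not 0.0 <= ready_ratio <= 1.0:
        raise ValueError("ready_ratio must be between 0 and 1")
    return round(pool_size * ready_ratio)


# Figures from this PR: pool size 1500, observed low point of 58% READY.
ready_at_low = ready_count(1500, 0.58)

# Assumption: the same absolute draw-down occurs against the new size of 2000.
consumed = 1500 - ready_at_low
ratio_after_resize = (2000 - consumed) / 2000

print(ready_at_low, consumed, ratio_after_resize)
```

Under these assumptions, the same burst of test runs would leave a larger share of the pool READY at the new size, which is the headroom this change is after.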

@okotsopoulos okotsopoulos requested review from a team, rushtong and fboulnois and removed request for a team April 30, 2024 20:47

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud


@rushtong rushtong left a comment


Changes look reasonable 👍🏽

@okotsopoulos okotsopoulos merged commit 53f8f8f into master May 1, 2024
4 checks passed
@okotsopoulos okotsopoulos deleted the okotsopo-DCJ-284-increase-tools-datarepo_v1-size branch May 1, 2024 12:46