-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Litestream replica stress test #340
Add Litestream replica stress test #340
Conversation
39afc5c
to
caa9324
Compare
test/int/litestream/test_replica.py
Outdated
|
||
client = inst.client | ||
# populate apsis db with a large number of runs | ||
with ThreadPoolExecutor() as exe: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried this without the executor? The API handler is single-threaded async, so it's not clear to me that concurrent access will be significantly faster than sequential.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was in fact getting very similar results, that would explain why
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so I suggest removing the executor then, to keep things simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok sounds good
test/int/litestream/test_replica.py
Outdated
# populate apsis db with a large number of runs | ||
with ThreadPoolExecutor() as exe: | ||
runs_fut = [exe.submit(client.schedule, "sleep", {"duration": "9"}) for _ in range(num_jobs)] | ||
sched_runs_fut = [exe.submit(client.schedule, "sleep", {"duration": "1"}, time=ora.now() + 14) for _ in range(num_jobs)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the 9 and 14 come from? Did you determine this empirically while running it yourself? Seems like the right numbers to use will depend both on num_jobs
as well as on the environment the test is running in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you determine this empirically while running it yourself?
Correct.
Seems like the right numbers to use will depend both on
num_jobs
as well as on the environment the test is running in.
I could easily make these durations dependent on num_jobs
; as per the environment, which factors would you take into account?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per the environment, which factors would you take into account?
Nothing useful to suggest. Just keep in mind that the VM that runs the GH Agent CI might be somewhat slower than your hosts.
Ultimately it would be good to build an integration test setup for Apsis with deterministic time, i.e. the test can roll application time forward explicitly, to avoid these kind of timing issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, thank you
Great! |
As suggested here: https://github.com/alexhsamuel/apsis/pull/338#issuecomment-2241202464, I'm adding a stress test.
In particular, in the test 500 runs are started immediately and other 500 are scheduled with a small delay.
@alexhsamuel let me know if this is a reasonable load to run the test with.
The test takes ~20 sec to pass.