
Add Litestream replica stress test #340

Merged

Conversation

@LudovicoRighi LudovicoRighi commented Jul 29, 2024

As suggested here: https://github.com/alexhsamuel/apsis/pull/338#issuecomment-2241202464, I'm adding a stress test.
In particular, the test starts 500 runs immediately and schedules another 500 with a small delay.

@alexhsamuel let me know if this is a reasonable load to run the test with.

The test takes ~20 sec to pass.

@LudovicoRighi LudovicoRighi force-pushed the litestream-stress-test branch from 39afc5c to caa9324 on July 29, 2024 17:30

client = inst.client
# populate apsis db with a large number of runs
with ThreadPoolExecutor() as exe:
Collaborator

Have you tried this without the executor? The API handler is single-threaded async, so it's not clear to me that concurrent access will be significantly faster than sequential.

Collaborator Author

I was in fact getting very similar results; that would explain why.

Collaborator

OK, so I suggest removing the executor then, to keep things simple.

Collaborator Author

Ok sounds good
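
(For reference, a minimal sketch of the executor-free variant being suggested here; it is illustrative only, assuming the same client, num_jobs, and ora names that appear in the test diff below, and is not the code that was merged.)

# Sketch of the sequential version the reviewer suggests.
# `client`, `num_jobs`, and `ora` are assumed from the surrounding test code.

# populate apsis db with a large number of runs, one schedule call at a time
runs = [client.schedule("sleep", {"duration": "9"}) for _ in range(num_jobs)]
sched_runs = [
    client.schedule("sleep", {"duration": "1"}, time=ora.now() + 14)
    for _ in range(num_jobs)
]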

# populate apsis db with a large number of runs
with ThreadPoolExecutor() as exe:
    runs_fut = [
        exe.submit(client.schedule, "sleep", {"duration": "9"})
        for _ in range(num_jobs)
    ]
    sched_runs_fut = [
        exe.submit(client.schedule, "sleep", {"duration": "1"}, time=ora.now() + 14)
        for _ in range(num_jobs)
    ]
Collaborator

Where do the 9 and 14 come from? Did you determine this empirically while running it yourself? Seems like the right numbers to use will depend both on num_jobs as well as on the environment the test is running in.

Collaborator Author

@LudovicoRighi LudovicoRighi Aug 2, 2024

> Did you determine this empirically while running it yourself?

Correct.

> Seems like the right numbers to use will depend both on num_jobs as well as on the environment the test is running in.

I could easily make these durations dependent on num_jobs; as for the environment, which factors would you take into account?

Collaborator

> as for the environment, which factors would you take into account?

Nothing useful to suggest. Just keep in mind that the VM that runs the GH Agent CI might be somewhat slower than your hosts.

Ultimately it would be good to build an integration test setup for Apsis with deterministic time, i.e. the test can roll application time forward explicitly, to avoid this kind of timing issue.

Collaborator Author

Understood, thank you
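
(As an illustrative aside, one way the hard-coded 9 and 14 could be made to scale with num_jobs; the overhead and baseline figures below are assumptions made for this sketch, not values from the PR.)

# Illustrative sketch only: derive the sleep duration and scheduling offset
# from num_jobs instead of hard-coding 9 and 14. Both constants are guesses
# and would need tuning for the CI host running the test.
PER_RUN_OVERHEAD = 0.01   # assumed scheduling cost per run, in seconds
BASE_SLEEP = 5            # assumed baseline sleep duration, in seconds

duration = BASE_SLEEP + num_jobs * PER_RUN_OVERHEAD
offset = duration + BASE_SLEEP   # start the second batch after the first batch finishes

client.schedule("sleep", {"duration": str(duration)})
client.schedule("sleep", {"duration": "1"}, time=ora.now() + offset)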

@alexhsamuel
Collaborator

Great!

@alexhsamuel alexhsamuel merged commit 2916e79 into apsis-scheduler:master Aug 9, 2024
1 check passed