Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Litestream Apsis db replica test #338

Merged
merged 11 commits into from
Jul 23, 2024

Conversation

LudovicoRighi
Copy link
Collaborator

Add test that mimics the situation where Apsis must be restarted using a db created from a Litestream replica.

@alexhsamuel
Copy link
Collaborator

LGTM. This is promising.

Not necessarily here, but it would also be good to test that the replica is usable if litestream goes down hard, i.e. SIGKILL. Also network partition, which you can't necessarily test, but perhaps suspend the process as a proxy. Likewise killing Apsis rather than orderly shutdown.

I think stress tests would also be a good idea. Kill Apsis and/or litestream while Apsis is busy and in particular some runs are running.

@LudovicoRighi
Copy link
Collaborator Author

LudovicoRighi commented Jul 22, 2024

I've added another test where:

  • Apsis and Litestream processes are terminated with SIGKILL signal;
  • Apsis is killed while there is still a run in the running state;
[lrighi]🌐 appl112-ny2 in apsis [ litestream][!?⇡][📦 v0.22.7][🐍 v3.10.12][🅒 prod7][⏱ 3s]
> pytest test/int/litestream/test_replica.py
===================================================================================================================== test session starts =====================================================================================================================
(...)
collected 2 items                                                                                                                                                                                                                                             

test/int/litestream/test_replica.py ..                                                                                                                                                                                                                  [100%]

================================================================================================================ 2 passed in 66.09s (0:01:06) =================================================================================================================

Also network partition, which you can't necessarily test, but perhaps suspend the process as a proxy.

@alexhsamuel, could you please elaborate a bit more on this idea?

@alexhsamuel
Copy link
Collaborator

Sorry if I wasn't clear. Please don't add new tests/functionality after the initial PR. Let's get this passing CI and merged, and use new PRs for further tests.

assert any(restored_db_name in l for l in log)

# check runs data is accurate in the restored db
client = inst.client
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already have this above.

# -------------------------------------------------------------------------------


def test_replica():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a @pytest.mark.skipif if litestream isn't available.

expected_states = ["success", "failure", "error", "skipped"]
for state in expected_states:
run_id = client.schedule("job1", {})["run_id"]
inst.wait_run(run_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check that the run succeeded.

@LudovicoRighi
Copy link
Collaborator Author

Ok sounds good, thanks!

Let's get this passing CI and merged

@alexhsamuel I don't have permissions for neither of the two, so please try to trigger the GH workflow.

)
except (subprocess.CalledProcessError, FileNotFoundError):
return False
return True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use shutil.which() for this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@alexhsamuel alexhsamuel merged commit 140b5a3 into apsis-scheduler:master Jul 23, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants