scheduled stop #431

Merged
merged 85 commits into from
Jan 15, 2025
85 commits
a29169e
Add stopping state.
alexhsamuel Dec 9, 2024
d1da248
Add stop config for Job.
alexhsamuel Dec 11, 2024
a39f372
Fix exception.
alexhsamuel Dec 13, 2024
21cb516
Stop schedules.
alexhsamuel Dec 13, 2024
8323f99
Refactor Schedule JSO logic.
alexhsamuel Dec 13, 2024
9063032
Just one schedule, no conds or actions.
alexhsamuel Dec 13, 2024
7361780
Attach StopSchedule to Schedule, but make them siblings in JSO.
alexhsamuel Dec 13, 2024
e8e9afd
Notes.
alexhsamuel Dec 13, 2024
5130ebf
Rename to stop_schedule.
alexhsamuel Dec 13, 2024
b93d8aa
Test.
alexhsamuel Dec 13, 2024
25b5c57
Get rid of Stop class.
alexhsamuel Dec 13, 2024
5cc2af9
Simplify stop schedule.
alexhsamuel Dec 13, 2024
ec8fd4e
We always go through the scheduled state.
alexhsamuel Dec 13, 2024
9a4439b
Set stop time in run.times.
alexhsamuel Dec 13, 2024
1048979
Update design notes.
alexhsamuel Dec 13, 2024
ba0d364
Todo.
alexhsamuel Dec 13, 2024
83fdd4b
Fix.
alexhsamuel Dec 13, 2024
cb05b2a
Set up to use stop time.
alexhsamuel Dec 13, 2024
de98cfe
Add FIXME.
alexhsamuel Dec 15, 2024
8b0c25d
Merge branch 'master' into feature/scheduled-stop
alexhsamuel Dec 17, 2024
f45b537
Provisional stop logic.
alexhsamuel Dec 17, 2024
7fbbc1b
Update .gitignore.
alexhsamuel Dec 17, 2024
ca714c2
Clean up logic.
alexhsamuel Dec 17, 2024
27f4640
Change stop time recording in run log.
alexhsamuel Dec 17, 2024
2e0506d
Update design.
alexhsamuel Dec 17, 2024
14ec55d
Add stop() to Program API.
alexhsamuel Dec 17, 2024
5e9cca1
stop() for no-op program.
alexhsamuel Dec 17, 2024
eecf760
Use Program.stop() for scheduled stop.
alexhsamuel Dec 17, 2024
fde2309
Add stop_time to schedule API.
alexhsamuel Dec 17, 2024
b3ad244
Todo.
alexhsamuel Dec 17, 2024
09c47c7
Todo.
alexhsamuel Dec 17, 2024
de6e3cf
Move _process_updates into its own module.
alexhsamuel Dec 17, 2024
a6e57dd
Fix.
alexhsamuel Dec 17, 2024
0082f77
Fix.
alexhsamuel Dec 17, 2024
9f59405
Allow signal while stopping.
alexhsamuel Dec 17, 2024
4c07602
Transition to stopping when stopping.
alexhsamuel Dec 17, 2024
4b2a177
Front end changes for stopping state.
alexhsamuel Dec 17, 2024
459e52d
Fixes.
alexhsamuel Dec 17, 2024
6f83ade
Update todo.
alexhsamuel Dec 17, 2024
a159f72
Split bound program, and fix stop.
alexhsamuel Dec 17, 2024
f3810ee
Procstar agent program stop implementation.
alexhsamuel Dec 17, 2024
d191cfd
Todo.
alexhsamuel Dec 18, 2024
e0d3a6f
Stopped run can succeed, WIP doesn't work.
alexhsamuel Dec 18, 2024
2b3a547
Refactor.
alexhsamuel Dec 19, 2024
ebd214f
Program may be missing in response.
alexhsamuel Dec 20, 2024
414b161
Running program object support and for no-op program.
alexhsamuel Dec 20, 2024
f818888
Reorganize module.
alexhsamuel Dec 20, 2024
543998d
ProcstarProgramRunning, WIP.
alexhsamuel Dec 20, 2024
d948b2a
Fixes.
alexhsamuel Dec 20, 2024
226115b
Support legacy programs in with RunningProgram.
alexhsamuel Dec 21, 2024
ea33b77
Fix.
alexhsamuel Jan 5, 2025
24600f5
RunningProgram for Procstar agent program.
alexhsamuel Jan 5, 2025
b9cb617
Fix running program for classic agent.
alexhsamuel Jan 5, 2025
f6c577e
Clean up.
alexhsamuel Jan 5, 2025
2ac12f6
Todo.
alexhsamuel Jan 5, 2025
3400fd6
Clean up no longer used.
alexhsamuel Jan 5, 2025
4299725
Fix stopping transition.
alexhsamuel Jan 5, 2025
08f7968
Record stop signals sent in meta.
alexhsamuel Jan 5, 2025
a31362f
Clean up.
alexhsamuel Jan 5, 2025
598ba04
Show stop time in run view.
alexhsamuel Jan 5, 2025
691f4ff
Test job.
alexhsamuel Jan 6, 2025
be75ee0
Add stop time to runs table.
alexhsamuel Jan 6, 2025
fcebe46
Procstar stop tests.
alexhsamuel Jan 6, 2025
be220a9
Another test.
alexhsamuel Jan 6, 2025
278a559
Update unit tests.
alexhsamuel Jan 7, 2025
ac644e5
Unit tests for ProcessProgram.
alexhsamuel Jan 7, 2025
b36bd13
Add BoundProcessProgram.
alexhsamuel Jan 7, 2025
88bb540
RunningProgram for process programs; factor out common Stop.
alexhsamuel Jan 8, 2025
e96257b
Round-trip JSO tests for programs.
alexhsamuel Jan 8, 2025
0677fd6
Todo.
alexhsamuel Jan 8, 2025
4af21a8
Clean up imports.
alexhsamuel Jan 8, 2025
27af132
Clean up import.
alexhsamuel Jan 8, 2025
e388e88
Stop run in API.
alexhsamuel Jan 8, 2025
5ea1c23
Proper stop behavior in process program.
alexhsamuel Jan 8, 2025
5256631
Agent stop tests.
alexhsamuel Jan 8, 2025
7a5f58e
Fix.
alexhsamuel Jan 8, 2025
db8c462
Fix.
alexhsamuel Jan 8, 2025
6533a06
Todo.
alexhsamuel Jan 8, 2025
1e50678
Stop operation in web UI.
alexhsamuel Jan 8, 2025
23b7bae
Stop operation in CLI.
alexhsamuel Jan 8, 2025
664bd4f
Todo.
alexhsamuel Jan 8, 2025
e0fb7e4
Fix states.
alexhsamuel Jan 9, 2025
e9429b3
Program stop docs.
alexhsamuel Jan 9, 2025
96363fa
Refactor.
alexhsamuel Jan 15, 2025
d7275fc
Docstring.
alexhsamuel Jan 15, 2025
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,7 +1,7 @@
*.o
.pytest_cache
Makefile.local
-apsis.db
+apsis.db*
apsis.log
archive*.db
*-journal
11 changes: 11 additions & 0 deletions docs/concepts.rst
@@ -99,6 +99,7 @@ Each run, once created, is in one of these states:
- **waiting**: The run is waiting for a condition to be met.
- **starting**: The run is starting.
- **running**: The run has started and is currently running.
- **stopping**: Apsis is stopping the run.
- **success**: The run has completed successfully.
- **failure**: The run has completed unsuccessfully.
- **error**: Some other problem has occurred with the run. This can include a
@@ -149,6 +150,16 @@ You can apply the following operations, to induce transitions explicitly:
- You can *skip* a **scheduled** or **waiting** run. Apsis no longer waits for
its schedule time or conditions, and transitions it to **skipped**.

- You can *stop* a **running** run. Apsis requests that the run shut down in an
orderly manner. How this works depends on the run's program. For a program
that runs a (local or remote) UNIX process, this entails sending a termination
signal (usually SIGTERM), then waiting for a grace period and then sending
SIGKILL if the process has not terminated. While Apsis is waiting for the run
to terminate, it is in the **stopping** state.

You can also schedule Apsis to stop a run automatically; see
:ref:`stop-schedules`.

- You can *mark* a finished run (**success**, **failure**, **skipped**, or
**error**) to a different finished state.
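
The *stop* operation described above applies only to a **running** run and moves it to **stopping**. A minimal sketch of that rule follows; `RunState` and `request_stop` are hypothetical names chosen for illustration and are not taken from the Apsis code base.

```python
from enum import Enum


class RunState(Enum):
    # Hypothetical enum; only the states relevant to stopping are listed.
    RUNNING = "running"
    STOPPING = "stopping"
    SUCCESS = "success"
    FAILURE = "failure"


def request_stop(state):
    """Return the state a run enters when a stop is requested."""
    # Only a running run can be stopped; anything else is rejected.
    if state is not RunState.RUNNING:
        raise ValueError(f"cannot stop a run in state {state.value}")
    return RunState.STOPPING


new_state = request_stop(RunState.RUNNING)
```

Rejecting the request outright for other states is an assumption of this sketch; the text above only says which state the operation applies to.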

47 changes: 47 additions & 0 deletions docs/programs.rst
@@ -163,6 +163,53 @@ the user running the Procstar agent to run the command as the sudo user, without
any explicit password.


.. _program-stop:

Program Stop
------------

Many program types provide a stop method, by which Apsis can request an orderly
shutdown of the program before it terminates on its own. Keep in mind:

- Not all program types provide a program stop.
- The program may not stop immediately.
- The program stop may fail.

Apsis requests a program to stop if the program's run is configured with a stop
schedule, or in response to an explicit stop operation invoked by the user.

Before Apsis requests a program to stop, it transitions the run to the
*stopping* state. If the program terminates correctly in response to the stop
request, Apsis transitions the run to *success*; if the program terminates in an
unexpected way, *failure*.

The program types above that create a UNIX process (`program`, `shell`,
`procstar`, `procstar-shell`) all implement program stop similarly. In response
to a program stop request,

1. Apsis immediately sends the process a signal, by default `SIGTERM`.
2. Apsis waits for the process to terminate, up to a configured grace period, by
default 60 seconds.
3. If the process has not terminated, Apsis sends it `SIGKILL`.
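
The three steps above can be sketched with the standard `subprocess` module. This is only an illustration for a local child process on a POSIX system; `stop_process` is a hypothetical helper, not the Apsis implementation.

```python
import signal
import subprocess
import sys


def stop_process(proc, stop_signal=signal.SIGTERM, grace_period=60.0):
    """Hypothetical helper: signal, wait up to grace_period, then SIGKILL."""
    # Step 1: send the configured stop signal right away.
    proc.send_signal(stop_signal)
    try:
        # Step 2: wait for the process to exit, up to the grace period.
        return proc.wait(timeout=grace_period)
    except subprocess.TimeoutExpired:
        # Step 3: the process did not exit in time; force termination.
        proc.kill()
        return proc.wait()


# A child that only sleeps; it has no SIGTERM handler, so the default
# action (terminate) applies.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
returncode = stop_process(child, grace_period=5.0)
```

On POSIX, `subprocess` reports a process killed by signal N with a return code of -N, which is how a caller can tell whether the stop signal ended the process.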

To configure the program stop, use the `stop` key. For example, Apsis will
request this program to stop by sending `SIGUSR2` instead of `SIGTERM`, and will
only wait 15 seconds before sending `SIGKILL`.

.. code:: yaml

    program:
      type: procstar
      group_id: default
      argv: ["/usr/bin/echo", "Hello, world!"]
      stop:
        signal: SIGUSR2
        grace_period: 15s

If the program terminates with an exit status indicating that the process was
terminated by `SIGUSR2`, Apsis considers the run to have succeeded.


Internal Programs
-----------------

51 changes: 51 additions & 0 deletions docs/schedules.rst
@@ -210,3 +210,54 @@ transition, the schedule will include no times on this date at all. For
example, a daily schedule with a start time between 2:00:00 and 3:00:00 and a
U.S. time zone will contain no times on the dates in the spring when DST begins.


.. _stop-schedules:

Stop schedules
--------------

You can also configure a job so that a run will stop at a certain time. When
this time elapses, Apsis stops the run if it is still running. The ways you
schedule a run to stop are different from the schedule types above, since the
stop time is generally related to the schedule time.

To configure a stop schedule in addition to the normal start schedule, place the
latter into the `start:` subkey, and use `stop:` to specify the stop schedule. The
available stop schedule types are `duration` and `daytime`. For example,

.. code:: yaml

    schedule:
      start:
        type: daily
        daytime: 10:30:00
        tz: Europe/Berlin
      stop:
        type: duration
        duration: 30m

This instructs Apsis to stop the run 30 minutes after the schedule time, namely
at 11:00 Europe/Berlin.
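
As a rough illustration of the `duration` arithmetic, the sketch below adds a parsed duration to a schedule time. The `parse_duration` helper is hypothetical and handles only the suffix forms that appear in these docs (`15s`, `30m`, `1h`); the actual duration syntax accepted by Apsis may be broader. Times are naive wall-clock values in the schedule's time zone, for brevity.

```python
from datetime import datetime, timedelta

# Hypothetical unit table; covers only the suffixes used in this document.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text):
    """Parse strings such as "15s", "30m", or "1h" into a timedelta."""
    value, unit = text[:-1], text[-1]
    return timedelta(**{_UNITS[unit]: float(value)})


def duration_stop_time(schedule_time, duration):
    """Stop time is the schedule time plus the configured duration."""
    return schedule_time + parse_duration(duration)


# Wall-clock time in Europe/Berlin, kept naive for brevity.
schedule_time = datetime(2025, 1, 15, 10, 30)
stop_time = duration_stop_time(schedule_time, "30m")
```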

The `daytime` stop schedule type instructs Apsis to stop the run at the next
occurrence of a specific daytime after the schedule time. The following example
has the same effect as the previous:

.. code:: yaml

    schedule:
      start:
        type: daily
        daytime: 10:30:00
        tz: Europe/Berlin
      stop:
        type: daytime
        daytime: 11:00:00
        tz: Europe/Berlin
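
The "next occurrence of a daytime after the schedule time" rule can be sketched as follows. `daytime_stop_time` is a hypothetical helper. It assumes the schedule time and the stop daytime are in the same time zone, ignores DST transitions, and treats a daytime equal to the schedule time as falling on the next day; none of these details are confirmed by the docs.

```python
from datetime import datetime, time, timedelta


def daytime_stop_time(schedule_time, daytime):
    """Return the first occurrence of `daytime` strictly after `schedule_time`."""
    # Try the daytime on the same date as the schedule time first.
    candidate = datetime.combine(schedule_time.date(), daytime)
    if candidate <= schedule_time:
        # That daytime has already passed, so use the following day.
        candidate += timedelta(days=1)
    return candidate


schedule_time = datetime(2025, 1, 15, 10, 30)
same_day = daytime_stop_time(schedule_time, time(11, 0))
next_day = daytime_stop_time(schedule_time, time(9, 0))
```

With a stop daytime earlier than the start daytime, as in the second call, the stop time rolls over to the following date.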

You can use any of the schedule types in the previous section for the `start`
schedule.

When the stop time elapses, Apsis stops the run in accordance with the program's
stop method. This is discussed in :ref:`program-stop`.

31 changes: 31 additions & 0 deletions notes/notes.md
@@ -1,3 +1,34 @@
# Scheduled stop

THIS IS CORRECT:
```yaml
program:
  type: agent
  argv: ["/path/to/my/service", "--foreground"]
  stop:
    signal: SIGTERM
    grace_period: 1m
    self_stop: false

schedule:
  start:
    type: interval
    interval: 1h
  stop:
    type: duration
    duration: 30m
```


| | running | stopping |
|-----------|---------|----------|
| exit ==0 | success | success |
| exit !=0 | failure | failure |
| stop sig | failure | success |
| other sig | failure | failure |
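
The table above can be read as a small decision function. The sketch below is one way to express it, assuming `subprocess`-style return codes in which a negative value -N means the process was killed by signal N; `outcome` is a hypothetical name, not part of Apsis.

```python
import signal


def outcome(state, returncode, stop_signal=signal.SIGTERM):
    """Map (run state, exit result) to a final state, following the table."""
    if returncode == 0:
        return "success"
    if returncode > 0:
        return "failure"
    # Negative return code: the process was killed by signal -returncode.
    killed_by = -returncode
    # Dying from the stop signal counts as success only if a stop was requested.
    if state == "stopping" and killed_by == stop_signal:
        return "success"
    return "failure"
```

The only cell in which the run state changes the result is the "stop sig" row: the same signal is a failure while running and a success while stopping.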



# SQLite

Performance test of appending to string fields, in `work/sqlite-concat.py`.
48 changes: 15 additions & 33 deletions notes/todo.md
@@ -1,24 +1,3 @@
# Live updates

WebSocket API for web UI:
- run updates for all runs (as current)
- run log updates for a single run
- run data updates for a single run
- agent changes: connect, disconnect, timeout
- job changes

Persistent connection:
- run changes (run summaries only)
- job changes
- Procstar connection changes
- Apsis log (if we keep this)

Transient connection (single run):
- run log updates
- run output data updates
- run metadata updates


### The Plan

- [x] design internal run publisher protocol
@@ -54,26 +33,29 @@ Transient connection (single run):
- [x] move compression out of programs
- [x] roll in Procstar agent changes to `/summary`
- [ ] clean up API endpoints we don't need anymore
- [ ] roll in, or get rid of, log socket
- [x] roll in, or get rid of, log socket
- [x] add live endpoints to `Client`
- [x] live updates in CLUI
- [x] output
- [x] run log
- [x] improve and clean up `State` enum

- [x] scheduled stop
- [x] add stop method to program
- [x] actually stop the program at the stop time
- [x] go through _stopping_ state
- [x] Procstar stop method
- [x] add stop time to runs table
- [x] classic agent program stop method
- [x] ProcessProgram: RunningProgram and stop method
- [x] stop run API endpoint
- [x] "stop" operation
- [x] add stop time to run view
- [x] docs
- [x] refactor `apsis.stop` module
- [ ] if `send_signal` raises, error the run
- [ ] improve `apsis job` output style


# Exantium tasks?

- internals presentation
- reliable integration tests
- update Python deps, especially Sanic
- update JS deps, especially Vue3
- database schema cleanup
- move metadata into its own table; remove `meta` from `Apsis._transition()`
- code cleanup, reorg, documentation
- [ ] improve `apsis job` output style


# Cleanup