0.5.0 - Slurm config arrays, multi-node, CLI improvements
This release includes breaking changes to how jobs and experiments are associated. In addition, many convenience commands have been added.
New Experiment - Slurm Job Association
Previously, each experiment was directly associated with a specific Slurm job (1:1 by default, or 1:n with experiments_per_job: n). This was possible because each seml config could contain only a single slurm config. Now, the slurm block accepts an array of Slurm configs. As a consequence, an experiment no longer knows at submission time which Slurm job will execute it; instead, each Slurm job greedily pulls pending experiments. This allows the following configuration to make better use of larger GPUs:
slurm:
  - experiments_per_job: 1
    sbatch_options:
      gres: gpu:1
      partition: gpu_gtx1080
  - experiments_per_job: 8
    sbatch_options:
      gres: gpu:1
      partition: gpu_a100
This now spawns two job arrays, one on the gpu_gtx1080 partition and one on the gpu_a100 partition, both of which greedily pull experiments to execute.
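The greedy pulling can be pictured with a small simulation. This is a sketch, not seml's implementation: ExperimentQueue, claim and run_job are hypothetical names, an in-memory list guarded by a lock stands in for the staged experiments in MongoDB, and each of a job's experiments_per_job slots is assumed to run in parallel and is modeled as a thread. It only illustrates the idea that experiments are claimed one at a time by whichever job has a free slot, so no experiment runs twice.

```python
import threading
from collections import Counter


class ExperimentQueue:
    """Hypothetical in-memory stand-in for the staged experiments in MongoDB.

    claim() plays the role of an atomic "take one pending experiment and mark
    it as running" operation, so no experiment is handed out twice.
    """

    def __init__(self, n_experiments):
        self._pending = list(range(n_experiments))
        self._lock = threading.Lock()
        self.claimed_by = {}  # experiment id -> name of the job that claimed it

    def claim(self, job_name):
        with self._lock:
            if not self._pending:
                return None  # nothing left to pull
            exp_id = self._pending.pop(0)
            self.claimed_by[exp_id] = job_name
            return exp_id


def run_job(queue, job_name, experiments_per_job):
    """Start one simulated Slurm job with `experiments_per_job` parallel slots (assumption)."""

    def slot():
        # Each slot keeps pulling experiments until the shared queue is empty.
        while True:
            exp_id = queue.claim(job_name)
            if exp_id is None:
                return
            # ... the experiment with id `exp_id` would be executed here ...

    threads = [threading.Thread(target=slot) for _ in range(experiments_per_job)]
    for t in threads:
        t.start()
    return threads


queue = ExperimentQueue(n_experiments=20)
# Mirrors the two entries of the slurm array in the config above.
threads = run_job(queue, "gpu_gtx1080", 1) + run_job(queue, "gpu_a100", 8)
for t in threads:
    t.join()

print(Counter(queue.claimed_by.values()))
```

Because the slots race for the lock, the split between the two jobs varies from run to run; only the total of 20 claimed experiments is fixed.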
Breaking Changes
The document layout for experiments in MongoDB has changed! Old collections are migrated to the new format automatically, but there is no going back! Run seml --migration-backup <col> if you want seml to create a backup first.
- The MongoDB document layout has changed. Automatic migration is included, but we urge users to be careful!
- When running with multiple experiments per job, each job now gets its own log file.
- This is likely the last release supporting Python 3.8 and 3.9.
Features
- seml now supports multi-process (and multi-node) jobs (#135).
- seml now supports multiple Slurm configurations (#137).
- The collection cache for autocompletion is automatically refreshed when calling add, delete or drop.
- We now support src-layouts by converting them to a flat layout at runtime.
- Added seml <col> restore-sources <path> to restore source files from MongoDB.
- Added seml <col> hold and seml <col> release to hold and release Slurm jobs.
- Added seml <col> print-experiment to retrieve experiment documents from the database.
- Added seml queue to print the collection of Slurm jobs (only works for jobs submitted with this version or newer).
- For debugging, seml now supports clickable VS Code links (#133).
- Experiment names may now contain / to create log directories.
- Experiments are now cancelled by default when they are deleted.
- Autocompletion now also suggests commands that do not need a collection, such as drop, queue or list.
- The CLI has been organized into groups.
- Autocompletion got faster.
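To illustrate how a / in an experiment name could translate into log directories, here is a minimal sketch using pathlib. The function log_file_for and the "<last name component>_<job id>.out" naming scheme are hypothetical; seml's actual log-file naming may differ.

```python
import tempfile
from pathlib import Path


def log_file_for(log_root, experiment_name, job_id):
    """Map an experiment name to a log file path (hypothetical scheme).

    Every '/'-separated component except the last becomes a sub-directory of
    log_root; the last component plus the job id forms the file name.
    """
    *dirs, stem = experiment_name.split("/")
    directory = Path(log_root).joinpath(*dirs)
    # Create the nested directories on demand, as a job would before logging.
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"{stem}_{job_id}.out"


with tempfile.TemporaryDirectory() as root:
    path = log_file_for(root, "ablation/lr_sweep/run", 12345)
    print(path.relative_to(root).as_posix())  # ablation/lr_sweep/run_12345.out
    print(path.parent.is_dir())               # True
```

A name without any / simply places the log file directly in log_root.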
Development
- Updated CI to use pre-commit and uv instead of pip.
- The repo has been restructured.
Fixes
- SSH connections are now handled in a separate process that tracks its health.
- Fixed multi-user SSH port forwarding.
- When reading output files, bytes that cannot be decoded as Unicode are now replaced with the Unicode replacement character.
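In Python, this kind of behavior corresponds to decoding with errors="replace", which substitutes U+FFFD for every byte that is not valid UTF-8. The sketch below assumes UTF-8 output files; read_output is a hypothetical helper, and seml's own code may handle this differently.

```python
import os
import tempfile


def read_output(path):
    """Read a (hypothetical) job output file, replacing undecodable bytes with U+FFFD."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()


# Write a log line containing two bytes (0xFF, 0xFE) that are never valid in UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".out", delete=False) as f:
    f.write(b"epoch 1: loss=0.42 \xff\xfe done\n")
    path = f.name

text = read_output(path)
os.remove(path)
print(text.count("\ufffd"))  # 2
```

Without errors="replace", the same read would raise UnicodeDecodeError and abort.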