0.5.0 - Slurm config arrays, multi-node, CLI improvements

@n-gao released this 21 Jun 15:28

This release includes breaking changes to how jobs and experiments are associated. It also adds many convenience commands.

New Experiment - Slurm Job Association

Previously, each experiment was directly associated with a specific Slurm job (1:1 by default, or 1:n with experiments_per_job: n). This was possible because each seml config could only contain a single Slurm config. Now, the slurm block accepts an array of Slurm configs. As a consequence, an experiment no longer knows at submission time which Slurm job will execute it. Instead, each Slurm job greedily pulls pending experiments. This allows configurations like the following to make better use of larger GPUs:

slurm:
  - experiments_per_job: 1
    sbatch_options:
      gres: gpu:1
      partition: gpu_gtx1080
  - experiments_per_job: 8
    sbatch_options:
      gres: gpu:1
      partition: gpu_a100

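This spawns two job arrays, one on the gpu_gtx1080 partition and one on the gpu_a100 partition, both of which greedily pull experiments to execute. Each gpu_gtx1080 job runs one experiment, while each gpu_a100 job can run up to eight experiments on its single GPU.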
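The greedy pulling can be illustrated with a toy Python simulation. This is a minimal sketch, not seml's implementation: the `JobArray` class, the `greedy_pull` function, and the fixed round-robin order in which jobs start are hypothetical simplifications. On a real cluster, jobs start whenever Slurm allocates resources to them, and each one claims pending experiments from the shared MongoDB collection.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class JobArray:
    """Hypothetical stand-in for one entry of the `slurm:` list."""

    partition: str
    experiments_per_job: int
    # Batches of experiments claimed by the jobs of this array, in start order.
    assigned: list = field(default_factory=list)


def greedy_pull(experiments, job_arrays, rounds):
    """Toy simulation of job arrays greedily pulling from a shared pool.

    In each round, every array starts one job, and that job claims up to
    `experiments_per_job` pending experiments. Returns the experiments that
    are still pending afterwards.
    """
    pending = deque(experiments)
    for _ in range(rounds):
        for array in job_arrays:
            if not pending:
                return list(pending)
            # Claim as many experiments as this job can hold, or whatever is left.
            n = min(array.experiments_per_job, len(pending))
            array.assigned.append([pending.popleft() for _ in range(n)])
    return list(pending)
```

For example, pulling 12 experiments with `JobArray("gpu_gtx1080", 1)` and `JobArray("gpu_a100", 8)` over three rounds leaves nothing pending: the gtx1080 jobs claim experiments 0 and 9, while the a100 jobs claim 1 through 8 and then 10 and 11. No experiment is tied to a particular job until a job actually claims it.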

Breaking Changes

The document layout for experiments in MongoDB has changed! Old collections are migrated to the new format automatically, but the migration cannot be undone! Make sure to call seml --migration-backup <col> if you want seml to create a backup first.

  • The MongoDB document layout has changed. Automatic migration is included, but we urge users to be careful!
  • When running with multiple experiments per job, each job now gets its own log file.
  • This is likely the last release to support Python 3.8 and Python 3.9.

Features

  • seml now supports multi-process and multi-node jobs (#135)
  • seml now supports multiple slurm configurations (#137)
  • The collection cache for autocompletion is automatically refreshed when calling add, delete or drop.
  • We now support src-layouts by converting them to a flat-layout at runtime.
  • Added seml <col> restore-sources <path> to restore source files from the MongoDB.
  • Added seml <col> hold and seml <col> release to hold and release slurm jobs.
  • Added seml <col> print-experiment to retrieve experiment documents from the database.
  • Added seml queue to print the collection of slurm jobs (only works for jobs submitted with this version or newer)
  • For debugging, seml now supports clickable vscode links (#133)
  • Experiment names may now contain / to create log directories.
  • Experiments are now cancelled by default when they are deleted.
  • Autocompletion now also suggests commands that do not need a collection, such as drop, queue, or list.
  • The CLI has been organized into groups.
  • Autocompletion got faster.
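How a `/` in an experiment name turns into log subdirectories can be pictured with a small path-building sketch. The `make_log_path` helper and the `<name>_<job_id>.log` file-name pattern are hypothetical and only illustrate the idea; seml's actual log naming may differ.

```python
from pathlib import Path


def make_log_path(log_root: Path, experiment_name: str, job_id: str) -> Path:
    """Hypothetical helper: turn an experiment name into a log file path.

    Every "/"-separated part of the name except the last becomes a
    subdirectory below `log_root`. The "<name>_<job_id>.log" pattern is an
    assumption for illustration.
    """
    # Split on "/" explicitly so the result does not depend on the OS separator.
    *subdirs, stem = experiment_name.split("/")
    directory = log_root.joinpath(*subdirs)
    return directory / f"{stem}_{job_id}.log"
```

A caller would then create the directories before writing, e.g. with `path.parent.mkdir(parents=True, exist_ok=True)`.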

Development

  • Updated CI to use pre-commit and uv instead of pip.
  • The repo has been restructured.

Fixes

  • SSH connections are now handled in a separate process that tracks its health.
  • Fixed multi-user SSH port forwarding.
  • When reading output files, non-Unicode symbols are now replaced with the Unicode replacement character.
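This replacement behavior corresponds to Python's built-in `errors="replace"` decoding mode; the sketch below assumes seml does something equivalent, and the `read_output` helper is hypothetical. The same option is available when opening a file directly, e.g. `open(path, encoding="utf-8", errors="replace")`.

```python
def read_output(raw: bytes) -> str:
    """Hypothetical helper: decode captured job output as UTF-8.

    With errors="replace", every byte sequence that is not valid UTF-8 becomes
    U+FFFD (the Unicode replacement character) instead of raising
    UnicodeDecodeError.
    """
    return raw.decode("utf-8", errors="replace")
```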