Releases · oumi-ai/oumi
v0.1.4
What's Changed
- Add memory cleanup calls in e2e integration tests by @xrdaukar in #1277
- Set up versioning for our documentation by @taenin in #1275
- Make `qwen2-VL` evaluation job pass by @xrdaukar in #1278
- Add multi-modal (vlm) notebook with Llama 11B by @optas in #1258
- Documentation: Inference -> List supported models by @kaisopos in #1279
- [tiny] update website link by @oelachqar in #1280
- Update all documentation links to the new doc URL by @taenin in #1281
- Update Oumi - A Tour.ipynb by @brragorn in #1282
- Documentation: Judge (minor edits) by @kaisopos in #1283
- Fix citation by @oelachqar in #1285
- Add Deepseek R1 1.5B/32B configs by @wizeng23 in #1276
- Misc eval configs cleanup by @xrdaukar in #1286
- [docs] Describe parallel evaluation by @xrdaukar in #1284
- Update `microsoft/Phi-3-vision-128k-instruct` training config by @xrdaukar in #1287
- Add Together Deepseek R1 inference config by @wizeng23 in #1289
- [minor] vlm notebook minor updates (doc referencing, freeze visual backbone) by @optas in #1288
- Add missing `-m oumi evaluate` argument in eval config by @xrdaukar in #1291
- [docs] Add more references to VL-SFT and SFT notebooks by @xrdaukar in #1293
- Eval config change for `deepseek-ai/DeepSeek-R1-Distill-Llama-70B` by @xrdaukar in #1292
- [notebooks] Update intro & installation instructions by @oelachqar in #1294
- Update notebook intros by @oelachqar in #1296
- [notebooks] Update installation instructions for colab by @oelachqar in #1297
- Add Apache license header to `src/oumi/**/*.py` by @wizeng23 in #1290
- Minor updates to VLM Multimodal notebook by @xrdaukar in #1299
- [docs] Add latest notebooks and update references by @oelachqar in #1300
- [tiny] Add docs auto-generated `.rst` files to gitignore by @wizeng23 in #1298
- [tiny] use GitHub link for header by @oelachqar in #1301
- [docs][tiny] update inference engines reference by @oelachqar in #1302
- Update README/docs to add new DeepSeek models by @wizeng23 in #1304
- [docs] Use `pip install oumi` over `pip install .` by @wizeng23 in #1305
- Tune VLM SFT configs by @xrdaukar in #1306
- Tune VLM configs for SmolVLM and Qwen2-VL by @xrdaukar in #1307
- Update config/notebook pip installs to use PyPI by @wizeng23 in #1308
- [tiny] upgrade torch version by @oelachqar in #1295
- Update logging and unit tests related to chat templates by @xrdaukar in #1311
- fix(docs): "interested by joining" to "interested in joining" by @CharlesCNorton in #1312
- Add HF_TOKEN instructions to Oumi Multimodal notebook by @xrdaukar in #1313
- Update configuration.md by @penfever in #1314
- remove duplicate keys in config example by @lucyknada in #1315
- [Notebooks] Update VLM notebook by @xrdaukar in #1317
- Update parasail_inference_engine.py by @jgreer013 in #1320
- Fix typo and update warning message for OUMI trainer by @xrdaukar in #1319
- [Notebooks] Add a note that a notebook kernel restart may be needed after `pip install oumi` by @xrdaukar in #1318
- Update Phi3 to support multiple images by @xrdaukar in #1321
- Add more detailed comment headers to YAML configs by @wizeng23 in #1310
- [Notebooks] Add a note to Tour notebook to restart kernel after the first `pip install` by @xrdaukar in #1327
- Tweak `--mem-fraction-static` param in sample SGLang configs by @xrdaukar in #1328
- Disallow using `DatasetParams` field names as keys in `DatasetParams.dataset_kwargs` by @xrdaukar in #1324 (see the sketch after this list)
- Support `dataset_name_override` dataset kwarg by @xrdaukar in #1188
- Add a util and a test marker for HF token by @xrdaukar in #1329
- Update `llama3-instruct` chat template to align with the original model's template by @xrdaukar in #1326
- chore: update launcher.sh by @eltociear in #1333
- [Notebooks] Minor improvements in VLM and CNN notebooks by @xrdaukar in #1335
- Update VLM cluster names in sample commands by @xrdaukar in #1336
- Update our README and docs with the github trending badge. by @taenin in #1340
- Update README.md - Add DeepSeek to supported models by @mkoukoumidis in #1343
- Update index.md - Add DeepSeek to supported models by @mkoukoumidis in #1344
- Update "GPU Tests" status badge in README page by @xrdaukar in #1345
New Contributors
- @CharlesCNorton made their first contribution in #1312
- @lucyknada made their first contribution in #1315
- @eltociear made their first contribution in #1333
Full Changelog: v0.1.3...v0.1.4
v0.1.3
What's Changed
- Documentation: Judge | Custom Model page by @kaisopos in #1195
- [WIP] Add a notebook for using CNN with custom dataset by @xrdaukar in #1196
- [Cherrypick for launch] Evaluate: return dict of results by @kaisopos in #1197
- Configs Train/Infer/Eval and Llama 3.3v (70b) by @optas in #1200
- Adding an integration test for evaluation fn's output (see PR-1197) by @kaisopos in #1199
- [docs] Add more details and cross-references related to customization by @xrdaukar in #1198
- Define `single_gpu` test marker by @xrdaukar in #1201
- Native inference: Don't set `min_p`, `temperature` in `GenerationConfig` if sampling is disabled by @xrdaukar in #1202
- Update tests to make them runnable on GCP by @xrdaukar in #1203
- Add newline before `pformat(train_config)` by @xrdaukar in #1204
- GCP tests launcher script changes by @xrdaukar in #1205
- [Evaluation] Bug: serialization by @kaisopos in #1207
- [docs] Add inference snippet for together.ai and DeepSeek APIs by @oelachqar in #1208
- Exclude `multi_gpu` tests from GitHub GPU tests by @xrdaukar in #1210
- Update e2e tests to support multi-GPU machines by @xrdaukar in #1206
- Add wrappers for remote inference engines by @oelachqar in #1209
- Vision-Lang & Inference (including LoRA) by @optas in #1174
- [BugFix] Throw a runtime error for quantized models & inference=VLLM by @kaisopos in #1212
- Fix most job configs by @wizeng23 in #1213
- e2e tests update by @xrdaukar in #1216
- [Notebook] Evaluation with Oumi by @kaisopos in #1218
- gpt2: move `include_performance_metrics` param from script to yaml by @xrdaukar in #1217
- Simplify inference engine API by @oelachqar in #1214
- Move configs to experimental by @wizeng23 in #1215
- [docs] Update index page by @oelachqar in #1220
- Update ConsoleLogger to write to STDOUT by @xrdaukar in #1221
- Set `use_spot` to False in our JobConfigs by @wizeng23 in #1222
- Delete `oumi[optional]` install target by @wizeng23 in #1224
- Scaffolding and the first test case for e2e evaluation tests by @xrdaukar in #1225
- [docs] Update inference engines doc page by @oelachqar in #1227
- Clean-up inference engine builder by @oelachqar in #1226
- [VLLM Engine] Enabling BitsAndBytes quantization by @kaisopos in #1223
- Add example distillation notebook by @jgreer013 in #1228
- Add a script to pre-download models for `gpu_tests` by @xrdaukar in #1231
- Fix multi-GPU inference integration test by @xrdaukar in #1229
- [tiny][docs] Update PEFT/LoRA content by @optas in #1233
- [BugFix] GGUF does not work with VLLM by @kaisopos in #1232
- Re-enable parallel evaluation for VLM-s by @xrdaukar in #1235
- Add multimodal exemplar dataset in our provided mini-datasets by @optas in #1234
- [Tiny] renaming a field name (`init_lora_weights`) by @optas in #1236
- Add more e2e evaluation tests by @xrdaukar in #1237
- Fix pyright breakage when vllm and llama_cpp are not installed by @taenin in #1240
- Update our oumi launch documentation. by @taenin in #1239
- Update index.md title for "Join the Community!" by @mkoukoumidis in #1242
- Update quickstart.md - nit for Oumi support request by @mkoukoumidis in #1241
- [VLLM Engine] Improve support for GGUF models (incl. auto-download) by @kaisopos in #1238
- Update README.md title to "Join the Community!" by @mkoukoumidis in #1243
- Update quickstart.md by @brragorn in #1251
- Update quickstart.md by @brragorn in #1253
- Update quickstart.md by @brragorn in #1252
- Update quickstart.md by @brragorn in #1250
- [Minor refactor] Moving model caching to `oumi.utils` by @kaisopos in #1246
- Add more details to troubleshooting FAQ by @wizeng23 in #1249
- Update training_methods.md - Change compute requirement suggestions by @mkoukoumidis in #1245
- Update train.md - nit description change by @mkoukoumidis in #1244
- [docs] misc docs feedback by @oelachqar in #1248
- [tiny] Qwen2-VL activate experimental datapipes by @optas in #1247
- Update Oumi - A Tour.ipynb by @brragorn in #1254
- [docs] more docs feedback by @oelachqar in #1255
- Update supported_models.md by @penfever in #1256
- Rename `experimental_use_torch_datapipes` data param by @xrdaukar in #1257
- Add pypi release workflow using testpypi by @oelachqar in #1259
- Update workflow names by @oelachqar in #1262
- Update default idle_minutes_to_autostop to 1 hour. by @taenin in #1264
- update pypi release workflow to use trusted env by @oelachqar in #1265
- Add `padding_side` param to internal model config by @xrdaukar in #1260 (see the sketch after this list)
- Documentation: Updates on Evaluation/Judge (based on Manos' feedback) by @kaisopos in #1261
- [tiny] less strict requirements by @oelachqar in #1266
- Add Deepseek R1 Distill Llama 8B/70B configs by @wizeng23 in #1263
- Update index.md to highlight beta stage by @mkoukoumidis in #1268
- Update README.md to highlight beta stage by @mkoukoumidis in #1267
- Disable pre-release packages by @oelachqar in #1270
- Update common_workflows.md - Clarify OpenAI is just an example by @mkoukoumidis in #1271
- Documentation: Evaluation page (update to highlight multi-modal) by @kaisopos in #1269
- Update launch.md by @taenin in #1272
- Add pypi release workflow by @oelachqar in #1273
- Documentation: Judge | minor edit (bold) by @kaisopos in #1274
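The `padding_side` entry above (#1260) adds a per-model padding default to the internal model config. The snippet below is a generic illustration of why that setting matters, using the Hugging Face `transformers` API rather than Oumi's internals:

```python
from transformers import AutoTokenizer

# Decoder-only models are typically padded on the left for batched generation,
# so each prompt ends immediately before the first generated token.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token

batch = tokenizer(
    ["short prompt", "a noticeably longer prompt"],
    padding=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # both sequences padded to the same length
```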
Full Changelog: v0.1.2...v0.1.3
v0.1.2.3
What's Changed
- Re-enable parallel evaluation for VLM-s by @xrdaukar in #1235
- Add multimodal exemplar dataset in our provided mini-datasets by @optas in #1234
- [Tiny] renaming a field name (`init_lora_weights`) by @optas in #1236
- Add more e2e evaluation tests by @xrdaukar in #1237
- Fix pyright breakage when vllm and llama_cpp are not installed by @taenin in #1240
- Update our oumi launch documentation. by @taenin in #1239
- Update index.md title for "Join the Community!" by @mkoukoumidis in #1242
- Update quickstart.md - nit for Oumi support request by @mkoukoumidis in #1241
- [VLLM Engine] Improve support for GGUF models (incl. auto-download) by @kaisopos in #1238
- Update README.md title to "Join the Community!" by @mkoukoumidis in #1243
- Update quickstart.md by @brragorn in #1251
- Update quickstart.md by @brragorn in #1253
- Update quickstart.md by @brragorn in #1252
- Update quickstart.md by @brragorn in #1250
- [Minor refactor] Moving model caching to `oumi.utils` by @kaisopos in #1246
- Add more details to troubleshooting FAQ by @wizeng23 in #1249
- Update training_methods.md - Change compute requirement suggestions by @mkoukoumidis in #1245
- Update train.md - nit description change by @mkoukoumidis in #1244
- [docs] misc docs feedback by @oelachqar in #1248
- [tiny] Qwen2-VL activate experimental datapipes by @optas in #1247
- Update Oumi - A Tour.ipynb by @brragorn in #1254
- [docs] more docs feedback by @oelachqar in #1255
- Update supported_models.md by @penfever in #1256
- Rename `experimental_use_torch_datapipes` data param by @xrdaukar in #1257
- Add pypi release workflow using testpypi by @oelachqar in #1259
- Update workflow names by @oelachqar in #1262
- Update default idle_minutes_to_autostop to 1 hour. by @taenin in #1264
- update pypi release workflow to use trusted env by @oelachqar in #1265
Full Changelog: v0.1.2.2...v0.1.2.3
v0.1.2.0-alpha
What's Changed
- Update README.md - Better highlight features & nits by @mkoukoumidis in #995
- [tiny] update docstring and cleanup by @oelachqar in #1006
- `Qwen2-VL`: minor updates by @xrdaukar in #1000
- Update README.md - Describe Oumi's most common capabilities by @mkoukoumidis in #996
- Fix readme. by @taenin in #1009
- Updated our ascii logo by @taenin in #1008
- [docs] Update readme by @oelachqar in #1010
- Cleanup scripts by @oelachqar in #1011
- Cleanup experimental folder by @oelachqar in #1012
- Update lists of supported VLM-s in README and docs by @xrdaukar in #1014
- Freeze Python package versions by @xrdaukar in #1007
- Update `blip2`'s chat template to use the "default" one by @xrdaukar in #1015
- Add docstrings on how to start vLLM and SGLang servers for `Llama-3.2-11B-Vision-Instruct` by @xrdaukar in #1016
- Evaluation: bugfixing, corner case, unit tests by @kaisopos in #1003
- Configure `asyncio_default_fixture_loop_scope` to reduce pytest warnings by @xrdaukar in #1013
- Update the registry to load registered core values upon use. by @taenin in #1017
- Update default installation instructions to pypi by @taenin in #1018
- [tiny] Update debug datasets by @oelachqar in #1020
- [docs] Address misc docs feedback by @oelachqar in #1019
- [tiny] update pre-defined judges and docs by @oelachqar in #1021
- Parameterize e2e training test, and add config for `Qwen2-VL` by @xrdaukar in #1023
- Remove our docs password from the readme. by @taenin in #1024
- VLM docs update by @xrdaukar in #1025
- Fix loading registered pretrain datasets by @wizeng23 in #1005
- Update `@requires_gpus` test decorator to optionally specify min GPU memory requirement by @xrdaukar in #1029 (see the sketch after this list)
- [tiny] Update GitHub workflows by @oelachqar in #1034
- Update `BaseConfig.from_yaml` to also support Path by @xrdaukar in #1026
- [tiny] Cleanup judge engine builder & fix circular dep by @oelachqar in #1035
- Create GPU GitHub Actions workflow by @oelachqar in #1004
- Add structured outputs support to gemini/vertex engines by @oelachqar in #1022
- [docs] Fix feedback on training and inference user guides by @oelachqar in #1037
- [docs][tiny] fix examples in inference guide by @oelachqar in #1038
- Add a sanity test for circular imports. by @taenin in #1030
- Resolve circular dependencies in Oumi by @taenin in #1039
- Move our circular dependency test to e2e to speed up GPU CI tests. by @taenin in #1040
- Add custom inference engine for gemini API by @oelachqar in #1036
- Define CLI in our quickstart. by @taenin in #1042
- Skip running GPU tests on low-risk code paths by @oelachqar in #1043
- Define more terms in our training docs. by @taenin in #1044
- Fix the broken python text snippet on the train page. by @taenin in #1045
- Fix the second python snippet in the train page. by @taenin in #1046
- [docs] Add Gemini to the list of supported inference API-s, and sort them by @xrdaukar in #1048
- Fix issues in most notebooks by @wizeng23 in #1047
- [docs][tiny] remove termynal from sphinx conf by @oelachqar in #1041
- Fix a typo in the VS Code environment page. by @taenin in #1049
- Define WSL in our vscode docs. by @taenin in #1052
- [tiny] disable unit tests on safe paths by @oelachqar in #1051
- [docs] Fix contributing and open issue links by @oelachqar in #1050
- [evaluations/generative_benchmark] Broken link by @kaisopos in #1054
- Remove dangling reference to `jupyter` in Makefile help by @xrdaukar in #1053
- [evaluations/generative_benchmark] Removing notebook link by @kaisopos in #1055
- Support constrained decoding in SGLang inference engine by @xrdaukar in #1032
- [tiny] Update tutorials page by @wizeng23 in #1056
- Minor updates to Launch.md by @taenin in #1059
- [docs] Update docs/user_guides/infer/infer.md by @xrdaukar in #1058
- Nits for common_workflows.md by @mkoukoumidis in #1061
- Nit fixes for acknowledgements.md by @mkoukoumidis in #1057
- Add sample trouble shooting for remote jobs. by @taenin in #1062
- Add a Github Issues selector for questions and have it redirect to Discord. by @taenin in #1064
- Package checking: Adding functionality for checking package versioning and fast failing by @kaisopos in #1031
- Fix various typos in contributing.md by @taenin in #1066
- SGLang inference documentation by @xrdaukar in #1065
- Replace assert in `NativeInferenceEngine` with `RuntimeError` by @xrdaukar in #1068
- Update dev setup instructions to use a Fork. by @taenin in #1067
- Define inference configs for more models by @xrdaukar in #1069
- [Evaluation] HF Leaderboards yaml files by @kaisopos in #1071
- Specify `engine: NATIVE` in inference configs by @xrdaukar in #1075
- Improve handling of image paths and URLs by @xrdaukar in #1074
- [Doc > Quickstart] Should we add links to guides for better discoverability? by @kaisopos in #1076
- Add e2e tests for running tutorial notebooks by @oelachqar in #1079
- Ignore all experimental files when running our circular dependency test. by @taenin in #1081
- [Super Nit Doc Update] environments.md by @kaisopos in #1082
- Add an env var for loading user registered values (dataset, models, clouds) when initializing the Oumi Registry by @taenin in #1077
- Update internal model configs to support default `tokenizer_pad_token` and `chat_template` by model type by @xrdaukar in #1078
- [Minor] Notebook typo by @kaisopos in #1085
- Upgrade transformers to 4.47 by @wizeng23 in #1033
- [tiny][docs] Update recipes page by @wizeng23 in #1072
- Configure e2e integration test for Llama 3.2 Vision 11B by @xrdaukar in #1086
- Nits for cli_reference.md by @mkoukoumidis in #1063
- [Documentation] Evaluate | Leaderboards Page by @kaisopos in #1084
- [Documentation] Evaluate | Main Page (revision) by @kaisopos in #1089
- [tiny] Fix precommit by @oelachqar in #1092
- Add timeout for unit & integration tests by @oelachqar in #1091
- Add GitHub Actions workflow for doctests by @oelachqar in #1093
- [docs] remove unused page, fix links by @oelachqar in #1094
- [Documentation] Evaluate | Main Page (small refactor) by @kaisopos in #1095
- Rewrite of the main Oumi Launch page. by @taenin in #1087
- Remove `pytest.mark.skip()` for basic e2e tests by @xrdaukar in #1088
- [tiny] Upgrade minimum numpy version to unblock python3.12 installation by @oelachqar in #1099
- Update our Readme with a new header image. by @taenin in #1098
- [docs] Minor refresh to dataset resource pages by @oelachqar in #1097
- [docs] Add docs guide page by @oelachqar in #1096
- Add a quick unit test to ensure new dependencies are not added to the top-level CLI by @taenin in https://github.com/o...
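The `@requires_gpus` entry above (#1029) extends a test decorator with a minimum GPU memory requirement. A rough sketch of how such a marker can be built on `pytest.mark.skipif` (the name and logic are illustrative, not Oumi's actual implementation):

```python
import pytest
import torch


def requires_gpus(count: int = 1, min_gb: float = 0.0):
    """Skip the decorated test unless `count` GPUs with at least `min_gb` GB each are present."""
    ok = torch.cuda.is_available() and torch.cuda.device_count() >= count
    if ok and min_gb > 0:
        ok = all(
            torch.cuda.get_device_properties(i).total_memory / 1e9 >= min_gb
            for i in range(count)
        )
    return pytest.mark.skipif(
        not ok, reason=f"requires {count} GPU(s) with >= {min_gb} GB memory each"
    )


@requires_gpus(count=1, min_gb=24.0)
def test_large_model_inference():
    ...
```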
v0.1.1.0-alpha.1
What's Changed
- Minimal SkyPilot config for `blip2` and `llava` models for GCP with `TRL_SFT` by @xrdaukar in #573
- Inference Engine async writes by @taenin in #574
- Misc cleanups in `JsonlinesDataset` by @xrdaukar in #576
- Split out cloud dependencies by @taenin in #575
- Disable `sdpa` for `blip2` by @xrdaukar in #579
- Set accelerate version to fix FSDP model saving by @wizeng23 in #580
- Remove AWS as a default dep by @taenin in #582
- Update `ProfilerParams` docstrings to follow the new style by @xrdaukar in #583
- Minor updates in `scripts/benchmarks/minimal_multimodal_training.py` by @xrdaukar in #585
- Add `@override` annotations to methods of a few Dataset subclasses by @xrdaukar in #584
- Refactor debugging/device utils, and add new GPU stats measurement functions by @xrdaukar in #587
- Add text jsonlines dataset class by @oelachqar in #589
- Define `DataCollationParams` by @xrdaukar in #581
- Misc updates to Polaris launcher scripts by @xrdaukar in #591
- Set up a new version of the Oumi CLI using Typer by @taenin in #588
- Update handling of GPU fan speed info by @xrdaukar in #595
- Add support for magpie dataset variants by @oelachqar in #594
- Rename GenerationConfig to GenerationParams by @wizeng23 in #592
- Fix cli infer test by @wizeng23 in #598
- Judge Notebook 1: default judge by @kaisopos in #593
- [Tiny] update missing dataset import by @oelachqar in #599
- Update training script to support data collators by @xrdaukar in #590
- Update accelerate version to 1.0.0 by @wizeng23 in #601
- Remove deprecated dataset code paths by @oelachqar in #596
- Refactor Aya & Ultrachat to use oumi dataset sft classes by @oelachqar in #597
- Add Llama train/eval/infer E2E integration test by @wizeng23 in #602
- Set docstring for `NVidiaGpuRuntimeInfo` struct by @xrdaukar in #603
- Add generation params to inference engines by @oelachqar in #600
- [bug] Fix issue loading jsonl datasets from file by @oelachqar in #604
- Add Llama 3B configs by @wizeng23 in #605
- Align pyright checks with latest Pylance version by @oelachqar in #611
- Fix `apply_chat_template` issue in `VisionLanguageSftDataset` by @xrdaukar in #609
- More robust make setup by @oelachqar in #610
- Fix a bug where the new CLI was improperly importing functions from top-level modules. by @taenin in #613
- Add support for the Launch command suite in the new CLI by @taenin in #612
- Support `HuggingFaceH4/llava-instruct-mix-vsft` dataset by @xrdaukar in #608
- [tiny] Fix .gitignore by @wizeng23 in #616
- [tiny] add gpt2 chat template, and update tests to use it by @oelachqar in #617
- Turn off pretty-printing exceptions in our CLI by @taenin in #618
- Cleanup dependencies by @oelachqar in #615
- Upgrade oumi dependencies by @oelachqar in #606
- Update makefile to use uv, add Jupyter target by @oelachqar in #614
- Add miniconda installation target, cleanup unused make commands by @oelachqar in #620
- Update several notebooks with the new EvaluationConfig format. by @taenin in #621
- Make sure conda env is registered by @oelachqar in #622
- Add Llama 3b sft/lora/qlora configs for Polaris by @wizeng23 in #626
- Add check if installation is successful by @oelachqar in #625
- Initial Cambrian integration by @xrdaukar in #557
- [tiny] alpaca - minor reproducibility boost by @optas in #619
- explicitly specify the model's dtype in LMH by @optas in #607
- [tiny] Add flops for T4 GPU by @wizeng23 in #628
- Use a timestamp for job directories on Polaris by @taenin in #627
- [tiny] Fix bug with Polaris job num by @wizeng23 in #629
- Update two VLLM configs. by @xrdaukar in #624
- Add `pip install -U uv;` to `make setup` for existing envs by @xrdaukar in #630
- Disable MFU logging for non-packed datasets by @wizeng23 in #632
- Add config example for long context fine-tuning by @oelachqar in #631
- Add distribution mode flag to llama_tune by @wizeng23 in #635
- Judge Notebook 2: Custom Judge by @kaisopos in #623
- Bugfixes for LLAVA by @xrdaukar in #634
- Update sphinx config and docs to fix misc errors and warnings by @oelachqar in #639
- Factor out OUMI_TOTAL_NUM_GPUS env var by @wizeng23 in #636
- Remove bitsandbytes from train dependencies by @oelachqar in #643
- Enable intershinx to allow linking to external documentation pages by @oelachqar in #640
- Tune few training params for LLAVA and blip2 models by @xrdaukar in #642
- Added support for specifying the inference engine via the InferenceConfig by @taenin in #638
- Add popular pre-training dataset classes by @oelachqar in #641
- Remove openai dependency by @oelachqar in #644
- Update our documentation to point to the new CLI. by @taenin in #645
- Enable dataloaders for VLLM-s (llava and blip2) by @xrdaukar in #646
- Allow gradient clipping to be optional by @optas in #649
- Add support for `add_generation_prompt` in LLAVA chat template by @xrdaukar in #648
- Add a description to the Launch CLI by @taenin in #651
- Add all Llama FSDP GCP configs by @wizeng23 in #637
- Coerce model params to correct dtype for QLoRA FSDP by @wizeng23 in #652
- Use uv for `pip install` commands by @wizeng23 in #653
- Update sphinx docs by @oelachqar in #654
- [Docs] Refactor docs pipeline by @oelachqar in #655
- [docs] swap and configure sphinx theme by @oelachqar in #656
- [Docs] Add documentation placeholders by @oelachqar in #658
- [Docs] Add sphinx-bibtex by @oelachqar in #659
- [Docs] fix rendering issues by @oelachqar in #660
- [docs] fix broken links by @oelachqar in #661
- Fix broken link in readme (dev_setup) by @kaisopos in #662
- [docs][tiny] fix minor doc typos by @oelachqar in #666
- [docs] add autodoc2 template by @oelachqar in #665
- [docs] Add content links and references by @oelachqar in #668
- [docs] switch to myst-nb for rendering notebooks by @oelachqar in #669
- [docs] Add script to generate module summaries by @oelachqar in #670
- [docs] Include cli reference by @oelachqar in #671
- Add dataset submodules by @oelachqar in #667
- Update notebooks to include a descriptive title by @oelachqar in #664
- Update tests/utils/test_device_utils.py by @xrdaukar in #672
- [Inference] Bug in generation config stop tokens by @kaisopos in #663
- Support rewriting special label values to -100 (`ignore_index`) to exclude from loss by @xrdaukar in #657 (see the sketch after this list)
- Rename emails and website URL to Oumi by @wizeng23 in #675
- Update scri...
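The `ignore_index` entry above (#657) relies on PyTorch's convention that the label value -100 is excluded from cross-entropy loss. A minimal, self-contained illustration:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)                # 4 token positions, vocab size 10
labels = torch.tensor([2, -100, 5, -100])  # -100 marks positions to exclude

# F.cross_entropy skips targets equal to ignore_index (default: -100), so the
# masked positions contribute nothing to the loss or to the gradients.
loss = F.cross_entropy(logits, labels, ignore_index=-100)
print(loss)
```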
Initial release
What's Changed
- Add python project configs by @oelachqar in #1
- Add repo skeleton by @oelachqar in #2
- Export lema entrypoint scripts by @oelachqar in #3
- Update static type checking config by @oelachqar in #5
- Add example jupyter / colab notebook by @oelachqar in #4
- Refactor config parsing to use omegaconf by @oelachqar in #6
- Updating documentation (Dev Environment Setup) by @kaisopos in #7
- Add tests and vscode config by @oelachqar in #8
- Added DPOTrainer example to repo, as well as cuda device cleanup to training loop by @jgreer013 in #9
- Adding torch as top-level module dependency by @optas in #10
- Add configs for specific hardware requirements by @jgreer013 in #11
- Sort pre-commit hooks lexicographically by @xrdaukar in #12
- Add logging config by @oelachqar in #13
- Lema inference by @xrdaukar in #14
- Panos dev by @optas in #16
- Add job launcher by @oelachqar in #15
- Making split of data a flexible variable by @optas in #17
- Configure max file size in precommit hooks by @xrdaukar in #18
- Minor bugfix and documentation update by @oelachqar in #19
- adding pynvml to train env by @kaisopos in #20
- Panos dev by @optas in #22
- Augmenting Types for training hyperparams by @optas in #23
- Train refactoring (config file visibility) + a few minor changes by @kaisopos in #21
- Minimal test for train function by @xrdaukar in #25
- Fix leftover '_torch_dtype' in 'ModelParams' by @xrdaukar in #26
- Update GPU types list in the default SkyPilot config by @xrdaukar in #27
- Add a missing lema-infer command under [project.scripts] by @xrdaukar in #28
- add basic pytests for evaluate and infer by @xrdaukar in #29
- Update README and pyproject.toml by @wizeng23 in #30
- A helper function to print info about available CUDA devices by @xrdaukar in #31
- Update SkyPilot config to start using torchrun by @xrdaukar in #32
- Support basic single-node, multi-gpu training by @xrdaukar in #33
- Run all precommit hooks on the repo by @xrdaukar in #35
- Add experimental code for llama cpp inference by @jgreer013 in #37
- Create skeleton of STYLE_GUIDE.md by @xrdaukar in #36
- Adding support for training custom models (for now just a dummy model). by @kaisopos in #38
- Fix custom model name in test_train.py by @xrdaukar in #39
- Configure pyright (static type checker) and resolve existing type errors to make it pass by @xrdaukar in #41
- fix trailing whitespace warning in STYLE_GUIDE.md by @xrdaukar in #43
- Configure initial GitHub Actions workflow to run pre-commits and tests by @xrdaukar in #44
- A variety of proposed extensions to finetune a chat-based model (starting with Zephyr) by @optas in #34
- Fix syntax error in ultrachat by @xrdaukar in #48
- Create initial version of CONTRIBUTING.md by @xrdaukar in #46
- Reduce the number of training steps from 5 to 3 to make test_train.py faster by @xrdaukar in #49
- Adding registry for custom models. by @kaisopos in #42
- Add config and streaming args to DataParams by @wizeng23 in #47
- Update Pre-review Tests to only run on pull_request by @xrdaukar in #50
- Add training flags to compute token-based stats by @xrdaukar in #51
- reduce test training steps in another test which I missed before by @xrdaukar in #53
- Rename var names of *Params classes by @wizeng23 in #52
- Make some NVIDIA-specific dependencies optional by @xrdaukar in #54
- fix trl version as 0.8.6 by @xrdaukar in #56
- Remove reference to torch.cuda.clock_rate by @xrdaukar in #57
- Update inference to support non-interactive batch mode. by @kaisopos in #58
- Update README.md to include Linux/WSL specific instructions by @xrdaukar in #59
- Minor formatting improvements in README.md by @xrdaukar in #60
- Minor: Updating Lora Params by @optas in #55
- Support dataset packing by @wizeng23 in #63
- Disallow relative imports in LeMa by @xrdaukar in #65
- Add text_col param that's required for SFTTrainer by @wizeng23 in #66
- Refactor common config parsing logic (YAML, arg_list) into a common util by @xrdaukar in #68
- Standardize test naming convention by @wizeng23 in #69
- Adding support for a hardcoded evaluation with MMLU. by @kaisopos in #67
- Minor changes to the default configs/skypilot/sky.yaml config by @xrdaukar in #71
- Prototype to pass `config.model.model_max_length` to Trainers by @xrdaukar in #70
- [Inference] Remove the prepended prompts from model responses. by @kaisopos in #73
- Add a util to print versioning info by @xrdaukar in #74
- Switch to tempfile.TemporaryDirectory() in test_train.py by @xrdaukar in #75
- Update docstring verbs to descriptive form by @wizeng23 in #76
- Add sample accelerate and fsdp configs by @xrdaukar in #77
- Refactor code to get device rank and world size into a helper function by @xrdaukar in #79
- Add a simple util to print model summary e.g., layer names, architecture summary by @xrdaukar in #80
- Freeze numpy to pre 2.0 version by @xrdaukar in #81
- Adding inference support for next logit probability. by @kaisopos in #78
- Create FSDP configs for Phi3 by @xrdaukar in #82
- Auto-format pyproject.toml with "Even Better TOML" by @xrdaukar in #83
- Minor cleanup updates to SkyPilot configs by @xrdaukar in #84
- Mixed Precision Training, Flash-Attention-2, Print-trainable-params by @optas in #85
- Update README.md to include basic instructions for multi-GPU training (DDP, FSDP) by @xrdaukar in #86
- Start using $SKYPILOT_NUM_GPUS_PER_NODE in SkyPilot config by @xrdaukar in #90
- Add configs for FineWeb Llama2 pretraining by @wizeng23 in #89
- Quantization by @optas in #87
- Update the default SkyPilot config to print more debug/context info by @xrdaukar in #92
- Add license by @oelachqar in #93
- Initial version of SkyPilot config for multi-node training (num_nodes: N) by @xrdaukar in #94
- MMLU eval refactor. by @kaisopos in #88
- Remove comparison between LOCAL_RANK and RANK by @xrdaukar in #96
- Handling the loading of peft adapters and other minor issues (e.g., adding more logging parameters) by @optas in #91
- Update configs/skypilot/sky_llama2b.yaml to start using sky_init.sh by @xrdaukar in #97
- Add bool param to resume training from the last known checkpoint (if exists) by @xrdaukar in #99 (see the sketch after this list)
- Inference: save/restore probabilities to/from file. by @kaisopos in #98
- Add support for dataset mixtures during training by @taenin in #95
- Add train, test, and validation splits to the LeMa config. by @taenin in #101
- nanoGPT (GPT2) pretraining recipe by @wizeng23 in #103
- Minor: Updates on Zephyr-Config by @optas in https://githu...
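The resume-from-checkpoint entry above (#99) adds a flag to continue training from the newest checkpoint when one exists. A hypothetical helper sketching the idea, assuming HF-style `checkpoint-<step>` directories (the function name and layout are assumptions, not the project's actual code):

```python
from pathlib import Path
from typing import Optional


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    # Hypothetical helper: HF-style trainers write "checkpoint-<step>" folders;
    # return the one with the highest step number, if any exist.
    ckpts = [
        p for p in Path(output_dir).glob("checkpoint-*")
        if p.is_dir() and p.name.split("-")[-1].isdigit()
    ]
    if not ckpts:
        return None
    return str(max(ckpts, key=lambda p: int(p.name.split("-")[-1])))


# e.g., trainer.train(resume_from_checkpoint=find_last_checkpoint("output/"))
```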