Releases · oumi-ai/oumi
v0.1.4
What's Changed
- Add memory cleanup calls in e2e integration tests by @xrdaukar in #1277
- Set up versioning for our documentation by @taenin in #1275
- Make `qwen2-VL` evaluation job pass by @xrdaukar in #1278
- Add multi-modal (vlm) notebook with Llama 11B by @optas in #1258
- Documentation: Inference -> List supported models by @kaisopos in #1279
- [tiny] update website link by @oelachqar in #1280
- Update all documentation links to the new doc URL by @taenin in #1281
- Update Oumi - A Tour.ipynb by @brragorn in #1282
- Documentation: Judge (minor edits) by @kaisopos in #1283
- Fix citation by @oelachqar in #1285
- Add Deepseek R1 1.5B/32B configs by @wizeng23 in #1276
- Misc eval configs cleanup by @xrdaukar in #1286
- [docs] Describe parallel evaluation by @xrdaukar in #1284
- Update `microsoft/Phi-3-vision-128k-instruct` training config by @xrdaukar in #1287
- Add Together Deepseek R1 inference config by @wizeng23 in #1289
- [minor] vlm notebook minor updates (doc referencing, freeze visual backbone) by @optas in #1288
- Add missing `-m oumi evaluate` argument in eval config by @xrdaukar in #1291
- [docs] Add more references to VL-SFT and SFT notebooks by @xrdaukar in #1293
- Eval config change for `deepseek-ai/DeepSeek-R1-Distill-Llama-70B` by @xrdaukar in #1292
- [notebooks] Update intro & installation instructions by @oelachqar in #1294
- Update notebook intros by @oelachqar in #1296
- [notebooks] Update installation instructions for colab by @oelachqar in #1297
- Add Apache license header to `src/oumi/**/*.py` by @wizeng23 in #1290
- Minor updates to VLM Multimodal notebook by @xrdaukar in #1299
- [docs] Add latest notebooks and update references by @oelachqar in #1300
- [tiny] Add docs auto-generated `.rst` files to gitignore by @wizeng23 in #1298
- [tiny] use GitHub link for header by @oelachqar in #1301
- [docs][tiny] update inference engines reference by @oelachqar in #1302
- Update README/docs to add new DeepSeek models by @wizeng23 in #1304
- [docs] Use `pip install oumi` over `pip install .` by @wizeng23 in #1305
- Tune VLM SFT configs by @xrdaukar in #1306
- Tune VLM configs for SmolVLM and Qwen2-VL by @xrdaukar in #1307
- Update config/notebook pip installs to use PyPI by @wizeng23 in #1308
- [tiny] upgrade torch version by @oelachqar in #1295
- Update logging and unit tests related to chat templates by @xrdaukar in #1311
- fix(docs): "interested by joining" to "interested in joining" by @CharlesCNorton in #1312
- Add HF_TOKEN instructions to Oumi Multimodal notebook by @xrdaukar in #1313
- Update configuration.md by @penfever in #1314
- remove duplicate keys in config example by @lucyknada in #1315
- [Notebooks] Update VLM notebook by @xrdaukar in #1317
- Update parasail_inference_engine.py by @jgreer013 in #1320
- Fix typo and update warning message for OUMI trainer by @xrdaukar in #1319
- [Notebooks] Add a note that a notebook kernel restart may be needed after `pip install oumi` by @xrdaukar in #1318
- Update Phi3 to support multiple images by @xrdaukar in #1321
- Add more detailed comment headers to YAML configs by @wizeng23 in #1310
- [Notebooks] Add a note to Tour notebook to restart kernel after the first `pip install` by @xrdaukar in #1327
- Tweak `--mem-fraction-static` param in sample SGLang configs by @xrdaukar in #1328
- Disallow using `DatasetParams` field names as keys in `DatasetParams.dataset_kwargs` by @xrdaukar in #1324 (see the sketch after this list)
- Support `dataset_name_override` dataset kwarg by @xrdaukar in #1188
- Add a util and a test marker for HF token by @xrdaukar in #1329
- Update `llama3-instruct` chat template to align with the original model's template by @xrdaukar in #1326
- chore: update launcher.sh by @eltociear in #1333
- [Notebooks] Minor improvements in VLM and CNN notebooks by @xrdaukar in #1335
- Update VLM cluster names in sample commands by @xrdaukar in #1336
- Update our README and docs with the github trending badge. by @taenin in #1340
- Update README.md - Add DeepSeek to supported models by @mkoukoumidis in #1343
- Update index.md - Add DeepSeek to supported models by @mkoukoumidis in #1344
- Update "GPU Tests" status badge in README page by @xrdaukar in #1345
New Contributors
- @CharlesCNorton made their first contribution in #1312
- @lucyknada made their first contribution in #1315
- @eltociear made their first contribution in #1333
Full Changelog: v0.1.3...v0.1.4
v0.1.3
What's Changed
- Documentation: Judge | Custom Model page by @kaisopos in #1195
- [WIP] Add a notebook for using CNN with custom dataset by @xrdaukar in #1196
- [Cherrypick for launch] Evaluate: return dict of results by @kaisopos in #1197
- Configs Train/Infer/Eval and Llama 3.3v (70b) by @optas in #1200
- Adding an integration test for evaluation fn's output (see PR-1197) by @kaisopos in #1199
- [docs] Add more details and cross-references related to customization by @xrdaukar in #1198
- Define `single_gpu` test marker by @xrdaukar in #1201
- Native inference: Don't set `min_p`, `temperature` in `GenerationConfig` if sampling is disabled by @xrdaukar in #1202
- Update tests to make them runnable on GCP by @xrdaukar in #1203
- Add newline before `pformat(train_config)` by @xrdaukar in #1204
- GCP tests launcher script changes by @xrdaukar in #1205
- [Evaluation] Bug: serialization by @kaisopos in #1207
- [docs] Add inference snippet for together.ai and DeepSeek APIs by @oelachqar in #1208
- Exclude `multi_gpu` tests from GitHub GPU tests by @xrdaukar in #1210
- Update e2e tests to support multi-GPU machines by @xrdaukar in #1206
- Add wrappers for remote inference engines by @oelachqar in #1209
- Vision-Lang & Inference (including LoRA) by @optas in #1174
- [BugFix] Throw a runtime error for quantized models & inference=VLLM by @kaisopos in #1212
- Fix most job configs by @wizeng23 in #1213
- e2e tests update by @xrdaukar in #1216
- [Notebook] Evaluation with Oumi by @kaisopos in #1218
- gpt2: move `include_performance_metrics` param from script to yaml by @xrdaukar in #1217
- Simplify inference engine API by @oelachqar in #1214
- Move configs to experimental by @wizeng23 in #1215
- [docs] Update index page by @oelachqar in #1220
- Update ConsoleLogger to write to STDOUT by @xrdaukar in #1221
- Set `use_spot` to False in our JobConfigs by @wizeng23 in #1222
- Delete `oumi[optional]` install target by @wizeng23 in #1224
- Scaffolding and the first test case for e2e evaluation tests by @xrdaukar in #1225
- [docs] Update inference engines doc page by @oelachqar in #1227
- Clean-up inference engine builder by @oelachqar in #1226
- [VLLM Engine] Enabling BitsAndBytes quantization by @kaisopos in #1223
- Add example distillation notebook by @jgreer013 in #1228
- Add a script to pre-download models for `gpu_tests` by @xrdaukar in #1231
- Fix multi-GPU inference integration test by @xrdaukar in #1229
- [tiny][docs] Update PEFT/LoRA content by @optas in #1233
- [BugFix] GGUF does not work with VLLM by @kaisopos in #1232
- Re-enable parallel evaluation for VLM-s by @xrdaukar in #1235
- Add multimodal exemplar dataset in our provided mini-datasets by @optas in #1234
- [Tiny] renaming a field name (`init_lora_weights`) by @optas in #1236
- Add more e2e evaluation tests by @xrdaukar in #1237
- Fix pyright breakage when vllm and llama_cpp are not installed by @taenin in #1240
- Update our oumi launch documentation. by @taenin in #1239
- Update index.md title for "Join the Community!" by @mkoukoumidis in #1242
- Update quickstart.md - nit for Oumi support request by @mkoukoumidis in #1241
- [VLLM Engine] Improve support for GGUF models (incl. auto-download) by @kaisopos in #1238
- Update README.md title to "Join the Community!" by @mkoukoumidis in #1243
- Update quickstart.md by @brragorn in #1251
- Update quickstart.md by @brragorn in #1253
- Update quickstart.md by @brragorn in #1252
- Update quickstart.md by @brragorn in #1250
- [Minor refactor] Moving model caching to `oumi.utils` by @kaisopos in #1246
- Add more details to troubleshooting FAQ by @wizeng23 in #1249
- Update training_methods.md - Change compute requirement suggestions by @mkoukoumidis in #1245
- Update train.md - nit description change by @mkoukoumidis in #1244
- [docs] misc docs feedback by @oelachqar in #1248
- [tiny] Qwen2-VL activate experimental datapipes by @optas in #1247
- Update Oumi - A Tour.ipynb by @brragorn in #1254
- [docs] more docs feedback by @oelachqar in #1255
- Update supported_models.md by @penfever in #1256
- Rename `experimental_use_torch_datapipes` data param by @xrdaukar in #1257
- Add pypi release workflow using testpypi by @oelachqar in #1259
- Update workflow names by @oelachqar in #1262
- Update default idle_minutes_to_autostop to 1 hour. by @taenin in #1264
- update pypi release workflow to use trusted env by @oelachqar in #1265
- Add `padding_side` param to internal model config by @xrdaukar in #1260 (see the sketch after this list)
- Documentation: Updates on Evaluation/Judge (based on Manos' feedback) by @kaisopos in #1261
- [tiny] less strict requirements by @oelachqar in #1266
- Add Deepseek R1 Distill Llama 8B/70B configs by @wizeng23 in #1263
- Update index.md to highlight beta stage by @mkoukoumidis in #1268
- Update README.md to highlight beta stage by @mkoukoumidis in #1267
- Disable pre-release packages by @oelachqar in #1270
- Update common_workflows.md - Clarify OpenAI is just an example by @mkoukoumidis in #1271
- Documentation: Evaluation page (update to highlight multi-modal) by @kaisopos in #1269
- Update launch.md by @taenin in #1272
- Add pypi release workflow by @oelachqar in #1273
- Documentation: Judge | minor edit (bold) by @kaisopos in #1274
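The `padding_side` entry above (#1260) adds a per-model padding default to the internal model config. The snippet below is a generic illustration of why that setting matters, using the Hugging Face `transformers` API rather than Oumi's internals:

```python
from transformers import AutoTokenizer

# Decoder-only models are typically padded on the left for batched generation,
# so each prompt ends immediately before the first generated token.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token

batch = tokenizer(
    ["short prompt", "a noticeably longer prompt"],
    padding=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # both sequences padded to the same length
```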
Full Changelog: v0.1.2...v0.1.3
v0.1.2.3
What's Changed
- Re-enable parallel evaluation for VLM-s by @xrdaukar in #1235
- Add multimodal exemplar dataset in our provided mini-datasets by @optas in #1234
- [Tiny] renaming a field name (`init_lora_weights`) by @optas in #1236
- Add more e2e evaluation tests by @xrdaukar in #1237
- Fix pyright breakage when vllm and llama_cpp are not installed by @taenin in #1240
- Update our oumi launch documentation. by @taenin in #1239
- Update index.md title for "Join the Community!" by @mkoukoumidis in #1242
- Update quickstart.md - nit for Oumi support request by @mkoukoumidis in #1241
- [VLLM Engine] Improve support for GGUF models (incl. auto-download) by @kaisopos in #1238
- Update README.md title to "Join the Community!" by @mkoukoumidis in #1243
- Update quickstart.md by @brragorn in #1251
- Update quickstart.md by @brragorn in #1253
- Update quickstart.md by @brragorn in #1252
- Update quickstart.md by @brragorn in #1250
- [Minor refactor] Moving model caching to `oumi.utils` by @kaisopos in #1246
- Add more details to troubleshooting FAQ by @wizeng23 in #1249
- Update training_methods.md - Change compute requirement suggestions by @mkoukoumidis in #1245
- Update train.md - nit description change by @mkoukoumidis in #1244
- [docs] misc docs feedback by @oelachqar in #1248
- [tiny] Qwen2-VL activate experimental datapipes by @optas in #1247
- Update Oumi - A Tour.ipynb by @brragorn in #1254
- [docs] more docs feedback by @oelachqar in #1255
- Update supported_models.md by @penfever in #1256
- Rename `experimental_use_torch_datapipes` data param by @xrdaukar in #1257
- Add pypi release workflow using testpypi by @oelachqar in #1259
- Update workflow names by @oelachqar in #1262
- Update default idle_minutes_to_autostop to 1 hour. by @taenin in #1264
- update pypi release workflow to use trusted env by @oelachqar in #1265
Full Changelog: v0.1.2.2...v0.1.2.3
v0.1.2.0-alpha
What's Changed
- Update README.md - Better highlight features & nits by @mkoukoumidis in #995
- [tiny] update docstring and cleanup by @oelachqar in #1006
- `Qwen2-VL`: minor updates by @xrdaukar in #1000
- Update README.md - Describe Oumi's most common capabilities by @mkoukoumidis in #996
- Fix readme. by @taenin in #1009
- Updated our ascii logo by @taenin in #1008
- [docs] Update readme by @oelachqar in #1010
- Cleanup scripts by @oelachqar in #1011
- Cleanup experimental folder by @oelachqar in #1012
- Update lists of supported VLM-s in README and docs by @xrdaukar in #1014
- Freeze Python package versions by @xrdaukar in #1007
- Update `blip2`'s chat template to use the "default" one by @xrdaukar in #1015
- Add docstrings on how to start vLLM and SGLang servers for `Llama-3.2-11B-Vision-Instruct` by @xrdaukar in #1016
- Evaluation: bugfixing, corner case, unit tests by @kaisopos in #1003
- Configure `asyncio_default_fixture_loop_scope` to reduce pytest warnings by @xrdaukar in #1013
- Update the registry to load registered core values upon use. by @taenin in #1017
- Update default installation instructions to pypi by @taenin in #1018
- [tiny] Update debug datasets by @oelachqar in #1020
- [docs] Address misc docs feedback by @oelachqar in #1019
- [tiny] update pre-defined judges and docs by @oelachqar in #1021
- Parameterize e2e training test, and add config for `Qwen2-VL` by @xrdaukar in #1023
- Remove our docs password from the readme. by @taenin in #1024
- VLM docs update by @xrdaukar in #1025
- Fix loading registered pretrain datasets by @wizeng23 in #1005
- Update `@requires_gpus` test decorator to optionally specify min GPU memory requirement by @xrdaukar in #1029 (see the sketch after this list)
- [tiny] Update GitHub workflows by @oelachqar in #1034
- Update `BaseConfig.from_yaml` to also support Path by @xrdaukar in #1026
- [tiny] Cleanup judge engine builder & fix circular dep by @oelachqar in #1035
- Create GPU GitHub Actions workflow by @oelachqar in #1004
- Add structured outputs support to gemini/vertex engines by @oelachqar in #1022
- [docs] Fix feedback on training and inference user guides by @oelachqar in #1037
- [docs][tiny] fix examples in inference guide by @oelachqar in #1038
- Add a sanity test for circular imports. by @taenin in #1030
- Resolve circular dependencies in Oumi by @taenin in #1039
- Move our circular dependency test to e2e to speed up GPU CI tests. by @taenin in #1040
- Add custom inference engine for gemini API by @oelachqar in #1036
- Define CLI in our quickstart. by @taenin in #1042
- Skip running GPU tests on low-risk code paths by @oelachqar in #1043
- Define more terms in our training docs. by @taenin in #1044
- Fix the broken python text snippet on the train page. by @taenin in #1045
- Fix the second python snippet in the train page. by @taenin in #1046
- [docs] Add Gemini to the list of supported inference API-s, and sort them by @xrdaukar in #1048
- Fix issues in most notebooks by @wizeng23 in #1047
- [docs][tiny] remove termynal from sphinx conf by @oelachqar in #1041
- Fix a typo in the VS Code environment page. by @taenin in #1049
- Define WSL in our vscode docs. by @taenin in #1052
- [tiny] disable unit tests on safe paths by @oelachqar in #1051
- [docs] Fix contributing and open issue links by @oelachqar in #1050
- [evaluations/generative_benchmark] Broken link by @kaisopos in #1054
- Remove dangling reference to `jupyter` in Makefile help by @xrdaukar in #1053
- [evaluations/generative_benchmark] Removing notebook link by @kaisopos in #1055
- Support constrained decoding in SGLang inference engine by @xrdaukar in #1032
- [tiny] Update tutorials page by @wizeng23 in #1056
- Minor updates to Launch.md by @taenin in #1059
- [docs] Update docs/user_guides/infer/infer.md by @xrdaukar in #1058
- Nits for common_workflows.md by @mkoukoumidis in #1061
- Nit fixes for acknowledgements.md by @mkoukoumidis in #1057
- Add sample trouble shooting for remote jobs. by @taenin in #1062
- Add a Github Issues selector for questions and have it redirect to Discord. by @taenin in #1064
- Package checking: Adding functionality for checking package versioning and fast failing by @kaisopos in #1031
- Fix various typos in contributing.md by @taenin in #1066
- SGLang inference documentation by @xrdaukar in #1065
- Replace assert in `NativeInferenceEngine` with `RuntimeError` by @xrdaukar in #1068
- Update dev setup instructions to use a Fork. by @taenin in #1067
- Define inference configs for more models by @xrdaukar in #1069
- [Evaluation] HF Leaderboards yaml files by @kaisopos in #1071
- Specify `engine: NATIVE` in inference configs by @xrdaukar in #1075
- Improve handling of image paths and URLs by @xrdaukar in #1074
- [Doc > Quickstart] Should we add links to guides for better discoverability? by @kaisopos in #1076
- Add e2e tests for running tutorial notebooks by @oelachqar in #1079
- Ignore all experimental files when running our circular dependency test. by @taenin in #1081
- [Super Nit Doc Update] environments.md by @kaisopos in #1082
- Add an env var for loading user registered values (dataset, models, clouds) when initializing the Oumi Registry by @taenin in #1077
- Update internal model configs to support default `tokenizer_pad_token` and `chat_template` by model type by @xrdaukar in #1078
- [Minor] Notebook typo by @kaisopos in #1085
- Upgrade transformers to 4.47 by @wizeng23 in #1033
- [tiny][docs] Update recipes page by @wizeng23 in #1072
- Configure e2e integration test for Llama 3.2 Vision 11B by @xrdaukar in #1086
- Nits for cli_reference.md by @mkoukoumidis in #1063
- [Documentation] Evaluate | Leaderboards Page by @kaisopos in #1084
- [Documentation] Evaluate | Main Page (revision) by @kaisopos in #1089
- [tiny] Fix precommit by @oelachqar in #1092
- Add timeout for unit & integration tests by @oelachqar in #1091
- Add GitHub Actions workflow for doctests by @oelachqar in #1093
- [docs] remove unused page, fix links by @oelachqar in #1094
- [Documentation] Evaluate | Main Page (small refactor) by @kaisopos in #1095
- Rewrite of the main Oumi Launch page. by @taenin in #1087
- Remove `pytest.mark.skip()` for basic e2e tests by @xrdaukar in #1088
- [tiny] Upgrade minimum numpy version to unblock python3.12 installation by @oelachqar in #1099
- Update our Readme with a new header image. by @taenin in #1098
- [docs] Minor refresh to dataset resource pages by @oelachqar in #1097
- [docs] Add docs guide page by @oelachqar in #1096
- Add a quick unit test to ensure new dependencies are not added to the top-level CLI by @taenin in https://github.com/o...
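The `@requires_gpus` entry above (#1029) extends a test decorator with a minimum GPU memory requirement. A rough sketch of how such a marker can be built on `pytest.mark.skipif` (the name and logic are illustrative, not Oumi's actual implementation):

```python
import pytest
import torch


def requires_gpus(count: int = 1, min_gb: float = 0.0):
    """Skip the decorated test unless `count` GPUs with at least `min_gb` GB each are present."""
    ok = torch.cuda.is_available() and torch.cuda.device_count() >= count
    if ok and min_gb > 0:
        ok = all(
            torch.cuda.get_device_properties(i).total_memory / 1e9 >= min_gb
            for i in range(count)
        )
    return pytest.mark.skipif(
        not ok, reason=f"requires {count} GPU(s) with >= {min_gb} GB memory each"
    )


@requires_gpus(count=1, min_gb=24.0)
def test_large_model_inference():
    ...
```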
v0.1.1.0-alpha.1
What's Changed
- Minimal SkyPilot config for `blip2` and `llava` models for GCP with `TRL_SFT` by @xrdaukar in #573
- Inference Engine async writes by @taenin in #574
- Misc cleanups in `JsonlinesDataset` by @xrdaukar in #576
- Split out cloud dependencies by @taenin in #575
- Disable `sdpa` for `blip2` by @xrdaukar in #579
- Set accelerate version to fix FSDP model saving by @wizeng23 in #580
- Remove AWS as a default dep by @taenin in #582
- Update `ProfilerParams` docstrings to follow the new style by @xrdaukar in #583
- Minor updates in `scripts/benchmarks/minimal_multimodal_training.py` by @xrdaukar in #585
- Add `@override` annotations to methods of a few Dataset subclasses by @xrdaukar in #584
- Refactor debugging/device utils, and add new GPU stats measurement functions by @xrdaukar in #587
- Add text jsonlines dataset class by @oelachqar in #589
- Define `DataCollationParams` by @xrdaukar in #581
- Misc updates to Polaris launcher scripts by @xrdaukar in #591
- Set up a new version of the Oumi CLI using Typer by @taenin in #588
- Update handling of GPU fan speed info by @xrdaukar in #595
- Add support for magpie dataset variants by @oelachqar in #594
- Rename GenerationConfig to GenerationParams by @wizeng23 in #592
- Fix cli infer test by @wizeng23 in #598
- Judge Notebook 1: default judge by @kaisopos in #593
- [Tiny] update missing dataset import by @oelachqar in #599
- Update training script to support data collators by @xrdaukar in #590
- Update accelerate version to 1.0.0 by @wizeng23 in #601
- Remove deprecated dataset code paths by @oelachqar in #596
- Refactor Aya & Ultrachat to use oumi dataset sft classes by @oelachqar in #597
- Add Llama train/eval/infer E2E integration test by @wizeng23 in #602
- Set docstring for `NVidiaGpuRuntimeInfo` struct by @xrdaukar in #603
- Add generation params to inference engines by @oelachqar in #600
- [bug] Fix issue loading jsonl datasets from file by @oelachqar in #604
- Add Llama 3B configs by @wizeng23 in #605
- Align pyright checks with latest Pylance version by @oelachqar in #611
- Fix `apply_chat_template` issue in `VisionLanguageSftDataset` by @xrdaukar in #609
- More robust make setup by @oelachqar in #610
- Fix a bug where the new CLI was improperly importing functions from top-level modules. by @taenin in #613
- Add support for the Launch command suite in the new CLI by @taenin in #612
- Support `HuggingFaceH4/llava-instruct-mix-vsft` dataset by @xrdaukar in #608
- [tiny] Fix .gitignore by @wizeng23 in #616
- [tiny] add gpt2 chat template, and update tests to use it by @oelachqar in #617
- Turn off pretty-printing exceptions in our CLI by @taenin in #618
- Cleanup dependencies by @oelachqar in #615
- Upgrade oumi dependencies by @oelachqar in #606
- Update makefile to use uv, add Jupyter target by @oelachqar in #614
- Add miniconda installation target, cleanup unused make commands by @oelachqar in #620
- Update several notebooks with the new EvaluationConfig format. by @taenin in #621
- Make sure conda env is registered by @oelachqar in #622
- Add Llama 3b sft/lora/qlora configs for Polaris by @wizeng23 in #626
- Add check if installation is successful by @oelachqar in #625
- Initial Cambrian integration by @xrdaukar in #557
- [tiny] alpaca - minor reproducibility boost by @optas in #619
- explicitly specify the model's dtype in LMH by @optas in #607
- [tiny] Add flops for T4 GPU by @wizeng23 in #628
- Use a timestamp for job directories on Polaris by @taenin in #627
- [tiny] Fix bug with Polaris job num by @wizeng23 in #629
- Update two VLLM configs. by @xrdaukar in #624
- Add `pip install -U uv;` to `make setup` for existing envs by @xrdaukar in #630
- Disable MFU logging for non-packed datasets by @wizeng23 in #632
- Add config example for long context fine-tuning by @oelachqar in #631
- Add distribution mode flag to llama_tune by @wizeng23 in #635
- Judge Notebook 2: Custom Judge by @kaisopos in #623
- Bugfixes for LLAVA by @xrdaukar in #634
- Update sphinx config and docs to fix misc errors and warnings by @oelachqar in #639
- Factor out OUMI_TOTAL_NUM_GPUS env var by @wizeng23 in #636
- Remove bitsandbytes from train dependencies by @oelachqar in #643
- Enable intershinx to allow linking to external documentation pages by @oelachqar in #640
- Tune few training params for LLAVA and blip2 models by @xrdaukar in #642
- Added support for specifying the inference engine via the InferenceConfig by @taenin in #638
- Add popular pre-training dataset classes by @oelachqar in #641
- Remove openai dependency by @oelachqar in #644
- Update our documentation to point to the new CLI. by @taenin in #645
- Enable dataloaders for VLLM-s (llava and blip2) by @xrdaukar in #646
- Allow gradient clipping to be optional by @optas in #649
- Add support for `add_generation_prompt` in LLAVA chat template by @xrdaukar in #648
- Add a description to the Launch CLI by @taenin in #651
- Add all Llama FSDP GCP configs by @wizeng23 in #637
- Coerce model params to correct dtype for QLoRA FSDP by @wizeng23 in #652
- Use uv for `pip install` commands by @wizeng23 in #653
- Update sphinx docs by @oelachqar in #654
- [Docs] Refactor docs pipeline by @oelachqar in #655
- [docs] swap and configure sphinx theme by @oelachqar in #656
- [Docs] Add documentation placeholders by @oelachqar in #658
- [Docs] Add sphinx-bibtex by @oelachqar in #659
- [Docs] fix rendering issues by @oelachqar in #660
- [docs] fix broken links by @oelachqar in #661
- Fix broken link in readme (dev_setup) by @kaisopos in #662
- [docs][tiny] fix minor doc typos by @oelachqar in #666
- [docs] add autodoc2 template by @oelachqar in #665
- [docs] Add content links and references by @oelachqar in #668
- [docs] switch to myst-nb for rendering notebooks by @oelachqar in #669
- [docs] Add script to generate module summaries by @oelachqar in #670
- [docs] Include cli reference by @oelachqar in #671
- Add dataset submodules by @oelachqar in #667
- Update notebooks to include a descriptive title by @oelachqar in #664
- Update tests/utils/test_device_utils.py by @xrdaukar in #672
- [Inference] Bug in generation config stop tokens by @kaisopos in #663
- Support rewriting special label values to -100 (`ignore_index`) to exclude from loss by @xrdaukar in #657 (see the sketch after this list)
- Rename emails and website URL to Oumi by @wizeng23 in #675
- Update scri...
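The `ignore_index` entry above (#657) relies on PyTorch's convention that the label value -100 is excluded from cross-entropy loss. A minimal, self-contained illustration:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)                # 4 token positions, vocab size 10
labels = torch.tensor([2, -100, 5, -100])  # -100 marks positions to exclude

# F.cross_entropy skips targets equal to ignore_index (default: -100), so the
# masked positions contribute nothing to the loss or to the gradients.
loss = F.cross_entropy(logits, labels, ignore_index=-100)
print(loss)
```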
Initial release
What's Changed
- Add python project configs by @oelachqar in #1
- Add repo skeleton by @oelachqar in #2
- Export lema entrypoint scripts by @oelachqar in #3
- Update static type checking config by @oelachqar in #5
- Add example jupyter / colab notebook by @oelachqar in #4
- Refactor config parsing to use omegaconf by @oelachqar in #6
- Updating documentation (Dev Environment Setup) by @kaisopos in #7
- Add tests and vscode config by @oelachqar in #8
- Added DPOTrainer example to repo, as well as cuda device cleanup to training loop by @jgreer013 in #9
- Adding torch as top-level module dependency by @optas in #10
- Add configs for specific hardware requirements by @jgreer013 in #11
- Sort pre-commit hooks lexicographically by @xrdaukar in #12
- Add logging config by @oelachqar in #13
- Lema inference by @xrdaukar in #14
- Panos dev by @optas in #16
- Add job launcher by @oelachqar in #15
- Making split of data a flexible variable by @optas in #17
- Configure max file size in precommit hooks by @xrdaukar in #18
- Minor bugfix and documentation update by @oelachqar in #19
- adding pynvml to train env by @kaisopos in #20
- Panos dev by @optas in #22
- Augmenting Types for training hyperparams by @optas in #23
- Train refactoring (config file visibility) + a few minor changes by @kaisopos in #21
- Minimal test for train function by @xrdaukar in #25
- Fix leftover '_torch_dtype' in 'ModelParams' by @xrdaukar in #26
- Update GPU types list in the default SkyPilot config by @xrdaukar in #27
- Add a missing lema-infer command under [project.scripts] by @xrdaukar in #28
- add basic pytests for evaluate and infer by @xrdaukar in #29
- Update README and pyproject.toml by @wizeng23 in #30
- A helper function to print info about available CUDA devices by @xrdaukar in #31
- Update SkyPilot config to start using torchrun by @xrdaukar in #32
- Support basic single-node, multi-gpu training by @xrdaukar in #33
- Run all precommit hooks on the repo by @xrdaukar in #35
- Add experimental code for llama cpp inference by @jgreer013 in #37
- Create skeleton of STYLE_GUIDE.md by @xrdaukar in #36
- Adding support for training custom models (for now just a dummy model). by @kaisopos in #38
- Fix custom model name in test_train.py by @xrdaukar in #39
- Configure pyright (static type checker) and resolve existing type errors to make it pass by @xrdaukar in #41
- fix trailing whitespace warning in STYLE_GUIDE.md by @xrdaukar in #43
- Configure initial GitHub Actions workflow to run pre-commits and tests by @xrdaukar in #44
- A variety of proposed extensions to finetune a chat-based model (starting with Zephyr) by @optas in #34
- Fix syntax error in ultrachat by @xrdaukar in #48
- Create initial version of CONTRIBUTING.md by @xrdaukar in #46
- Reduce the number of training steps from 5 to 3 to make test_train.py faster by @xrdaukar in #49
- Adding registry for custom models. by @kaisopos in #42
- Add config and streaming args to DataParams by @wizeng23 in #47
- Update Pre-review Tests to only run on pull_request by @xrdaukar in #50
- Add training flags to compute token-based stats by @xrdaukar in #51
- reduce test training steps in another test which I missed before by @xrdaukar in #53
- Rename var names of *Params classes by @wizeng23 in #52
- Make some NVIDIA-specific dependencies optional by @xrdaukar in #54
- fix trl version as 0.8.6 by @xrdaukar in #56
- Remove reference to torch.cuda.clock_rate by @xrdaukar in #57
- Update inference to support non-interactive batch mode. by @kaisopos in #58
- Update README.md to include Linux/WSL specific instructions by @xrdaukar in #59
- Minor formatting improvements in README.md by @xrdaukar in #60
- Minor: Updating Lora Params by @optas in #55
- Support dataset packing by @wizeng23 in #63
- Disallow relative imports in LeMa by @xrdaukar in #65
- Add text_col param that's required for SFTTrainer by @wizeng23 in #66
- Refactor common config parsing logic (YAML, arg_list) into a common util by @xrdaukar in #68
- Standardize test naming convention by @wizeng23 in #69
- Adding support for a hardcoded evaluation with MMLU. by @kaisopos in #67
- Minor changes to the default configs/skypilot/sky.yaml config by @xrdaukar in #71
- Prototype to pass `config.model.model_max_length` to Trainers by @xrdaukar in #70
- [Inference] Remove the prepended prompts from model responses. by @kaisopos in #73
- Add a util to print versioning info by @xrdaukar in #74
- Switch to tempfile.TemporaryDirectory() in test_train.py by @xrdaukar in #75
- Update docstring verbs to descriptive form by @wizeng23 in #76
- Add sample accelerate and fsdp configs by @xrdaukar in #77
- Refactor code to get device rank and world size into a helper function by @xrdaukar in #79
- Add a simple util to print model summary e.g., layer names, architecture summary by @xrdaukar in #80
- Freeze numpy to pre 2.0 version by @xrdaukar in #81
- Adding inference support for next logit probability. by @kaisopos in #78
- Create FSDP configs for Phi3 by @xrdaukar in #82
- Auto-format pyproject.toml with "Even Better TOML" by @xrdaukar in #83
- Minor cleanup updates to SkyPilot configs by @xrdaukar in #84
- Mixed Precision Training, Flash-Attention-2, Print-trainable-params by @optas in #85
- Update README.md to include basic instructions for multi-GPU training (DDP, FSDP) by @xrdaukar in #86
- Start using $SKYPILOT_NUM_GPUS_PER_NODE in SkyPilot config by @xrdaukar in #90
- Add configs for FineWeb Llama2 pretraining by @wizeng23 in #89
- Quantization by @optas in #87
- Update the default SkyPilot config to print more debug/context info by @xrdaukar in #92
- Add license by @oelachqar in #93
- Initial version of SkyPilot config for multi-node training (num_nodes: N) by @xrdaukar in #94
- MMLU eval refactor. by @kaisopos in #88
- Remove comparison between LOCAL_RANK and RANK by @xrdaukar in #96
- Handling the loading of peft adapters and other minor issues (e.g., adding more logging parameters) by @optas in #91
- Update configs/skypilot/sky_llama2b.yaml to start using sky_init.sh by @xrdaukar in #97
- Add bool param to resume training from the last known checkpoint (if exists) by @xrdaukar in #99 (see the sketch after this list)
- Inference: save/restore probabilities to/from file. by @kaisopos in #98
- Add support for dataset mixtures during training by @taenin in #95
- Add train, test, and validation splits to the LeMa config. by @taenin in #101
- nanoGPT (GPT2) pretraining recipe by @wizeng23 in #103
- Minor: Updates on Zephyr-Config by @optas in https://githu...
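The resume-from-checkpoint entry above (#99) adds a flag to continue training from the newest checkpoint when one exists. A hypothetical helper sketching the idea, assuming HF-style `checkpoint-<step>` directories (the function name and layout are assumptions, not the project's actual code):

```python
from pathlib import Path
from typing import Optional


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    # Hypothetical helper: HF-style trainers write "checkpoint-<step>" folders;
    # return the one with the highest step number, if any exist.
    ckpts = [
        p for p in Path(output_dir).glob("checkpoint-*")
        if p.is_dir() and p.name.split("-")[-1].isdigit()
    ]
    if not ckpts:
        return None
    return str(max(ckpts, key=lambda p: int(p.name.split("-")[-1])))


# e.g., trainer.train(resume_from_checkpoint=find_last_checkpoint("output/"))
```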