forked from IBM/unitxt

🦄 Unitxt: a python library for getting data fired up and set for training and evaluation


alonh/unitxt

 
 


Unitxt is a Python library for getting data fired up and set for utilization. In one line of code, it prepares a dataset, or a mixture of datasets, into an input-output format for training and evaluation. We aspire to be simple, adaptable and transparent.

Unitxt builds on separation. Separation allows adding a dataset without knowing anything about the models that will use it. It also allows training without caring about preprocessing, switching models without loading the data differently, and changing formats (instruction, ICL, etc.) without changing anything else.
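The separation idea can be illustrated with a minimal, self-contained sketch. Every name below (`load_toy_dataset`, `instruction_format`, `plain_format`, `prepare`) is hypothetical and chosen for illustration only; none of it is the Unitxt API. The sketch only assumes that data loading and formatting are independent pieces, so one can be swapped without touching the other.

```python
# Illustrative sketch only: these names are hypothetical, NOT the Unitxt API.
from typing import Callable, Dict, List

Record = Dict[str, str]


def load_toy_dataset() -> List[Record]:
    """Dataset side: returns raw records and knows nothing about models or formats."""
    return [
        {"premise": "A cat sleeps.", "hypothesis": "An animal rests.", "label": "entailment"},
        {"premise": "It is raining.", "hypothesis": "The sky is clear.", "label": "contradiction"},
    ]


def instruction_format(record: Record) -> Record:
    """Format side: renders a record as an instruction-style input-output pair."""
    source = (
        "Does the premise entail the hypothesis?\n"
        f"Premise: {record['premise']}\n"
        f"Hypothesis: {record['hypothesis']}"
    )
    return {"source": source, "target": record["label"]}


def plain_format(record: Record) -> Record:
    """Alternative format: plain concatenation, with no instruction text."""
    return {"source": f"{record['premise']} {record['hypothesis']}", "target": record["label"]}


def prepare(records: List[Record], formatter: Callable[[Record], Record]) -> List[Record]:
    """Glue: applies any formatter to any dataset without either knowing about the other."""
    return [formatter(record) for record in records]


if __name__ == "__main__":
    data = load_toy_dataset()
    # Switching the format changes one argument; the dataset code is untouched.
    for example in prepare(data, instruction_format):
        print(example)
    for example in prepare(data, plain_format):
        print(example)
```

In this sketch, adding a new dataset means writing another loader, and adding a new format means writing another formatter; neither change touches the other side.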


[Figure: Unitxt Flow]


Why Unitxt? 🦄

🦄 Simplicity

Everything in Unitxt is simple and designed to feel natural and self-explanatory.

🦄 Adaptability

Adding new datasets, loading recipes, instructions and formatters is possible and encouraged!

🦄 Transparency

The resources and formatters of Unitxt are stored as shared datasets and can therefore easily be reviewed by the crowd. Moreover, when a dataset is assembled with Unitxt, it is very clear to others what is in it.

Contributors

Please install Unitxt from source:

git clone git@github.com:IBM/unitxt.git
cd unitxt
pip install -e ".[dev]"
pre-commit install
