## Intro

A lightweight implementation of a decoder-only language model trained on the TinyStories dataset. The project features custom Triton kernels for optimized performance on NVIDIA GPUs.

## Features

- Transformer-based language model architecture
- Custom Triton kernels for key operations (a minimal sketch of one kernel follows this list):
  - Softmax
  - RMS Normalization
  - Cross Entropy Loss
  - Rotary Position Embeddings (RoPE)
- Custom tokenizer training using SentencePiece
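To give a sense of what these kernels involve, below is a minimal sketch of an RMSNorm kernel in Triton. It is illustrative only: the kernel and wrapper names, signatures, and block-size handling are assumptions, not this repo's actual implementation.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def rmsnorm_kernel(x_ptr, w_ptr, out_ptr, n_cols, eps, BLOCK_SIZE: tl.constexpr):
    # Each program instance normalizes one row of a contiguous 2D input.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    x = tl.load(x_ptr + row * n_cols + cols, mask=mask, other=0.0).to(tl.float32)
    # RMSNorm: y = x / sqrt(mean(x^2) + eps) * weight
    rms = tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0).to(tl.float32)
    tl.store(out_ptr + row * n_cols + cols, x / rms * w, mask=mask)

def rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Hypothetical wrapper: one program per row, one block spanning the row.
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    rmsnorm_kernel[(n_rows,)](
        x, weight, out, n_cols, eps,
        BLOCK_SIZE=triton.next_power_of_2(n_cols),
    )
    return out
```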

## Prerequisites

```bash
pip install -r requirements.txt
```

## Usage

Download the TinyStories dataset and train the tokenizer:

```bash
python train_vocab.py
```
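`train_vocab.py` presumably drives the SentencePiece trainer; a sketch along those lines is below. The input path, vocabulary size, and model type here are assumptions, not the script's actual settings.

```python
import sentencepiece as spm

# Hypothetical path and hyperparameters; the real script may differ.
spm.SentencePieceTrainer.train(
    input="data/TinyStories-train.txt",  # raw training text
    model_prefix="tokenizer",            # writes tokenizer.model and tokenizer.vocab
    vocab_size=4096,                     # a small vocabulary suits TinyStories
    model_type="bpe",                    # byte-pair encoding
    character_coverage=1.0,
)
```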
Preprocess the data:

```bash
python preprocess.py
```
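A plausible shape for `preprocess.py`, assuming it encodes the raw text with the trained tokenizer and dumps a flat array of token ids; the paths and file format are assumptions.

```python
import numpy as np
import sentencepiece as spm

# Hypothetical paths; the real script's I/O may differ.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

ids = []
with open("data/TinyStories-train.txt") as f:
    for line in f:
        ids.extend(sp.encode(line))  # encode each story to token ids
        ids.append(sp.eos_id())      # mark story boundaries with EOS

# uint16 is enough for any vocabulary under 65,536 tokens.
np.array(ids, dtype=np.uint16).tofile("data/train.bin")
```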

Train the model:

```bash
python train.py
```
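For orientation, a bare-bones version of the training step: sample random windows from the token stream and minimize next-token cross-entropy. The stand-in model, paths, and hyperparameters are placeholders for whatever `train.py` actually uses.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Placeholder hyperparameters and data path.
block_size, batch_size, vocab_size, d_model = 256, 32, 4096, 128
data = torch.from_numpy(
    np.fromfile("data/train.bin", dtype=np.uint16).astype(np.int64)
)

def get_batch():
    # Random windows; targets are the inputs shifted by one token.
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    y = torch.stack([data[i + 1 : i + 1 + block_size] for i in ix])
    return x, y

# Trivial stand-in for the repo's decoder-only transformer.
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, d_model),
    torch.nn.Linear(d_model, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(1000):
    x, y = get_batch()
    logits = model(x)  # (batch, block, vocab)
    loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```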

Generate text samples with the trained model:

```bash
python sample.py --prompt "your prompt"
```
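`sample.py` presumably decodes autoregressively from the trained model; here is a sketch of that loop with temperature sampling. `model` stands in for the transformer loaded from a checkpoint, and the tokenizer path is an assumption.

```python
import torch
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")  # assumed path

@torch.no_grad()
def generate(model, prompt: str, max_new_tokens: int = 200, temperature: float = 0.8) -> str:
    ids = torch.tensor([sp.encode(prompt)], dtype=torch.long)  # (1, T)
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :] / temperature  # logits at the last position
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample one token
        if next_id.item() == sp.eos_id():
            break
        ids = torch.cat([ids, next_id], dim=1)
    return sp.decode(ids[0].tolist())
```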

## Reference