A lightweight implementation of a decoder-only Transformer language model trained on the TinyStories dataset. The project features custom Triton kernels for optimized performance on NVIDIA GPUs.
- Transformer-based language model architecture
- Custom Triton kernels for key operations (see the sketch after this list):
  - Softmax
  - RMS Normalization (RMSNorm)
  - Cross-Entropy Loss
  - Rotary Position Embeddings (RoPE)
- Custom tokenizer training using SentencePiece
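To give a flavor of what these kernels look like, here is a minimal sketch of a row-wise RMSNorm Triton kernel; the names (`rmsnorm_kernel`, `rmsnorm`), the float32 math, and the one-program-per-row launch are illustrative assumptions, not the repo's actual code:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def rmsnorm_kernel(x_ptr, w_ptr, out_ptr, n_cols, eps, BLOCK_SIZE: tl.constexpr):
    # Sketch: each program instance normalizes one row of a (n_rows, n_cols) input.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    x = tl.load(x_ptr + row * n_cols + cols, mask=mask, other=0.0).to(tl.float32)
    # RMSNorm: y = x / sqrt(mean(x^2) + eps) * weight
    rms = tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0).to(tl.float32)
    tl.store(out_ptr + row * n_cols + cols, x / rms * w, mask=mask)

def rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    rows = x.reshape(-1, x.shape[-1]).contiguous()
    out = torch.empty(rows.shape, device=x.device, dtype=torch.float32)
    # BLOCK_SIZE must be a power of two that covers the whole row.
    rmsnorm_kernel[(rows.shape[0],)](
        rows, weight, out, rows.shape[-1], eps,
        BLOCK_SIZE=triton.next_power_of_2(rows.shape[-1]),
    )
    return out.reshape(x.shape)
```

Keeping one program per row avoids cross-program reductions; the mask handles rows whose length is not a power of two.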
```bash
pip install -r requirements.txt
```
```bash
# Download the TinyStories dataset and train the tokenizer
python train_vocab.py
```
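`train_vocab.py` is not reproduced here, but SentencePiece tokenizer training typically reduces to a single call; the corpus path, vocabulary size, and model type below are assumptions, not the script's real settings:

```python
import sentencepiece as spm

# Sketch of SentencePiece BPE training; all values here are illustrative.
spm.SentencePieceTrainer.train(
    input="data/tinystories.txt",  # assumed: raw corpus, one story per line
    model_prefix="tokenizer",      # writes tokenizer.model and tokenizer.vocab
    vocab_size=4096,               # assumed size for a small model
    model_type="bpe",
    character_coverage=1.0,
)
```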
```bash
# Preprocess the data
python preprocess.py
```
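A common shape for this step (the exact format `preprocess.py` uses may differ) is to encode the corpus with the trained tokenizer and dump the token ids as one flat binary file; the paths and the `uint16` dtype (valid while the vocabulary stays under 65,536) are assumptions:

```python
import numpy as np
import sentencepiece as spm

# Sketch: encode every story and write the ids as one flat uint16 array.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
ids = []
with open("data/tinystories.txt") as f:
    for line in f:
        ids.extend(sp.encode(line) + [sp.eos_id()])  # EOS separates stories
np.array(ids, dtype=np.uint16).tofile("data/train.bin")
```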
```bash
# Train the model
python train.py
```
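The training loop itself lives in `train.py`; batch sampling for next-token prediction plausibly follows the usual pattern of cutting random windows out of the token file, sketched below under the assumed `data/train.bin` layout from the previous step:

```python
import numpy as np
import torch

# Sketch: random (input, target) windows from the flat token file; targets
# are the inputs shifted right by one position for next-token prediction.
data = np.memmap("data/train.bin", dtype=np.uint16, mode="r")

def get_batch(block_size=256, batch_size=32, device="cuda"):
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
    return x.to(device), y.to(device)
```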
```bash
# Generate text samples with the trained model
python sample.py --prompt "your prompt"
```
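Under the hood, sampling is an autoregressive loop; a minimal sketch follows, where the `generate` helper, temperature, and context length are illustrative, and `model` is assumed to return `(batch, time, vocab)` logits:

```python
import torch

@torch.no_grad()
def generate(model, ids, max_new_tokens=200, temperature=0.8, block_size=256):
    # ids: (1, T) prompt token ids. Append one sampled token per step.
    for _ in range(max_new_tokens):
        logits = model(ids[:, -block_size:])[:, -1, :]       # next-token logits
        probs = torch.softmax(logits / temperature, dim=-1)  # assumed temperature
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return ids
```

Cropping the context to the last `block_size` tokens keeps the input within the model's trained context window.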