Books:
- Foundations of Large Language Models
- How to Scale Your Model
- The Ultra-Scale Playbook: Training LLMs on GPU Clusters
Agents:
- Hugging Face Agents Course
- ai-agents-for-beginners (Microsoft)
- Agents (Google's whitepaper)
- Agency Is Frame-Dependent
Fine-Tuning:
- 🤗 PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware
- LoRA: Low-Rank Adaptation of Large Language Models
- QLoRA: Efficient Finetuning of Quantized LLMs
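The PEFT, LoRA, and QLoRA entries above share one idea: keep the base model frozen and train small low-rank adapter matrices instead. A minimal sketch with Hugging Face `peft` (the base model, target modules, and hyperparameters are illustrative placeholders, not taken from the papers):

```python
# Minimal LoRA setup with Hugging Face PEFT: freeze the base model and
# train only small low-rank adapter matrices injected into chosen layers.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling applied to the update
    target_modules=["c_attn"],  # which linear layers get adapters (model-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

QLoRA combines the same adapter idea with a 4-bit quantized frozen base model, which is what brings billion-scale fine-tuning within reach of a single GPU.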
RAG:
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Chain-of-Retrieval Augmented Generation
- Introducing Contextual Retrieval
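The three RAG entries above all build on the same retrieve-then-generate loop; the sketch below shows just that pattern, with hypothetical `retrieve` and `generate` callables standing in for whatever vector store and LLM client you use:

```python
# Bare retrieve-then-generate loop underlying the RAG papers above.
# `retrieve` and `generate` are hypothetical stand-ins for your own stack.
from typing import Callable

def rag_answer(
    question: str,
    retrieve: Callable[[str, int], list[str]],  # (query, k) -> top-k passages
    generate: Callable[[str], str],             # prompt -> completion
    k: int = 4,
) -> str:
    passages = retrieve(question, k)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using only the context below, citing passages by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```

Chain-of-Retrieval and Contextual Retrieval both refine the first step: the former interleaves retrieval with intermediate reasoning, the latter prepends document-level context to each chunk before indexing.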
Evaluation:
- A Survey on LLM-as-a-Judge
- ARC Prize 2024: Technical Report
- FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
- MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
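The LLM-as-a-judge survey above covers using a strong model to grade other models' outputs; a minimal pairwise judge looks like the sketch below (the prompt wording and the `generate` client are assumptions, not taken from the survey):

```python
# Minimal pairwise LLM-as-a-judge; `generate` is a hypothetical LLM client.
JUDGE_TEMPLATE = """You are an impartial judge. Compare the two answers to the
question and reply with exactly "A", "B", or "TIE".

Question: {question}

Answer A: {answer_a}

Answer B: {answer_b}

Verdict:"""

def judge(question: str, answer_a: str, answer_b: str, generate) -> str:
    verdict = generate(JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b
    )).strip().upper()
    return verdict if verdict in {"A", "B", "TIE"} else "TIE"  # fall back on parse failures
```

Judging each pair twice with the answer order swapped is a common mitigation for position bias.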
Datasets:
Models:
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
- Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
- RoFormer: Enhanced Transformer with Rotary Position Embedding (minimal RoPE sketch after this list)
- Round and Round We Go! What makes Rotary Positional Encodings useful?
- Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
- Qwen2.5-VL and its paper
- Qwen2.5-Math and its paper
- PaliGemma 2: A Family of Versatile VLMs for Transfer
- Magma: A Foundation Model for Multimodal AI Agents
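The two rotary-embedding papers in the list above (RoFormer, Round and Round We Go) study the same trick: rotate query/key feature pairs by a position-dependent angle so attention scores depend on relative position. A minimal sketch, using the half-split pairing common in open implementations rather than the adjacent-pair form of the original RoFormer paper:

```python
# Minimal rotary position embedding (RoPE): rotate feature pairs by a
# position-dependent angle so dot products depend on relative position.
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """x: (seq_len, dim) with even dim; returns the rotated tensor."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)      # per-pair rotation frequency
    angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]              # split features into rotation pairs
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```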
Chain-of-Thought:
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
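The two papers above chain together naturally: elicit step-by-step reasoning, sample several completions, and majority-vote over the extracted final answers. A bare sketch (the `generate` sampler and `extract_answer` parser are hypothetical):

```python
# Self-consistency: sample multiple chain-of-thought completions at
# temperature > 0 and majority-vote over the extracted final answers.
from collections import Counter
from typing import Callable

def self_consistent_answer(
    prompt: str,
    generate: Callable[[str], str],        # one sampled completion (with reasoning)
    extract_answer: Callable[[str], str],  # pull the final answer out of the reasoning
    n_samples: int = 8,
) -> str:
    answers = [extract_answer(generate(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```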
Visualisation-of-Thought:
Test-Time Scaling:
Test-Time Compute:
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
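The test-time compute papers above trade extra inference for answer quality; the simplest baseline is best-of-N sampling under a scoring function. A sketch with hypothetical `generate` and `score` callables (the latter standing in for a verifier or reward model):

```python
# Best-of-N sampling: spend more inference compute by drawing N candidates
# and keeping the one a verifier/reward model scores highest.
from typing import Callable

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],      # one sampled completion
    score: Callable[[str, str], float],  # (prompt, completion) -> quality score
    n: int = 16,
) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```

The compute-optimal scaling paper argues that the right allocation (how large N should be, and whether to spend the budget on sampling, search, or revision) depends on question difficulty rather than being fixed.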
AlphaGeometry:
Apple:
- Machine Learning Research at Apple
- Distillation Scaling Laws
- Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
- Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions
Misc:
- Scaling Pre-training to One Hundred Billion Data for Vision Language Models
- Competitive Programming with Large Reasoning Models
- MoBA: Mixture of Block Attention for Long-Context LLMs
- Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model (and repo)
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
- DeepSeek Papers
- InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
- Leveraging the true depth of LLMs
- NoLiMa: Long-Context Evaluation Beyond Literal Matching
- Memory Layers at Scale
- Towards an AI co-scientist
- On the consistent reasoning paradox of intelligence and optimal trust in AI: The power of 'I don't know'
- LIMO: Less is More for Reasoning
- SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
MLX:
LangChain:
LangGraph:
LangSmith:
Ollama:
LlamaIndex:
Databases:
Hugging Face:
Leonie's Notebooks:
- Fine-tuning Gemma 2 JPN for Yomigana with LoRA
- Advanced RAG with Gemma, Weaviate, and LlamaIndex
- RAG with Gemma on HF 🤗 and Weaviate in DSPy
GitHub Repos: