Stars
MSCCL++: A GPU-driven communication stack for scalable AI applications
Unofficial Python API for Anthropic's Claude LLM
Building blocks for foundation models.
An extremely fast Python linter and code formatter, written in Rust.
PyTorch emulation library for Microscaling (MX)-compatible data formats
Unit Scaling demo and experimentation code
Source code for Twitter's Recommendation Algorithm
Rust bindings for the C++ API of PyTorch.
A library to analyze PyTorch traces.
Optimized primitives for collective multi-GPU communication
Development repository for the Triton language and compiler
Tests performance of dataloader with multiprocessing vs. threads
magic-trace collects and displays high-resolution traces of what a process is doing
"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, OneDrive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files
The Illustrated TLS 1.2 Connection: Every byte explained