Stars
Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs
📖 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
C++ Library Manager for Windows, Linux, and macOS
👀 MinGW 32-bit and 64-bit versions of OpenCV compiled on Windows, including OpenCV 3.3.1, 3.4.1, 3.4.1-x64, 3.4.5, 3.4.6, 3.4.7, 3.4.8-x64, 3.4.9, 4.0.0-alpha-x64, 4.0.0-rc-x64, 4.0.1-x64, 4.1.0, 4.1.…
A retargetable MLIR-based machine learning compiler and runtime toolkit.
Tensor✖️ is a minimalistic robust library to build deep neural network models
Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"
OpenMMLab 3D Human Parametric Model Toolbox and Benchmark
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
Transformer-related optimization, including BERT and GPT
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
CUDA templates for tile-sparse matrix multiplication based on CUTLASS.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
A Neural Network For Automatic Image Colorization
Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.
A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8…