Stars
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"
React app for inspecting, building and debugging with the Realtime API
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
First base model for full-duplex conversational audio
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
Confusion Matrix in Python: plot a pretty confusion matrix (like MATLAB) in Python using seaborn and matplotlib
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
On the detection of synthetic images generated by diffusion models
The official implementation of the CCS'23 paper, Narcissus clean-label backdoor attack -- only takes THREE images to poison a face recognition dataset in a clean-label way and achieves a 99.89% att…
Official implementation of the ICLR 2022 paper "Adversarial Unlearning of Backdoors via Implicit Hypergradient"
Betty: an automatic differentiation library for generalized meta-learning and multilevel optimization
Google Research
3FabRec: Fast Few-shot Face alignment by Reconstruction - PyTorch implementation
DomainBed is a suite to test domain generalization algorithms
A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models.
Datasets derived from US census data
[NeurIPS 2020] “Robust Pre-Training by Adversarial Contrastive Learning”, Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
Class activation maps for your PyTorch models (CAM, Grad-CAM, Grad-CAM++, Smooth Grad-CAM++, Score-CAM, SS-CAM, IS-CAM, XGrad-CAM, Layer-CAM)
High-Resolution Image Synthesis with Latent Diffusion Models
A data augmentations library for audio, image, text, and video.