Huggingface example is broken #234

Open
pbarker opened this issue Feb 25, 2025 · 0 comments
Labels
bug Something isn't working

Comments

pbarker commented Feb 25, 2025

Backend impacted

The PyTorch implementation

Operating system

Linux

Hardware

GPU with CUDA

Description

Following the Hugging Face example at https://huggingface.co/docs/transformers/en/model_doc/moshi:

from datasets import load_dataset, Audio
import torch, math
from transformers import MoshiForConditionalGeneration, AutoFeatureExtractor, AutoTokenizer


librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
feature_extractor = AutoFeatureExtractor.from_pretrained("kyutai/moshiko-pytorch-bf16")
tokenizer = AutoTokenizer.from_pretrained("kyutai/moshiko-pytorch-bf16")
device = "cuda"
dtype = torch.bfloat16

# NOTE: `model` is never defined in the snippet as pasted; presumably something like this is intended:
model = MoshiForConditionalGeneration.from_pretrained("kyutai/moshiko-pytorch-bf16", torch_dtype=dtype).to(device)

# prepare user input audio 
librispeech_dummy = librispeech_dummy.cast_column("audio", Audio(sampling_rate=feature_extractor.sampling_rate))
audio_sample = librispeech_dummy[-1]["audio"]["array"]
user_input_values = feature_extractor(raw_audio=audio_sample, sampling_rate=feature_extractor.sampling_rate, return_tensors="pt").to(device=device, dtype=dtype)

# prepare moshi input values - we suppose moshi didn't say anything while the user spoke
moshi_input_values = torch.zeros_like(user_input_values.input_values)

# prepare moshi input ids - we suppose moshi didn't say anything while the user spoke
# NOTE: `waveform_to_token_ratio` is also never defined in the snippet; Mimi encodes 24 kHz
# audio at 12.5 tokens per second, so roughly (my assumption):
waveform_to_token_ratio = 12.5 / feature_extractor.sampling_rate
num_tokens = math.ceil(moshi_input_values.shape[-1] * waveform_to_token_ratio)
input_ids = torch.ones((1, num_tokens), device=device, dtype=torch.int64) * tokenizer.encode("<pad>")[0]

# generate 25 new tokens (around 2s of audio)
output = model.generate(input_ids=input_ids, user_input_values=user_input_values.input_values, moshi_input_values=moshi_input_values, max_new_tokens=25)

text_tokens = output.sequences
audio_waveforms = output.audio_sequences

This line:

feature_extractor = AutoFeatureExtractor.from_pretrained("kyutai/moshiko-pytorch-bf16")

Fails with:

OSError: kyutai/moshiko-pytorch-bf16 does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/kyutai/moshiko-pytorch-bf16/tree/main' for available files.
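
For reference, a quick way to confirm the missing file, plus a possible workaround (assuming the standard Mimi feature extractor from the kyutai/mimi repo is appropriate for Moshi preprocessing, which I'm not certain of):

from huggingface_hub import list_repo_files
from transformers import AutoFeatureExtractor

# List the files the checkpoint actually ships; preprocessor_config.json is not among them.
print(list_repo_files("kyutai/moshiko-pytorch-bf16"))

# Possible workaround (assumption): Moshi uses the Mimi codec for audio, and the
# kyutai/mimi repo ships a preprocessor_config.json, so a feature extractor can be
# loaded from there instead.
feature_extractor = AutoFeatureExtractor.from_pretrained("kyutai/mimi")
print(feature_extractor.sampling_rate)  # Mimi operates at 24 kHz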

Any idea how to use this checkpoint with Hugging Face Transformers?

Extra information

NA

Environment

Ubuntu 22.04, L40s GPU

pbarker added the bug label on Feb 25, 2025