Discussions section in repo #282
-
Is there an opportunity to add a Discussions section? It's not always a bug or a feature request; sometimes it's just something interesting. I'm building realtime speech-to-text with a microphone using this library and found an interesting thing: when I try to feed exactly 16k samples into the processor, I get an error. If I add 160 more samples, the error disappears. @sandrohanea, could you please comment on whether it might be related to a missing header which can be obligatory? Because I use a stream, obviously no header is provided. Attaching the repro draft, which uses the mic:

```csharp
using System.Diagnostics;
using System.Text;
using System.Threading.Channels;
using NAudio.CoreAudioApi;
using NAudio.Wave;
using Whisper.net;
using Whisper.net.Ggml;
using Whisper.net.Logger;

namespace Whisper_Realtime;

internal static class Program
{
    private static async Task Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.InputEncoding = Encoding.UTF8;

        var ggmlType = GgmlType.Tiny;
        var modelFileName = $"ggml-{ggmlType.ToString().ToLower()}.bin";
        if (!File.Exists(modelFileName))
        {
            await DownloadModel(modelFileName, ggmlType);
        }

        LogProvider.Instance.OnLog += (level, message) =>
        {
            Console.Write($"{level}: {message}");
        };

        using var whisperFactory = WhisperFactory.FromPath(modelFileName);
        var builder = whisperFactory.CreateBuilder()
            .WithPrompt("To jest duzy dom. Novy sklep.")
            .WithNoSpeechThreshold(0.8f)
            .WithLanguage("pl");
        await using var processor = builder.Build();

        var channel = Channel.CreateUnbounded<short[]>(new UnboundedChannelOptions
        {
            SingleReader = true,
            SingleWriter = true
        });

        var audioCaptureThread = new Thread(() => CaptureAudio(channel))
        {
            IsBackground = true
        };
        audioCaptureThread.Start();

        var reader = channel.Reader;
        var stopwatch = Stopwatch.StartNew();
        var sampleBuffer = new float[16000 * 60];
        var bufferSize = 0;
        var second = 1;
        while (await reader.WaitToReadAsync() && second < 16)
        {
            var samples = await reader.ReadAsync();
            var floatSamples = CastShortToFloat(samples);
            var targetSize = 16000 * second;
            AddToBuffer(sampleBuffer, floatSamples, ref bufferSize);
            if (bufferSize >= targetSize)
            {
                stopwatch.Restart();
                await foreach (var result in processor.ProcessAsync(sampleBuffer.AsMemory(0, targetSize)))
                {
                    if (result.Text.StartsWith(" [")) continue;
                    Console.WriteLine($" {result.Start}->{result.End}: {result.Text} => with probability: {result.Probability}");
                }
                Console.WriteLine($"Seconds sent: {second:000} Buffer size: {bufferSize:000000} Spent: {stopwatch.Elapsed.TotalMilliseconds} ms.");
                second++;
            }
        }
    }

    private static void CaptureAudio(Channel<short[]> channel)
    {
        var writer = channel.Writer;
        var waveIn = new WaveInEvent
        {
            DeviceNumber = 0,
            WaveFormat = new WaveFormat(16000, 16, 1) // 16 kHz, 16-bit, mono
        };
        waveIn.DataAvailable += (sender, args) =>
        {
            Console.WriteLine($"Bytes Recorded: {args.BytesRecorded}");
            var buffer = new short[args.BytesRecorded / 2];
            Buffer.BlockCopy(args.Buffer, 0, buffer, 0, args.BytesRecorded);
            writer.TryWrite(buffer);
        };
        waveIn.RecordingStopped += (_, _) =>
        {
            writer.Complete();
            waveIn.Dispose();
        };
        waveIn.StartRecording();
        Console.WriteLine("Press any key to stop recording...");
        Console.ReadKey();
        waveIn.StopRecording();
    }

    private static float[] CastShortToFloat(short[] samples)
    {
        // Normalize 16-bit PCM to the [-1, 1) float range expected by the processor.
        var floatSamples = new float[samples.Length];
        for (var i = 0; i < samples.Length; i++)
        {
            floatSamples[i] = samples[i] / 32768.0f;
        }
        return floatSamples;
    }

    private static void AddToBuffer(float[] destination, float[] source, ref int count)
    {
        Array.Copy(source, 0, destination, count, source.Length);
        count += source.Length;
    }

    private static async Task DownloadModel(string fileName, GgmlType ggmlType)
    {
        Console.WriteLine($"Downloading Model {fileName}");
        await using var modelStream = await WhisperGgmlDownloader.GetGgmlModelAsync(ggmlType);
        await using var fileWriter = File.OpenWrite(fileName);
        await modelStream.CopyToAsync(fileWriter);
    }
}
```
Replies: 2 comments
-
Hey @AncientLust,
Created the discussion page and converted this one to a discussion as well. Thanks for the suggestion!
Now, trying to answer your question as well: indeed, you'll need at least 1000 ms of audio to perform the inference:
https://github.com/ggerganov/whisper.cpp/blob/8c6a9b8bb6a0273cc0b5915903ca1ff9206c6285/src/whisper.cpp#L5375C5-L5375C39
It seems (based on the logs) that you're short by 10 ms. Indeed, as you're sending a Memory, no header is required, so a missing header shouldn't be the cause of the missing 10 ms. It seems that 16k frames of audio only produce 990 ms of mel spectrogram in the whisper.cpp library (one missing mel sample):
https://github.com/ggerganov/whisper.cpp/blob/8c6a9b8bb6a0273…
Just adding 100 frames before calling the processor should fix it. Unfortunately, I don't have an exact ETA for the new library (as I work on it only in my free time and on weekends), but I will announce it here as well once it's available.
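The explanation above suggests a simple client-side workaround: pad the capture buffer with a little silence so whisper.cpp sees slightly more than 1000 ms of audio. A minimal sketch, assuming 16 kHz sample rate; the helper name is illustrative (not part of Whisper.net's API), and the 160 extra samples match the figure from the question above:

```csharp
using System;

// Exactly 1000 ms at 16 kHz — per the reply above this yields only 990 ms of mel.
var exact = new float[16000];

// Pad to 16160 samples (+160 samples = one extra 10 ms hop) before ProcessAsync.
var padded = PadToMinimumSamples(exact, 16160);
Console.WriteLine(padded.Length); // 16160

static float[] PadToMinimumSamples(float[] samples, int minSamples)
{
    // Longer buffers pass through unchanged.
    if (samples.Length >= minSamples) return samples;

    // New float arrays are zero-filled, so the tail is silence.
    var result = new float[minSamples];
    Array.Copy(samples, result, samples.Length);
    return result;
}
```

One trade-off to note: zero padding is silence, so with an aggressive no-speech threshold the trailing pad could slightly influence segment detection; appending a few real captured samples (as the question did) avoids that.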
-
Hey @AncientLust, I’m excited to share that the new library, EchoSharp, is now available (still in its early stages): https://github.com/sandrohanea/echosharp/. It’s designed to leverage Whisper.net as well as other Speech-to-Text components and VAD modules for near-real-time audio processing. I’d greatly appreciate it if you could take some time to try it out and share any early feedback — it would mean a lot! Thank you!