Thank you for such amazing work. I wonder, if one wants to process multiple users concurrently, how can this library be used in such a scenario with real-time transcription? Furthermore, can we achieve this with a single inference instance of the model? In particular, if one user pauses or is not speaking, inference for another user should be possible.
Only with horizontal scaling. This lib relies on faster_whisper for transcription, which can only serve requests in parallel when using multiple GPUs (or in CPU mode, but that's too slow for real-world applications). There's batching support, but it doesn't help in this case: it only works on a single audio stream, and if we waited for enough transcriptions to make a batch worthwhile, we'd introduce artificial waiting times for users, so latency would go up. So currently it's one GPU per user (one RealtimeSTT instance set up to use a single dedicated GPU per user); otherwise you run into latency problems. A rough sketch of that setup follows.
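A minimal sketch of the one-GPU-per-user layout, assuming AudioToTextRecorder exposes a gpu_device_index parameter in your installed RealtimeSTT version (check your version's docs); the model name and user count are just placeholders:

```python
# Sketch: one process per user, each owning its own recorder pinned to its own GPU,
# so users never queue behind each other's transcriptions.
from multiprocessing import Process
from RealtimeSTT import AudioToTextRecorder

def serve_user(user_id: int, gpu_index: int) -> None:
    recorder = AudioToTextRecorder(
        model="small.en",            # placeholder model choice
        gpu_device_index=gpu_index,  # pin this instance to a dedicated GPU
    )
    while True:
        text = recorder.text()       # blocks until a finalized transcription is available
        print(f"user {user_id}: {text}")

if __name__ == "__main__":
    # Two users, two GPUs; scale the range to match available hardware.
    procs = [Process(target=serve_user, args=(uid, uid)) for uid in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

For remote users you would additionally disable the local microphone and push each user's audio chunks into their recorder instance instead of capturing from a device.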
I know this project is designed for running a local model, but have you considered adding optional support for cloud STT models like Azure STT? I can see a strong use case for this, especially for companies running contact centers that require real-time transcription for their agents.
A hybrid approach could be valuable: defaulting to a local model but allowing cloud STT integration as an option. It would open this project up to more use cases :) Roughly along the lines of the sketch below.
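A purely hypothetical sketch of that hybrid idea: a thin transcription interface that defaults to the local path but can be swapped for a cloud backend. None of these names (TranscriptionBackend, LocalWhisperBackend, AzureSpeechBackend) exist in RealtimeSTT; they only illustrate the routing.

```python
# Illustrative backend-selection sketch; the stubs stand in for real model calls.
from typing import Protocol

class TranscriptionBackend(Protocol):
    def transcribe(self, audio_chunk: bytes) -> str: ...

class LocalWhisperBackend:
    def transcribe(self, audio_chunk: bytes) -> str:
        # Here the existing local faster_whisper path would run.
        raise NotImplementedError("wire up the local model here")

class AzureSpeechBackend:
    def __init__(self, key: str, region: str) -> None:
        self.key, self.region = key, region
    def transcribe(self, audio_chunk: bytes) -> str:
        # Here the chunk would be streamed to the Azure Speech SDK recognizer.
        raise NotImplementedError("wire up the cloud SDK here")

def make_backend(use_cloud: bool = False) -> TranscriptionBackend:
    # Default to local; opt into the cloud backend explicitly.
    return AzureSpeechBackend("key", "region") if use_cloud else LocalWhisperBackend()
```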