Realtime Subtitles is a Python-based script that captures system audio in real-time, transcribes it using OpenAI Whisper, and optionally translates the transcribed text. The subtitles are displayed in an overlay window on your desktop.
- Captures system audio using loopback devices.
- Transcribes audio in real-time using OpenAI Whisper.
- Supports optional translations using Google Translate or DeepL.
- Customizable subtitle overlay with font, color, size, and transparency options.
- Captures system audio using Python libraries (`soundcard` and `sounddevice`).
- Sends audio chunks to an OpenAI Whisper API running in a Docker container.
- Displays the transcription as subtitles in a transparent overlay window.
- Optionally translates the transcription using Google Translate or DeepL.
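The capture-and-send steps above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the function names, the 16 kHz sample rate, and the `http://localhost:9000/asr` URL are assumptions, and the `audio_file` form field and `task`/`output` query parameters follow the usual interface of the openai-whisper-asr-webservice image, so check them against your running container. `requests` is imported lazily so that the helper functions work without it.

```python
import io
import math
import wave


def rms_level(samples):
    """Return the root-mean-square level of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def encode_wav(samples, sample_rate=16000):
    """Pack mono float samples into 16-bit PCM WAV bytes (sample rate is an assumption)."""
    pcm = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # avoid integer overflow on loud peaks
        pcm += int(clipped * 32767).to_bytes(2, "little", signed=True)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))
    return buffer.getvalue()


def transcribe_chunk(samples, api_url="http://localhost:9000/asr", threshold=0.01):
    """Send one audio chunk to the Whisper service; return None for quiet chunks.

    The URL, query parameters, and response shape are assumptions about the
    Dockerized Whisper web service, not values taken from this repository.
    """
    if rms_level(samples) < threshold:
        return None  # below audio_threshold: skip the network call entirely
    import requests  # imported lazily; only needed when a chunk is actually sent

    response = requests.post(
        api_url,
        params={"task": "transcribe", "output": "json"},
        files={"audio_file": ("chunk.wav", encode_wav(samples), "audio/wav")},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("text", "").strip()
```

Checking the level before encoding means silent chunks never reach the container, which is one plausible way the `audio_threshold` setting could be applied.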
- Python 3.8+
- NVIDIA GPU (optional; only needed to run the Whisper API with GPU acceleration)
- Docker (for hosting the Whisper API)
- Clone the repository:
    git clone https://github.com/<your_username>/realtime_subtitles.git
    cd realtime_subtitles
- Install dependencies:
    pip install -r requirements.txt
- Set up and run the Whisper API:
  - Install Docker if not already installed.
  - Create and start the Whisper service:
      docker-compose up -d
- Run the script:
    python main.py
The `config.json` file allows you to customize the following:
- `font_name`: Font used for subtitles (default: `Comic Sans MS`).
- `font_size`: Font size (default: `24`).
- `font_color`: Subtitle text color (default: `yellow`).
- `translation`: Enable/disable translation (default: `False`).
- `deepl_api`: API key for DeepL translation (if enabled).
- `language`: Target language for translation (default: `english`).
- `audio_threshold`: Minimum audio level to trigger transcription (default: `0.01`).
- `text_expiry`: Time in seconds before the subtitles fade (default: `3.0`).
- `window_opacity`: Transparency of the subtitle window (default: `0.8`).
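As an illustration of how these options might be read, here is a minimal sketch that merges config.json over the documented defaults. The `load_config` helper and the `DEFAULTS` name are hypothetical and may not match main.py, and the empty string used for `deepl_api` is an assumption, since no default is documented for it.

```python
import json

# Defaults as documented above; the empty deepl_api value is an assumption.
DEFAULTS = {
    "font_name": "Comic Sans MS",
    "font_size": 24,
    "font_color": "yellow",
    "translation": False,
    "deepl_api": "",
    "language": "english",
    "audio_threshold": 0.01,
    "text_expiry": 3.0,
    "window_opacity": 0.8,
}


def load_config(path="config.json"):
    """Return the defaults overridden by whatever keys the file provides."""
    config = dict(DEFAULTS)  # copy so the module-level defaults stay untouched
    try:
        with open(path, "r", encoding="utf-8") as handle:
            config.update(json.load(handle))
    except FileNotFoundError:
        pass  # no config file: fall back to the defaults
    return config
```

With this approach a config.json containing only `{"font_size": 32}` changes the font size and leaves every other option at its default.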
The `docker-compose.yml` file is included for setting up the Whisper transcription service with GPU support. It uses the `onerahmet/openai-whisper-asr-webservice` Docker image.
The following Python libraries are required:
- soundcard
- sounddevice
- numpy
- wave
- requests
- googletrans==4.0.0rc1
- deepl
- langdetect
- tkinter
Feel free to fork the repository and make your own changes! Contributions are always welcome.
This is my attempt at this and it might not work perfectly. I invite you to open pull requests and help make it better. (I am something of a beginner, and I was not able to fix an issue where it sometimes skips audio.)
MIT License