SubtitleAI is a tool that processes YouTube videos or uploaded video files to generate scene descriptions, translate them, and create a subtitled video with text-to-speech narration. It relies on pre-trained models for scene description, translation, and text-to-speech synthesis.
Demo video: test_with_subtitles_with_tts.mp4
Note: This is a prototype and further development is planned to enhance its features and capabilities.
- Download YouTube Videos: Automatically download videos from YouTube using a URL.
- Scene Detection: Detects scene transitions in the video.
- Frame Description: Generates English descriptions for each scene using a pre-trained model.
- Translation: Translates descriptions to Turkish using the M2M100 model.
- Subtitled Video: Creates a video with subtitles based on the translated descriptions.
- Text-to-Speech: Generates a narrated video using TTS models for English and Turkish.
- Summary Generation: Provides a summary of the video content in the selected language using Ollama.
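
The scene detection feature listed above can be illustrated with a minimal sketch. The README does not say which detection method SubtitleAI uses, so the histogram-difference approach, the function names `histogram_distance` and `detect_scene_starts`, and the 0.4 threshold are all assumptions chosen for illustration. The input is a list of normalized per-frame histograms, which a real pipeline would compute from decoded video frames.

```python
from typing import List, Sequence


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return half the L1 distance between two normalized histograms (0.0 to 1.0)."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_scene_starts(
    histograms: Sequence[Sequence[float]], threshold: float = 0.4
) -> List[int]:
    """Return the frame indices at which a new scene starts.

    Frame 0 always starts a scene. A later frame starts a new scene when its
    histogram differs from the previous frame's by more than `threshold`.
    The threshold value is an illustrative assumption, not a project setting.
    """
    if not histograms:
        return []
    starts = [0]
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold:
            starts.append(i)
    return starts


# Toy input: three dark frames followed by two bright frames (2-bin histograms).
frames = [[0.9, 0.1], [0.88, 0.12], [0.9, 0.1], [0.1, 0.9], [0.12, 0.88]]
print(detect_scene_starts(frames))  # prints [0, 3]
```

Comparing only adjacent frames keeps the sketch short, but it can miss gradual transitions such as fades; a lower threshold or a comparison against the first frame of the current scene would be one way to handle those.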
- Clone the repository:
  `git clone https://github.com/oztrkoguz/SubtitleAI.git`
  `cd SubtitleAI`
- Create a virtual environment:
  `python -m venv venv`
  `source venv/bin/activate` (on Windows use `venv\Scripts\activate`)
- Install the required packages:
  `pip install -r requirements.txt`
- Run the application:
  `python app.py`
- Upload a video or enter a YouTube URL: You can either upload a video file or provide a YouTube URL for processing.
- Select Language: Choose the language for descriptions and voice (English or Turkish).
- Customize Subtitles: Adjust font size, color, and position for subtitles.
- Process Video: Click the "Process Video" button to start the processing.
- Output: The processed video with subtitles and narration will be generated, along with a text summary.
- describe.py: Handles video downloading, scene detection, frame description, and translation.
- subtitle.py: Manages subtitle creation and video rendering.
- app.py: Main application logic using Gradio for the user interface.
- tts.py: Generates text-to-speech audio for the video.
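
As a rough illustration of how the description, translation, and subtitle steps in these modules might fit together, the sketch below turns a list of scenes into subtitle text. It is not taken from the repository: the SRT output format, the names `format_timestamp` and `build_srt`, and the `describe` and `translate` callables are all hypothetical stand-ins for whatever describe.py and subtitle.py actually do.

```python
from typing import Callable, Sequence, Tuple

Scene = Tuple[float, float]  # (start_seconds, end_seconds)


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(
    scenes: Sequence[Scene],
    describe: Callable[[float, float], str],
    translate: Callable[[str], str],
) -> str:
    """Build SRT text with one cue per scene.

    `describe` and `translate` are hypothetical hooks: the first returns an
    English description for a scene, the second translates that description.
    """
    blocks = []
    for index, (start, end) in enumerate(scenes, start=1):
        text = translate(describe(start, end))
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


# Stub callables stand in for the real description and translation models.
scenes = [(0.0, 2.5), (2.5, 61.04)]
print(build_srt(scenes, lambda start, end: "a scene", str.upper))
```

Passing the model calls in as functions keeps the subtitle logic testable without loading any model, which is one possible way to separate the concerns the file list describes.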
- MiniCPM: For generating scene descriptions.
- M2M100: For translating descriptions from English to Turkish.
- TTS Models: For generating audio narration in English and Turkish.
- Ollama (Mistral): For generating a coherent summary of the video content.
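
For readers unfamiliar with M2M100, the following is a minimal sketch of English-to-Turkish translation using the Hugging Face `transformers` library. It is an assumption that the project loads the model this way; the checkpoint name `facebook/m2m100_418M` and the function name `translate_en_to_tr` are illustrative choices, not details confirmed by the repository. Running it requires `transformers`, `sentencepiece`, and `torch`, and the first call downloads the model weights.

```python
from typing import List


def translate_en_to_tr(
    texts: List[str], model_name: str = "facebook/m2m100_418M"
) -> List[str]:
    """Translate English sentences to Turkish with M2M100.

    The default checkpoint is an assumption; the project may use another size.
    """
    # Nothing to translate, so skip loading the model entirely.
    if not texts:
        return []

    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)

    # M2M100 needs the source language set on the tokenizer and the target
    # language forced as the first generated token.
    tokenizer.src_lang = "en"
    encoded = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id("tr")
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

A call such as `translate_en_to_tr(["A man walks along the beach."])` would return a one-element list with the Turkish translation. In a long-running application it would be preferable to load the tokenizer and model once and reuse them, rather than on every call as this sketch does.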
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
For any questions or support, open an issue on GitHub.