A multilingual tool to convert PDF ebooks to audiobooks using XTTS v2 TTS model by cloning a speaker voice.
- Clone speaker voice
- Keep sentences together
- Clean text input (remove line breaks, etc.)
- Split PDF into pages
- Convert pages to audio
- Add silence between sections
- Concatenate audio files automatically
- Progress tracking and journaling with crash recovery
- Python 3.10+
- pip install -r requirements.txt
- Copy .env.example to .env and fill in the missing values
- Download a speaker voice sample and put it in the source_audio folder (any .wav file of more than 10 seconds should work)
- Put your PDF in the proper folder as specified in the .env file
NOTE: You can and should clean the output by removing the audio_pages folder after you're done (example in clean_output
file)
- python src/main.py
- A folder with all the audio pages of the PDF and their chunks if splitted.
- A final audiobook file as specified in the .env file.
While the model is capable of running on CPU, it's recommended to use a GPU for faster processing.
The model size is slightly smaller than 2GB, so it's recommended to use a GPU with at least 4GB of VRAM or to ensure that your RAM is large enough to handle the model.
We use a lot of small .wav files to enable crash recovery, avoid corruption and to enable progress tracking in a more reliable way.
This project is licensed under the MIT License - see the LICENSE file for details.
- Coqui XTTS v2
- PyTorch
- All the authors of the used libraries