MeloTTS API Server provides an interface for generating high-quality text-to-speech (TTS) audio with the MeloTTS model. The server exposes a RESTful API that converts text into speech and supports the base configuration options, such as speaker voice, speed, and sampling rate.
- Streamlined REST API for TTS conversion.
- Support for multiple speakers and languages.
- Adjustable parameters for speech speed, noise, and sampling rate.
- Real-time processing for fast and efficient audio generation.
- Easy integration with client applications.
- Python 3.9 or higher
- Required libraries (managed via `requirements.txt`)
- Dependencies for audio processing:
  - `libsndfile1` (Linux/Unix)
  - FFmpeg (if using `pydub` for audio manipulation)
- Docker (optional, for containerized deployment)
- Clone the repository and enter the directory that `git clone` creates: `git clone https://github.com/nyedr/MeloTTS_Server_Api.git`, then `cd MeloTTS_Server_Api`
- Install required dependencies: `pip install -r requirements.txt`
- Download necessary linguistic resources and pre-trained models: `python -m unidic download`, then `python melo/init_downloads.py`
- Run the server: `python main.py`
- The server will start on `http://127.0.0.1:8000` by default.
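When the server is started from a script, it can help to wait until the port accepts connections before sending requests. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, not part of this project, and an open port does not guarantee that the models have finished loading.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 8000, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: the defaults mirror the address given in this README.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet (or unreachable): pause briefly, then retry.
            time.sleep(0.2)
    return False
```

A caller might run `wait_for_port()` right after launching `python main.py` in the background and only proceed if it returns `True`.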
Generates a speech audio file from the given text.
- URL: `http://127.0.0.1:8000/tts/generate`
- Method: `POST`
- Headers: `Content-Type: application/json`
- Request Body: `{ "text": "Your text to convert to speech.", "voice_id": "EN-US", "sr": 22050, "speed": 1.0 }`
- Response: Streams the audio file in WAV format.
Fetches the list of available speaker IDs.
- URL: `http://127.0.0.1:8000/speakers`
- Method: `GET`
- Response: `{ "available_speakers": ["EN-US", "EN-BR", "EN-AU", "EN-INDIA", "EN-Default"] }`
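A client can query this endpoint before generating audio and fall back to a known voice when the requested one is not offered. The sketch below uses the standard library's `urllib` rather than `requests` to stay dependency-free; `parse_speakers`, `choose_voice`, and `fetch_speakers` are illustrative names, and the `EN-US` fallback is an assumption taken from the default speaker listed in this README.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default server address from this README


def parse_speakers(body: str) -> list[str]:
    """Extract the speaker IDs from a /speakers JSON response body."""
    payload = json.loads(body)
    return list(payload.get("available_speakers", []))


def choose_voice(requested: str, available: list[str], fallback: str = "EN-US") -> str:
    """Return `requested` if the server offers it, otherwise `fallback` (assumed default)."""
    return requested if requested in available else fallback


def fetch_speakers(base_url: str = BASE_URL) -> list[str]:
    """GET /speakers and return the advertised speaker IDs."""
    with urllib.request.urlopen(f"{base_url}/speakers", timeout=10) as resp:
        return parse_speakers(resp.read().decode("utf-8"))
```

With the server running, `choose_voice("EN-AU", fetch_speakers())` would yield a value suitable for the `voice_id` field of a `/tts/generate` request.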
curl -X POST "http://127.0.0.1:8000/tts/generate" \
-H "Content-Type: application/json" \
-d '{
"text": "Hello, this is a test of the MeloTTS API.",
"voice_id": "EN-US",
"sr": 22050,
"speed": 1.0
}' --output output.wav
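import requests

url = "http://127.0.0.1:8000/tts/generate"
data = {
    "text": "Hello, this is a test of the MeloTTS API.",
    "voice_id": "EN-US",
    "sr": 22050,
    "speed": 1.0,
}

# Stream the response so the audio is written to disk in chunks.
response = requests.post(url, json=data, stream=True)
response.raise_for_status()  # stop on an HTTP error instead of saving it as audio

with open("output.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:  # skip empty keep-alive chunks
            f.write(chunk)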
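Once the audio has been saved, a quick sanity check is to read the WAV header and compare the sample rate with the `sr` value that was requested. This is a minimal sketch with the standard library's `wave` module; `describe_wav` is a hypothetical helper, and it assumes the server returns PCM-encoded WAV data (`wave` raises `wave.Error` for encodings it does not support).

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read the WAV header from raw bytes and report basic properties."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            # Duration follows from frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }
```

For example, after the request above one could call `describe_wav(open("output.wav", "rb").read())` and check that `sample_rate` equals `22050`.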
docker build -t melotts-api-server .
docker run -p 8000:8000 melotts-api-server
You can configure the server behavior using the following environment variables:
| Variable | Default Value | Description |
|---|---|---|
| `HOST` | `0.0.0.0` | Server host address. |
| `PORT` | `8000` | Server port. |
| `DEFAULT_SPEED` | `1.0` | Default speech speed. |
| `DEFAULT_LANGUAGE` | `EN` | Default language for the TTS model. |
| `DEFAULT_SPEAKER_ID` | `EN-US` | Default speaker voice ID. |
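To illustrate how these variables and their defaults fit together, here is a sketch of reading them into a typed settings object. It is not taken from the server's source: `Settings` and `load_settings` are hypothetical names, and only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    default_speed: float
    default_language: str
    default_speaker_id: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables.

    Falls back to the defaults documented in the table when a variable is unset.
    """
    env = os.environ if env is None else env
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        default_speed=float(env.get("DEFAULT_SPEED", "1.0")),
        default_language=env.get("DEFAULT_LANGUAGE", "EN"),
        default_speaker_id=env.get("DEFAULT_SPEAKER_ID", "EN-US"),
    )
```

Passing a plain dictionary, as in `load_settings({"PORT": "9000"})`, makes the override behavior easy to check without touching the real environment.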
Contributions are welcome! Please fork the repository and submit a pull request with your changes.
- Clone the repository: `git clone https://github.com/nyedr/MeloTTS_Server_Api.git`
- Create a feature branch: `git checkout -b feature-name`
- Commit your changes and push: `git add .`, then `git commit -m "Add new feature"`, then `git push origin feature-name`
- Open a pull request on GitHub.
Fork of MeloTTS