dc_tts-phonetic-transfer-learning

This repo contains attempts to improve the accuracy of SeanPLeary's dc_tts-transfer-learning code. All text is first converted to a phonetic representation using Kyubyong's g2p library. The conversion is done automatically, without intervention from the user. These modifications change the model structure in ways that are not compatible with the original code, so new models are required. A model which has been trained using the LJSpeech dataset can be found here.

Additional Changes

For consistency and reduced code, datasets must match the format of the LJSpeech dataset, including using the filename "metadata.csv" rather than "transcript.csv".
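As a quick way to check a dataset before preprocessing, the sketch below parses LJSpeech-style rows. LJSpeech's metadata.csv uses pipe-separated fields (ID, transcription, normalized transcription); the function name read_metadata and the sample row are made up for illustration, and which text field this repo actually reads is not stated here.

```python
import csv
import io


def read_metadata(lines):
    """Parse LJSpeech-style rows: id|transcription|normalized transcription."""
    # QUOTE_NONE keeps quotation marks inside transcripts as literal characters.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    rows = []
    for lineno, fields in enumerate(reader, start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        rows.append(tuple(fields))
    return rows


# Made-up sample row, not taken from the real dataset.
sample = io.StringIO("clip-0001|Dr. Smith paid $5.|Doctor Smith paid five dollars.\n")
rows = read_metadata(sample)
print(rows)
```

With a real dataset, passing open("metadata.csv", encoding="utf-8", newline="") in place of the StringIO object should behave the same way.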

Command line options have been added to make it easy to switch between models without editing the hyperparams.py file.

prepro.py

The "--data" option allows users to select the directory in which the metadata.csv file should be found. The "mags" and "mels" subdirectories will be created in that directory.
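The directory layout described above can be sketched as follows. This is not the code from prepro.py; the helper names parse_prepro_args and output_paths are hypothetical, and making "--data" required is an assumption.

```python
import argparse
import os


def parse_prepro_args(argv):
    # Hypothetical parser; the real prepro.py may define "--data" differently.
    parser = argparse.ArgumentParser(description="sketch of a prepro.py-style CLI")
    parser.add_argument("--data", required=True,
                        help="directory containing metadata.csv")
    return parser.parse_args(argv)


def output_paths(data_dir):
    """Return the metadata file and the two subdirectories named in the README."""
    return {
        "metadata": os.path.join(data_dir, "metadata.csv"),
        "mags": os.path.join(data_dir, "mags"),
        "mels": os.path.join(data_dir, "mels"),
    }


args = parse_prepro_args(["--data", "my_voice"])
paths = output_paths(args.data)
print(paths)
```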

train_transfer.py

The "1" and "2" options select which network to train: "1" for Text2Mel, "2" for SSRN.

The "--data" option operates the same as with the prepro.py script.

The "--restore" option selects a directory containing a previously trained model. This can be useful if you need to interrupt training and start again later.

The "--new" option prevents the script from loading a previously trained model. If you use this option you should also use the "--all" option.

The "--all" option will train all layers, rather than only the layers selected in the hyperparams.py file.
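To show how these options fit together, here is a minimal argparse sketch. It is an illustration rather than the parser in train_transfer.py: treating "1" and "2" as a positional argument is an assumption, as is printing a warning when "--new" is given without "--all".

```python
import argparse


def parse_train_args(argv):
    # Hypothetical parser mirroring the options described in this README.
    parser = argparse.ArgumentParser(description="sketch of a train_transfer.py-style CLI")
    parser.add_argument("network", choices=["1", "2"],
                        help="1 = Text2Mel, 2 = SSRN")
    parser.add_argument("--data", help="dataset directory containing metadata.csv")
    parser.add_argument("--restore", help="directory of a previously trained model")
    parser.add_argument("--new", action="store_true",
                        help="do not load a previously trained model")
    parser.add_argument("--all", action="store_true",
                        help="train all layers, not only those in hyperparams.py")
    args = parser.parse_args(argv)
    if args.new and not args.all:
        # The README recommends combining the two options.
        print("warning: --new is normally used together with --all")
    return args


args = parse_train_args(["1", "--data", "my_voice", "--new", "--all"])
print(args.network, args.data, args.new, args.all)
```

Training Text2Mel from scratch would then correspond to passing "1" together with "--new" and "--all", and resuming an interrupted run to passing "--restore" with the model directory.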

synthesize.py

The previous version of the code skipped the first line of the text file and the first word of each remaining line. I could see no reason for this behavior, so I have changed it. Output filenames were previously indexed from 1; I have changed this to index from 0.
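The changed behavior can be pictured with the short sketch below: every line is used in full, and outputs are numbered from 0. The function name plan_outputs, the "N.wav" filename pattern, and the skipping of blank lines are assumptions for illustration, not details taken from synthesize.py.

```python
def plan_outputs(text, outdir="samples"):
    """Pair every non-empty line with a 0-based output filename."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # enumerate() starts at 0, so the first sentence maps to 0.wav.
    return [(f"{outdir}/{i}.wav", line) for i, line in enumerate(lines)]


plan = plan_outputs("Hello there.\nThis is the second line.\n")
for path, sentence in plan:
    print(path, "<-", sentence)
```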

The "--voice" option selects a directory containing a trained model, similar to the "--data" option of the train_transfer.py script.

The "--text" option selects a text file to read.

The "--outdir" option selects a directory to save the .wav files.

normalize.py

This script accepts a text string and outputs the phonetic representation of that string. This can be useful when tracking down the source of pronunciation issues.

Issues

The models are trained using only ASCII characters. Unrecognized characters are converted to spaces. This can cause problems if Unicode characters are used. For example, an ASCII apostrophe will work as expected, but a Unicode U+2019 (right single quotation mark) will not.
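One possible workaround is to clean the input text before synthesis. The sketch below maps a few common Unicode punctuation marks to ASCII and then replaces any remaining non-ASCII character with a space, which imitates the behavior described above. The mapping table and the name to_ascii are illustrative choices, not part of this repo, and whether the model's vocabulary keeps every mapped character is not stated here.

```python
# Common "smart" punctuation mapped to ASCII equivalents.
PUNCT_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (curly apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
})


def to_ascii(text):
    """Map known punctuation to ASCII, then blank out anything else non-ASCII."""
    text = text.translate(PUNCT_MAP)
    return "".join(ch if ord(ch) < 128 else " " for ch in text)


cleaned = to_ascii("It\u2019s working")
print(cleaned)
```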

I have had difficulty training exceptionally deep voices. Using the "--all" option to train all layers has been helpful.

g2p can have difficulty with initialisms. For example, "FBI" will not be pronounced correctly. This can be resolved by spacing out the letters as "F B I". You can force a hard "A" by writing it as "ay".
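Spacing out initialisms by hand gets tedious for long texts, so a small preprocessing step may help. The sketch below inserts spaces between the letters of any run of two or more capital letters; the name space_initialisms is hypothetical. Note the assumption that every all-caps word is an initialism: acronyms that are read as words would also be split.

```python
import re

# A run of two or more capitals standing alone as a word, e.g. "FBI".
_INITIALISM = re.compile(r"\b[A-Z]{2,}\b")


def space_initialisms(text):
    """Rewrite "FBI" as "F B I" so each letter is pronounced separately."""
    return _INITIALISM.sub(lambda m: " ".join(m.group(0)), text)


spaced = space_initialisms("The FBI and the CIA met.")
print(spaced)
```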

If you would like to train a model from scratch, the LJSpeech 1.1 dataset can be found here. I recommend using a modified metadata.csv file found here.

