The official implementation of the Interspeech 2021 paper WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution. Audio samples can be found here.
Feel free to create issues or send an email to if you have problems running the code.
Before running the code, install the dependencies with pip install -r requirements.txt
The configs for the model architecture and training scheme are saved in config.yaml. You can overwrite some of the attributes by adding the --hparams flag when running a command. The general way to run a Python script is
python $SRC$ --config $CONFIG$ --hparams $KEY1$=$VALUE1$,$KEY2$=$VALUE2$,...
for more details.
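To illustrate how a KEY=VALUE list passed through --hparams could override entries loaded from config.yaml, here is a minimal sketch. It is not the repository's own parsing code: the helper names parse_hparams and apply_overrides, the lr key, and all example values are assumptions made for illustration. Only raw_data_path and binary_data_path are key names taken from this README.

```python
import ast


def parse_hparams(overrides: str) -> dict:
    """Parse a 'key1=value1,key2=value2' string into a dict (hypothetical helper)."""
    result = {}
    for pair in overrides.split(","):
        if not pair.strip():
            continue  # tolerate empty segments such as a trailing comma
        key, sep, raw = pair.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        raw = raw.strip()
        try:
            # Turn '0.0005', '16', 'True' into Python numbers/booleans.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # anything else (e.g. a path) stays a plain string
        result[key.strip()] = value
    return result


def apply_overrides(config: dict, overrides: str) -> dict:
    """Return a copy of config with the command-line overrides applied."""
    merged = dict(config)
    merged.update(parse_hparams(overrides))
    return merged


# Stand-in for the values loaded from config.yaml (all values are made up).
base = {"raw_data_path": "data/raw", "lr": 0.001}
hparams = apply_overrides(base, "lr=0.0005,binary_data_path=data/binary")
```

Note that this sketch splits on commas, so a value that itself contains a comma would need quoting or a different separator.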
Before training, you need to binarize the data. The raw wav files should be put in hparams['raw_data_path'], and the binarized data will be written to hparams['binary_data_path'].
Specifically, for the VCTK corpus, the file structure should be like
where the model checkpoints are in checkpoints/wsrglow
The command to binarize is
python --config config.yaml
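Before launching binarization, it can help to confirm that the raw data directory actually contains wav files. The following is a minimal sketch of such a check, not part of the repository: the function name find_raw_wavs is hypothetical, and it assumes hparams is a plain dict holding the two path keys mentioned above.

```python
from pathlib import Path


def find_raw_wavs(hparams: dict) -> list:
    """List wav files under hparams['raw_data_path'] (hypothetical pre-flight check)."""
    raw_dir = Path(hparams["raw_data_path"])
    if not raw_dir.is_dir():
        raise FileNotFoundError(f"raw_data_path does not exist: {raw_dir}")

    # Search recursively, since corpora often keep one sub-folder per speaker.
    wavs = sorted(raw_dir.rglob("*.wav"))
    if not wavs:
        raise FileNotFoundError(f"no .wav files found under {raw_dir}")

    # Make sure the output directory for the binarized data exists.
    Path(hparams["binary_data_path"]).mkdir(parents=True, exist_ok=True)
    return wavs
```

Calling find_raw_wavs(hparams) first surfaces a wrong path immediately instead of partway through binarization.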
The current WSRGlow model in
is designed for x4 super-resolution and takes waveform, spectrogram and phase information as input.
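As a rough illustration of what "spectrogram and phase information" means for a waveform input, the sketch below splits the discrete Fourier transform of one toy frame into magnitude and phase, and shows the x4 length relationship between input and output. This is an assumption-laden toy written with the standard library only, not the model's feature extraction: the function name, frame size, and test signal are all made up, and real code would typically use an FFT-based STFT.

```python
import cmath
import math

UPSAMPLE_FACTOR = 4  # x4 super-resolution, as stated above


def frame_magnitude_and_phase(frame):
    """Naive DFT of one real-valued frame; returns (magnitudes, phases) for bins 0..N/2."""
    n = len(frame)
    mags, phases = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        mags.append(abs(coeff))            # magnitude: one column of a spectrogram
        phases.append(cmath.phase(coeff))  # phase angle in radians
    return mags, phases


# Toy low-resolution frame: a pure cosine sitting exactly on bin 2 of 16 samples.
low_res = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
mags, phases = frame_magnitude_and_phase(low_res)

peak_bin = max(range(len(mags)), key=mags.__getitem__)
target_length = len(low_res) * UPSAMPLE_FACTOR  # samples expected after x4 upsampling
```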
Run python --config config.yaml
on a GPU.
Change the code in
to specify the checkpoint you want to load and the sample inputs you want to use for inference.
Run python --config config.yaml
on a GPU. Modify the code so that it points to the correct checkpoint and wav file paths.