This is a version of P2FA modified for Python 3 compatibility. Everything else remains the same as in the original P2FA. Forced alignment aligns linguistic units (e.g., phonemes or words) with the corresponding sound file. All you need is a sound file and a matching transcription file. The output is a .TextGrid file with time-aligned phone and word tiers.
This was tested on macOS Sierra and Arch Linux.
First, download the HTK source code (http://htk.eng.cam.ac.uk/). This HTK installation guide is retrieved from Link. The installation steps are based on macOS Sierra.
Note: I couldn't run HTK-3.4.1 on Arch Linux. I switched to 3.4.0 and everything works fine. Installation of HTK is the same as the one described below.
Extract the HTK-3.4.1.tar.gz file:
$ tar -xvf HTK-3.4.1.tar.gz
After extracting the tar file, switch to the htk directory.
$ cd htk
Compile HTK in the htk directory.
$ export CPPFLAGS=-UPHNALG
$ ./configure --disable-hlmtools --disable-hslab
$ make clean # necessary if you're not starting from scratch
$ make -j4 all
$ sudo make -j4 install
Install SoX with your package manager:
# on Debian/Ubuntu
$ sudo apt-get install sox
# or in Arch
$ sudo pacman -S sox
# or using brew
$ brew install sox
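Before running the aligner, it can help to confirm that the installed tools are visible on your PATH. The following is a minimal sketch, not part of P2FA itself: the function name `find_missing_tools` is hypothetical, and the tool list assumes the aligner calls HTK's `HVite` and `HCopy` along with `sox`.

```python
import shutil

# ASSUMPTION: the aligner needs these executables. HVite and HCopy are HTK
# tools; sox comes from the SoX package. Adjust the list if your setup differs.
REQUIRED_TOOLS = ("HVite", "HCopy", "sox")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]
```

Calling `find_missing_tools()` returns an empty list when everything is found; any names it does return point to an installation step that needs to be repeated or a PATH entry that is missing.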
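Run the aligner from the command line, passing the sound file, the transcription file, and the output TextGrid path:
$ python align.py examples/ploppy.wav examples/ploppy.txt examples/ploppy.TextGrid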
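To align many recordings, the same command can be issued once per file. The sketch below is one possible way to do that and is not part of P2FA: `build_align_command` and `align_directory` are hypothetical helper names, and the code assumes each transcript and output file shares its .wav file's name (as in the ploppy example) and that it is run from the directory containing align.py.

```python
import subprocess
import sys
from pathlib import Path


def build_align_command(wav_path, script="align.py"):
    """Build the argument list for aligning one .wav file.

    ASSUMPTION: the transcript and the output share the .wav file's stem,
    e.g. ploppy.wav, ploppy.txt and ploppy.TextGrid.
    """
    wav = Path(wav_path)
    return [
        sys.executable,  # the Python interpreter currently running
        script,
        str(wav),
        str(wav.with_suffix(".txt")),
        str(wav.with_suffix(".TextGrid")),
    ]


def align_directory(directory, script="align.py"):
    """Run the aligner on every .wav file in `directory` that has a transcript."""
    for wav in sorted(Path(directory).glob("*.wav")):
        if wav.with_suffix(".txt").exists():
            # check=True raises CalledProcessError if the aligner exits with an error.
            subprocess.run(build_align_command(wav, script), check=True)
```

For example, `align_directory("examples")` would process every .wav file in examples/ that has a .txt transcript next to it and skip the rest.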
You can also invoke the aligner from your own Python code:
from p2fa import align
phoneme_alignments, word_alignments = align.align('WAV_FILE_PATH', 'TRANSCRIPTION_FILE_PATH')
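Once the alignments are returned, they can be post-processed like any other Python data. The sketch below computes the duration of each interval. It rests on an assumption not confirmed here: that each alignment entry is a (label, start, end) triple with times in seconds. Inspect the values returned by `align.align` before relying on it. The helper name `summarize_intervals` and the sample values are hypothetical.

```python
def summarize_intervals(intervals):
    """Return (label, duration) pairs for a list of (label, start, end) items.

    ASSUMPTION: each item is a (label, start_seconds, end_seconds) triple.
    The structure actually returned by align.align may differ; check it first.
    """
    return [(label, round(end - start, 3)) for label, start, end in intervals]


# Hypothetical values for illustration only, not real aligner output.
example_words = [("sp", 0.0, 0.12), ("HELLO", 0.12, 0.56), ("WORLD", 0.56, 1.03)]

for label, duration in summarize_intervals(example_words):
    print(f"{label}\t{duration:.3f}")
```

If the assumption holds, passing `word_alignments` in place of `example_words` gives per-word durations, and `phoneme_alignments` gives per-phone durations.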
- http://www.ling.upenn.edu/phonetics/p2fa/
- Jiahong Yuan and Mark Liberman. 2008. Speaker identification on the SCOTUS corpus. Proceedings of Acoustics '08.
- https://github.com/prosodylab/Prosodylab-Aligner (P2FA seems better than Prosodylab-Aligner based on my qualitative evaluation)
- English HMM-state level aligner: Link
- Korean Forced Aligner: Link from EMCSLabs.