A C++ library for working with the CMU Pronouncing Dictionary
The CMU Pronouncing Dictionary (CMUdict) is an open-source, machine-readable pronunciation dictionary for North American English that contains over 134,000 words and their pronunciations.
There's various tooling built around it, e.g. this lovely Python library. But I was working in C++ and WebAssembly, and ended up writing my own interface for using CMUdict.
You can use this in your own projects: you'll need the header file (include/phonetic.hpp), the source file (src/phonetic.cpp), and the CMUdict data, which is included as a submodule of this repo and which phonetic.cpp expects to find at ../data/CMUdict/cmudict-0.7b, relative to itself.
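For orientation, here is a minimal sketch of how one line of the cmudict-0.7b data file could be parsed. It is not the code in phonetic.cpp: the DictEntry struct and the parseLine function are hypothetical names made up for this illustration. It relies on the file's layout, in which comment lines start with ";;;", each entry is a headword followed by whitespace-separated ARPABET phones, and alternate pronunciations carry a suffix such as (1).

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type for illustration only; not part of phonetic.hpp.
struct DictEntry {
    std::string word;                 // headword with any "(n)" variant suffix removed
    std::vector<std::string> phones;  // ARPABET phones, e.g. "IY1"
};

// Parses one line of the dictionary file. Returns std::nullopt for blank
// lines, comment lines (starting with ";;;"), and lines without phones.
std::optional<DictEntry> parseLine(const std::string& line) {
    if (line.empty() || line.rfind(";;;", 0) == 0) return std::nullopt;

    std::istringstream in(line);
    DictEntry entry;
    if (!(in >> entry.word)) return std::nullopt;

    // Alternate pronunciations are listed as WORD(1), WORD(2), ...
    // Strip the suffix so that all variants share one headword.
    if (entry.word.size() > 3 && entry.word.back() == ')') {
        const auto open = entry.word.rfind('(');
        if (open != std::string::npos && open > 0) entry.word.erase(open);
    }

    // Everything after the headword is a whitespace-separated phone.
    for (std::string phone; in >> phone;) entry.phones.push_back(phone);
    if (entry.phones.empty()) return std::nullopt;
    return entry;
}
```

Given a line such as "READ(1)  R IY1 D", this sketch yields the headword READ with the phones R, IY1, and D. Collecting the entries that share a headword is what allows a word to map to several pronunciations.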
This can also be compiled with CMake:
mkdir build
cd build
cmake ..
make
This also builds a Catch2 test program at build/tests/test_phonetic.
This library can also be compiled to WebAssembly using Emscripten:
mkdir build
cd build
emcmake cmake ..
emmake make
This will generate phonetic.js, phonetic.wasm, and phonetic.data. Using these, you'll be able to call any of the Phonetic class methods straight from JavaScript.
Use this library to convert English words into:
- Possible pronunciations, as ARPABET, which represents English phonemes as one- or two-letter ASCII codes.
- Possible patterns of syllabic stress, as strings of the digits 0, 1, and 2, where 0 is unstressed, 1 is primary stress, and 2 is secondary stress.
- Possible syllable counts (counting the number of vowel phones).
Note that if a word has multiple pronunciations, stress patterns, or syllable counts, all of these will be returned.
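The relationship between the three outputs can be sketched in a few lines. In CMUdict, only vowel phones end in a stress digit, so both the stress pattern and the syllable count fall out of a single pronunciation. The function names stressPattern and syllableCount below are hypothetical and are not claimed to match the Phonetic class's actual methods; this is a self-contained illustration of the idea.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helpers for illustration; not the Phonetic class API.

// Builds the stress pattern of one pronunciation by collecting the
// trailing stress digit (0, 1, or 2) of each vowel phone, in order.
std::string stressPattern(const std::vector<std::string>& phones) {
    std::string pattern;
    for (const std::string& phone : phones) {
        if (phone.empty()) continue;
        const char last = phone.back();
        // Consonant phones such as "F" or "K" have no digit and are skipped.
        if (last == '0' || last == '1' || last == '2') pattern += last;
    }
    return pattern;
}

// Each vowel phone contributes one digit to the pattern, so the
// syllable count equals the pattern's length.
std::size_t syllableCount(const std::vector<std::string>& phones) {
    return stressPattern(phones).size();
}
```

For the illustrative phone sequence F AH0 N EH1 T IH0 K, stressPattern returns "010" and syllableCount returns 3. A word with several pronunciations would run each one through these helpers, which is why several patterns or counts can come back for a single word.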
The tests (test/test_phonetic.cpp) show all of the methods in action.
- Search by phones.
- Search by stress.
- Search by syllable count.
- More useful WebAssembly error handling.