Library for Microsoft Cognitive Services speech recognition. For more details about usage, take a look at my blog post.
This is the very first version that works. Do not use it in any serious application yet!
- libwebsockets at least v2.1-stable, v2.2-stable.
- json-c.
- libuuid
autoreconf --force --install
./configure
make
Start by running exampleProgram
to learn how to use the library:
Usage: exampleProgram [OPTION...] <key> <language>
-d Produce debug output.
-f FILE Audio input file, stdin if omitted.
-m MODE Recognition mode:
-p MODE Set profanity handling mode {raw|masked|removed}. Default is masked.
{interactive|dictation|conversation}. Default is interactive.
-t Request detailed recognition output.
To recognize a file:
exampleProgram -f <path to wav> -m interactive <your subscription key> en-us
On Linux, you can stream audio directly from microphone using Debian alsa-utils
:
arecord -c 1 -r 16000 -f S16_LE | ./exampleProgram -m interactive <your subscription key> en-us
or perform long dictation on Steve Jobs Standford University commencement speech:
curl -L -s https://archive.org/download/SteveJobsSpeechAtStanfordUniversity/SteveJobsSpeech_64kb.mp3 | \
mpg123 -w - -m -r 16000 -e s16 - | \
./exampleProgram -m dictation <your subscription key> en-us
More explanation and details on how to use the library can be found in this blog post.