
Synchronous and Asynchronous Mic #402

Merged

Conversation

aaronchantrill
Contributor

Description

I have turned the Mic class into an abstract class and used it to create two new classes, MicSynchronous and MicAsynchronous. I'm hoping to expand it to all the different mic classes, including the local (text) mic and the batch mic.
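Roughly, the new layout looks like the sketch below (simplified; the method bodies and exact signatures in naomi/mic.py differ):

```python
from abc import ABC, abstractmethod


class Mic(ABC):
    """Shared setup (audio device, VAD, STT engines) lives in the base class."""

    @abstractmethod
    def listen(self):
        """Wait for an utterance and return its transcription."""

    @abstractmethod
    def active_listen(self):
        """Listen for a command after the wake word has been heard."""


class MicSynchronous(Mic):
    """Records only when asked to; nothing is captured while Naomi is busy."""

    def listen(self):
        ...

    def active_listen(self):
        ...


class MicAsynchronous(Mic):
    """Keeps capturing audio in the background so speech during processing is not lost."""

    def listen(self):
        ...

    def active_listen(self):
        ...
```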

I'm attempting to support both active listen mode (where the computer only starts listening for a command after hearing its wakeword - Siri-like mode) and passive listen mode (where the computer records blocks of audio, then checks for the wake word and then checks the same block of audio for a command - Echo-like).
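In rough pseudocode, the two cycles differ like this (a sketch only; record_block, record_utterance, transcribe_passive, transcribe_active, and contains_wakeword are placeholder names, not Naomi's actual API):

```python
def active_listen_cycle(mic):
    # Siri-like: react to the wake word first, then record the command.
    block = mic.record_block()
    if mic.contains_wakeword(mic.transcribe_passive(block)):
        return mic.transcribe_active(mic.record_utterance())
    return None


def passive_listen_cycle(mic):
    # Echo-like: the command may already be inside the same block as the
    # wake word, so re-check that block with the active STT engine.
    block = mic.record_block()
    if mic.contains_wakeword(mic.transcribe_passive(block)):
        return mic.transcribe_active(block)
    return None
```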

Right now, I am having trouble with the expect function when using passive listen mode with the asynchronous listener. This comes down to the pyaudio device's play_file, which returns once it has finished writing to the queue, but before the audio has finished playing. This leads to situations where the next audio clip starts getting queued before the previous one finishes playing, and if the clips have different frame sizes, the result is a segmentation fault.
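One possible direction (not the fix in this PR) would be a playback helper that only returns once the output stream has drained, using the standard wave and pyaudio APIs:

```python
import wave

import pyaudio


def play_file_blocking(path, chunk=1024):
    """Play a WAV file and only return once playback has finished."""
    with wave.open(path, "rb") as wf:
        pa = pyaudio.PyAudio()
        stream = pa.open(
            format=pa.get_format_from_width(wf.getsampwidth()),
            channels=wf.getnchannels(),
            rate=wf.getframerate(),
            output=True,
        )
        data = wf.readframes(chunk)
        while data:
            stream.write(data)  # blocking write: waits for buffer space
            data = wf.readframes(chunk)
        stream.stop_stream()    # stop only after all frames have been written
        stream.close()
        pa.terminate()
```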

I have been testing with the "knock knock joke" and "time" speechhandler plugins; the knock-knock joke plugin uses expect quite a bit. I have been using Pocketsphinx_KWS as my passive STT engine, Pocketsphinx as my special STT engine, and VOSK (available here: https://github.com/aaronchantrill/Naomi_VOSK_STT) as my active STT engine. VOSK works well, at least in English, but requires some additional training if you have non-standard words in your vocabulary. I'd like to make VOSK officially available through NPE, but the last time I trained VOSK to recognize some additional words, it required a computer with 32 GiB of RAM. I will test on my Raspberry Pi 5 with 8 GiB and see if it can handle it, but I have low expectations. I would also like to add an option to export the Naomi vocabulary so VOSK can be trained on another computer, since it does run well on the Raspberry Pi 4 and 5.

Related Issue

Naomi does not listen while thinking #340

Motivation and Context

The microphone does not currently continue to collect audio while Naomi is processing. This is especially a problem when entering a room, as the VAD still often captures noises as audio to process. If you walk into the room and then address Naomi while it is processing the audio of you walking into the room, it will miss your request.

How Has This Been Tested?

I have tested with both "listen while talking=True" (asynchronous) and "listen while talking=False" (synchronous) modes.
I have tested with both "passive_listen=True" (passive listening) and "passive_listen=False" (active listen) modes
I have been testing by asking Naomi to tell me a knock-knock joke (which uses the "expect" method) and then either allowing it to finish the joke, or asking it to tell me the time before it completes the joke:
User: Tel me a knock knock joke
Naomi: Knock knock
User: Naomi, what time is it?
Naomi: It is 12:15 PM right now

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.


@github-advanced-security bot left a comment


CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Mic class is a child class of i18n.GettextMixin but was not using
that parent class correctly. Edited the __init__ method.
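
For context, the usual shape of this kind of fix is to forward to the mixin's __init__ via super() rather than bypass it; the sketch below is a generic illustration, and the real GettextMixin in naomi/i18n.py may take different arguments.

```python
# Generic illustration of correct mixin initialisation, not Naomi's actual code.
class GettextMixin:
    def __init__(self, translations):
        self._translations = translations

    def gettext(self, message):
        return self._translations.get(message, message)


class Mic(GettextMixin):
    def __init__(self, translations, *args, **kwargs):
        # Forward to the mixin's __init__ instead of bypassing it,
        # so that self.gettext() works on Mic instances.
        super().__init__(translations)
```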
@aaronchantrill aaronchantrill marked this pull request as draft March 3, 2024 17:25
@aaronchantrill
Contributor Author

aaronchantrill commented Mar 3, 2024

One thing I'm not really happy with is having the listen() and active_listen() methods return both the transcription and the audio itself. This would be a breaking change, although it probably needs to happen, since I also want to add the speaker's identity and may come up with additional needs moving forward. I am planning to create a new Utterance class that will carry this additional meta-information. I'll define a default property so that referencing the utterance object directly returns the transcription, which should keep it working with plugins that call listen() expecting a string.
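
The rough idea is something like the following sketch (illustrative only; the attribute and method names are placeholders rather than the final interface):

```python
class Utterance:
    """Carries the transcription plus room for extra metadata."""

    def __init__(self, transcription, audio=None, speaker=None):
        self.transcription = transcription  # str or list of candidate strings
        self.audio = audio                  # raw audio frames, if retained
        self.speaker = speaker              # placeholder for speaker identity

    def __call__(self):
        # Calling the object with no arguments yields the transcription,
        # keeping older plugins that expect listen()'s previous return
        # value working with minimal changes.
        return self.transcription


utterance = Utterance(["what time is it"], audio=b"\x00\x01")
print(utterance())  # ['what time is it']
```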

I added an Utterance class that returns information from the mic
listen() and active_listen() methods. This object returns the
transcription when called without any parameters, so it is
backwards compatible with plugins that expect to get a string or
list of strings back from those methods.
@aaronchantrill aaronchantrill marked this pull request as ready for review March 3, 2024 21:17
@aaronchantrill
Contributor Author

I think this is ready to go now. If anyone is interested, please try it and let me know if you encounter any issues. If none come up, I will merge it in a week.

@aaronchantrill aaronchantrill merged commit 579a8b3 into NaomiProject:naomi-dev Mar 10, 2024
4 checks passed