sd_piper: add module for piper speech synthesis #998

Open
wants to merge 2 commits into master

Conversation

samoverton

I have been working on a dedicated piper module (for #866). I just pulled from master and noticed that the latest commit (bec5519) alludes to work in progress on a cxxpiper module, so I wanted to share my work asap to ensure that I wasn't stepping on any toes and see if you would like to collaborate.

The module makes use of the user's installation of the piper binary instead of linking in the piper codebase directly. This means that each speak request forks a child process with the appropriate arguments. Communication with the child is done over pipes. Server audio is used so speechd handles output to the configured audio device.
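
To illustrate the shape of that approach, here is a minimal, self-contained sketch of forking a piper child for one speak request and wiring up the pipes. Everything here (the helper name, error handling, flag choice) is illustrative rather than the module's actual code; --output-raw follows piper's documented streaming usage, but verify it against your installed binary.

/* Illustrative sketch only: fork the user's piper binary and stream raw
 * audio back over a pipe. Not the module's actual code. */
#include <unistd.h>
#include <sys/types.h>

/* Spawns piper for one speak request. On success the caller writes the
 * text to *text_fd, closes it, and reads raw PCM from *audio_fd. */
static pid_t spawn_piper(const char *model_path, int *text_fd, int *audio_fd)
{
    int to_child[2], from_child[2];

    if (pipe(to_child) < 0 || pipe(from_child) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: stdin reads the text pipe, stdout feeds the audio pipe */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[1]);
        close(from_child[0]);
        execlp("piper", "piper",
               "--model", model_path,
               "--output-raw",          /* raw PCM on stdout (assumed flag) */
               (char *)NULL);
        _exit(127);                     /* exec failed */
    }

    /* Parent keeps the write end (text in) and the read end (audio out) */
    close(to_child[0]);
    close(from_child[1]);
    *text_fd = to_child[1];
    *audio_fd = from_child[0];
    return pid;
}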

The module supports the usual speechd configuration and runtime parameters (an illustrative piper.conf follows the lists below):

  • AddVoice directives for configuring the voice models to use
  • rate - setting of voice speed (between 0.5x and 2x)
  • TEXT, CHAR, SPELL message types
  • STOP and PAUSE events

As well as some piper-specific configuration:

  • Voice sample rate is read from the model's manifest JSON file
  • sentence_silence - seconds of silence after each sentence
  • noise_scale - generator noise
  • noise_w - phoneme width noise
  • audio buffer size (ms)
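
For illustration, a piper.conf along these lines ties the two lists together. The AddVoice lines follow the usual speechd module convention (language, symbolic voice type, voice name); the piper-specific directive names and values below are guesses for illustration only, so check the sample piper.conf shipped with the module. The sample rate needs no directive because it is read from the model's manifest.

# Voice mapping (the second model name is just an example)
AddVoice "en-GB" "FEMALE1" "en_GB-alba-medium.onnx"
AddVoice "en-US" "MALE1"   "en_US-lessac-medium.onnx"
DefaultVoice "en_GB-alba-medium.onnx"

# Piper-specific tuning - directive names and values are illustrative
PiperSentenceSilence  0.2    # seconds of silence after each sentence
PiperNoiseScale       0.667  # generator noise
PiperNoiseW           0.8    # phoneme width noise
PiperAudioBufferSize  100    # audio buffer size in ms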

This changeset is based on master, but I originally did this work on the 0.11 branch, so it can be easily backported if required.

@samoverton
Author

I just came across #996 and I see that it takes a different approach, but I will leave this work here in case it is useful to anyone, or in the event that bringing piper code into the speech-dispatcher codebase turns out not to be viable.

@azakharchenko-msol

@samoverton Great work 👍 It works well and solves #999.
I don't quite understand the logic of const char* piper_get_voice(SPDMsgSettings* p_settings):
spd-say only works if I pass -t female1, otherwise it prints "no voice found" in the log.
Should the default voice be set somewhere?

@samoverton
Author

spd-say only works if I pass -t female1, otherwise it prints "no voice found" in the log. Should the default voice be set somewhere?

You can set the default voice for the module in piper.conf:

DefaultVoice "en_GB-alba-medium.onnx"

Or you can set the default language and voice-type in speechd.conf:

DefaultVoiceType "female1"
DefaultLanguage "en-GB"

You're right that there is some weird behavior here that's worth looking at though. Did you have any of these defaults set in speechd.conf?
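
For anyone else hitting this: voice resolution in speechd output modules generally follows the order sketched below. This is only an illustration of the idea, not the actual piper_get_voice() body, and lookup_addvoice() is a hypothetical helper standing in for whatever the module does with its AddVoice table.

/* Illustration only - not the module's piper_get_voice() implementation */
static const char *resolve_voice(const SPDMsgSettings *s, const char *default_voice)
{
    /* 1. Explicit synthesis voice name from the client (spd-say -y) */
    if (s->voice.name && s->voice.name[0] != '\0')
        return s->voice.name;

    /* 2. Language + symbolic voice type (e.g. "en-GB" + female1),
     *    mapped through the AddVoice entries from piper.conf.
     *    lookup_addvoice() is hypothetical. */
    const char *mapped = lookup_addvoice(s->voice.language, s->voice_type);
    if (mapped != NULL)
        return mapped;

    /* 3. Fall back to DefaultVoice; NULL here is what produces the
     *    "no voice found" log message */
    return default_voice;
}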

@azakharchenko-msol

@samoverton Thank you, I missed DefaultVoiceType "female1" in my speechd.conf

@net-ddavies
Contributor

Very happy to collaborate with you on piper work. Your work got me started, so a big thank you!! Off the top of my head, it would be great to unify the .conf files as much as we can.

Samuel T mentions streaming/pipelining in his review of cxxpiper, and I'm very much thinking about that. It would be very cool to have real-time voice adjustment for any parameter. It would actually be useful, and not just cool, when tweaking voices by ear to get them just right. I've done quite a bit of experimenting with this and ran into trouble, so I shelved it for the moment, but it is a live issue.

I was also thinking about caching and how it might be nice to have a caching module that all the output modules could share, but that's not piper-related, of course.

What are your thoughts on how to work together on stuff?

Thanks!
Derek
