Audio quality degradation in c5a3e13 "Converted the stream writer to use pyav" #206
Comments
Using git-bisect, I found that commit c5a3e13 ("Converted the stream writer to use pyav") is where the audio quality degraded.
Method: I used this curl command to fetch an MP3 file against each commit.
You can see this in the generated files.
Full bisect output (since b00c9ec): first bad commit c5a3e136708c28f8118cf8555d6fcd3c173f4407 "Converted the stream writer to use pyav".
Yeah, I have been investigating this issue, though I'm not fully sure why it's happening.
Some further digging shows it might just be forcing a lower bitrate on the MP3. Pre change vs. post change, as reported by mp3info: https://www.ibiblio.org/mp3info/mp3info.html
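Tools like mp3info report bitrate by decoding MPEG audio frame headers. The sketch below does the same in plain Python. It assumes a constant-bitrate file and inspects only the first Layer III frame, so VBR files (Xing/VBRI headers) are not handled. The function names are hypothetical and not part of mp3info or this project.

```python
from typing import Optional

# Layer III bitrates in kbps, indexed by the 4-bit bitrate index.
# Index 0 ("free format") and index 15 (invalid) are stored as None.
_L3_BITRATES = {
    "MPEG-1": [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None],
    "MPEG-2": [None, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, None],
}
_L3_BITRATES["MPEG-2.5"] = _L3_BITRATES["MPEG-2"]

# Sample rates in Hz, indexed by the 2-bit sample-rate index (3 is reserved).
_SAMPLE_RATES = {
    "MPEG-1": [44100, 48000, 32000],
    "MPEG-2": [22050, 24000, 16000],
    "MPEG-2.5": [11025, 12000, 8000],
}
# 2-bit version ID; 0b01 is reserved and therefore absent.
_VERSIONS = {0b00: "MPEG-2.5", 0b10: "MPEG-2", 0b11: "MPEG-1"}


def _skip_id3v2(data: bytes) -> int:
    """Return the offset just past a leading ID3v2 tag (0 if there is none)."""
    if len(data) >= 10 and data[:3] == b"ID3":
        # Tag size is a 28-bit "syncsafe" integer: 7 significant bits per byte.
        size = (data[6] << 21) | (data[7] << 14) | (data[8] << 7) | data[9]
        footer = 10 if data[5] & 0x10 else 0
        return 10 + size + footer
    return 0


def first_frame_info(data: bytes) -> Optional[dict]:
    """Find the first MPEG Layer III frame header and decode its fields."""
    i = _skip_id3v2(data)
    while i + 4 <= len(data):
        b0, b1, b2, b3 = data[i], data[i + 1], data[i + 2], data[i + 3]
        # 11-bit frame sync: 0xFF followed by the top three bits of the next byte set.
        if b0 == 0xFF and (b1 & 0xE0) == 0xE0:
            version = _VERSIONS.get((b1 >> 3) & 0x03)
            layer_bits = (b1 >> 1) & 0x03  # 0b01 means Layer III
            bitrate_idx = b2 >> 4
            sr_idx = (b2 >> 2) & 0x03
            if version and layer_bits == 0b01 and sr_idx != 3:
                kbps = _L3_BITRATES[version][bitrate_idx]
                if kbps is not None:
                    return {
                        "version": version,
                        "bitrate_kbps": kbps,
                        "sample_rate": _SAMPLE_RATES[version][sr_idx],
                        "mono": (b3 >> 6) == 0b11,  # channel mode 0b11 = single channel
                    }
        i += 1
    return None
```

Running first_frame_info(open(path, "rb").read()) on a pre-change and a post-change file would show directly whether the encoder's bitrate dropped.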
Do WAV files behave the same way? That is, is their bitrate also lower?
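A PCM WAV file is uncompressed, so its bitrate is fixed by sample rate × channels × bits per sample rather than by an encoder bitrate setting. A WAV output could only get a "lower bitrate" if one of those parameters changed. Here is a minimal sketch using Python's standard wave module; the helper names are hypothetical.

```python
import io
import wave


def wav_bitrate_kbps(wav_bytes: bytes) -> float:
    """PCM bitrate = sample_rate * channels * bits_per_sample, reported in kbps."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        bits_per_second = w.getframerate() * w.getnchannels() * w.getsampwidth() * 8
    return bits_per_second / 1000


def make_silent_wav(seconds: float, rate: int = 24000, channels: int = 1,
                    sampwidth: int = 2) -> bytes:
    """Build a silent PCM WAV in memory (hypothetical test input)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(b"\x00" * (int(seconds * rate) * channels * sampwidth))
    return buf.getvalue()
```

For example, 24 kHz mono 16-bit PCM always comes out at 384 kbps, whatever the audio content.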
Does #207 fix it?
Sure does!
I also found that simply removing the
Thanks for telling me. I didn't realize that the rate arg was bitrate xD
Describe the bug
Somewhere in this window of commits, the quality of the generated audio has degraded noticeably.
When producing an MP3 file via the web tool or via the OpenAI-compatible API, the output of the latest version is noticeably muddier.
Both CPU and GPU generation are affected, on Apple Silicon and on an Nvidia GPU alike.
The test text is "The quick brown fox jumps over the lazy dog", with the af_sky voice. On versions:
b00c9ec produces a ~35 KB MP3 file with better quality.
7d73c3c produces a ~12 KB "mp3" file with worse, "muddier" sounding audio.
The two versions log the following when run through ffprobe.
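Both files encode the same text with the same voice, so their durations should be about equal. If that holds and header/tag overhead is negligible, the ~35 KB vs ~12 KB sizes imply the post-change average bitrate is roughly a third of the pre-change one. The sketch below shows that arithmetic; the 3-second duration in the check is an arbitrary illustration, not a measurement.

```python
def estimate_bitrate_kbps(size_bytes: int, duration_s: float) -> float:
    """Average bitrate implied by file size and duration (ignores tag/header overhead)."""
    return size_bytes * 8 / duration_s / 1000


# With equal durations, the size ratio approximates the bitrate ratio.
size_ratio = 35 / 12  # pre-change size / post-change size, from the report
```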
Screenshots or console output
Comparing the generated files using ffprobe.
Example output
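The following is a sketch of the same ffprobe comparison from Python, limited to the fields that matter here: codec, sample rate, bitrate and duration. The ffprobe flags (-v error, -show_entries, -of json) are standard. The helper names are hypothetical, and probe() needs ffprobe on PATH.

```python
import json
import subprocess


def ffprobe_cmd(path: str) -> list:
    """ffprobe invocation printing codec, sample rate, bitrates and duration as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration,bit_rate:stream=codec_name,sample_rate,bit_rate",
        "-of", "json", path,
    ]


def summarize(probe_json: str) -> dict:
    """Extract the fields worth comparing between a good and a bad file."""
    info = json.loads(probe_json)
    stream = info["streams"][0]
    fmt = info["format"]
    return {
        "codec": stream.get("codec_name"),
        "sample_rate": int(stream["sample_rate"]),  # ffprobe emits numbers as strings
        # Fall back to the container-level bitrate if the stream omits it.
        "bit_rate": int(stream.get("bit_rate") or fmt["bit_rate"]),
        "duration": float(fmt["duration"]),
    }


def probe(path: str) -> dict:
    """Run ffprobe on one file and summarize it."""
    out = subprocess.run(ffprobe_cmd(path), capture_output=True, text=True, check=True)
    return summarize(out.stdout)
```

Calling probe() on the b00c9ec and 7d73c3c outputs side by side makes any bitrate difference explicit.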
Branch / Deployment used
Tested on master, between two commits:
b00c9ec: GOOD
7d73c3c: BAD
Git log over period
Tested with both the start-cpu and start-gpu scripts, using uv and with all dependencies installed.
Operating System
Reproduced on macOS, and Linux Mint.
macOS on Apple Silicon
Linux Mint running on Core i7 8700k, Nvidia GTX 1080 8GB
Using both CPU and GPU, the resulting files are identical.
Additional context
Initially, I noticed this difference in quality between the Mac CPU outputs and the GPU outputs on the Linux box.
When I updated the Mac to the latest commit, 7d73c3c, it started producing the same muddy output files. I originally noticed the issue while using the SillyTavern OpenAI-compatible TTS extension, but all reproduction steps have used the Kokoro-FastAPI :8880/web/ interface.
Confirmed not GPU/CPU or OS related.