I listened to the demo audio at 3.00 kbps, and the ESC result doesn't seem as satisfactory as the DAC result. Could you provide a fair comparison where the two models have a similar number of parameters?
For example, instead of reducing the model size by a factor of nine, could you compare the results at the same model size?
Hi, thanks for your comment!
ESC is indeed inferior to the original DAC model (Base-DAC) in terms of reconstruction quality. What our experiments demonstrate is that ESC is much more efficient than Base-DAC (model size, inference latency, etc.), while achieving better reconstruction than a DAC model reproduced with a similar number of parameters (Tiny-DAC).
We didn't scale ESC up to match Base-DAC because we want a parameter-efficient codec; Base-DAC is actually very slow at inference on CPUs, which makes it a poor candidate for real-world applications. However, we do believe that scaling ESC up would yield better reconstruction performance, given how well transformers scale.
We will include Tiny-DAC outputs on the demo page as well. We may also consider releasing an online speech coding interface to demonstrate additional features such as codec complexity.