Inference result of Qwen2 quantized by QoQ is wrong, how to fix? #44

limertang · 2025-02-06T14:19:25Z

Hi,

I quantized Qwen2/Qwen2.5 with the command below:
python -m deepcompressor.app.llm.ptq \
    configs/qoq-g128.yaml \
    --model-name qwen-2-7b --model-path /PATH/TO/QWEN-2-7B \
    --smooth-proj-alpha 0 --smooth-proj-beta 1 \
    --smooth-attn-alpha 0.5 --smooth-attn-beta 0
I deployed the quantized Qwen2 with TensorRT-LLM, but the inference output is wrong.
However, when I quantize Qwen1.5 with the same command and deploy it with TensorRT-LLM, it works well and the inference result is correct.
So, are the smooth values not correct for Qwen2? How should the smooth values be set for different models?
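
To make the question concrete, here is a minimal sketch of what the alpha/beta pairs might control. It assumes the flags act as exponents in a SmoothQuant-style per-channel scale, s_j = max|X_j|^alpha / max|W_j|^beta. That form is an assumption for illustration and is not confirmed by the deepcompressor documentation. The function name `smooth_scales` and the sample statistics are hypothetical.

```python
def smooth_scales(act_absmax, weight_absmax, alpha, beta, eps=1e-5):
    """Per-channel smoothing scale s_j = a_j**alpha / w_j**beta.

    ASSUMPTION: this SmoothQuant-style form is used only for illustration;
    it is not taken from the deepcompressor source.
    """
    scales = []
    for a, w in zip(act_absmax, weight_absmax):
        a = max(a, eps)  # guard against zero-valued channels
        w = max(w, eps)
        scales.append(a ** alpha / w ** beta)
    return scales


# Hypothetical per-channel absolute maxima, for illustration only.
act_absmax = [4.0, 16.0, 1.0]
weight_absmax = [0.5, 2.0, 1.0]

# Same (alpha, beta) pairs as in the command above.
proj = smooth_scales(act_absmax, weight_absmax, alpha=0.0, beta=1.0)
attn = smooth_scales(act_absmax, weight_absmax, alpha=0.5, beta=0.0)

print("proj scales:", proj)  # depends only on the weight statistics
print("attn scales:", attn)  # depends only on the activation statistics
```

Under this assumed form, alpha=0 with beta=1 makes the scale depend only on the weight ranges, while alpha=0.5 with beta=0 makes it depend only on the activation ranges. If Qwen2 has a different activation or weight range profile than Qwen1.5, the same pair could plausibly behave differently, but that would need to be checked against the actual calibration statistics.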
