I quantize Qwen2/Qwen2.5 with the command below:
```
python -m deepcompressor.app.llm.ptq \
    configs/qoq-g128.yaml \
    --model-name qwen-2-7b --model-path /PATH/TO/QWEN-2-7B \
    --smooth-proj-alpha 0 --smooth-proj-beta 1 \
    --smooth-attn-alpha 0.5 --smooth-attn-beta 0
```
I deploy the quantized Qwen2 with TensorRT-LLM, but the inference output is wrong. However, when I quantize Qwen1.5 with the same command and deploy it with TensorRT-LLM, it works well and the inference result is correct.
So, are these smoothing values incorrect for Qwen2? How should the smoothing values be set for different models?
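For reference, here is my understanding of what the alpha/beta flags control, as a minimal sketch. I am assuming QoQ uses a SmoothQuant-style rule where the per-channel smoothing scale is s_j = max|X_j|^alpha / max|W_j|^beta; the function `smooth_scale` below and the statistics it takes are illustrative, not deepcompressor's actual API:

```python
# Minimal sketch, assuming a SmoothQuant-style smoothing rule: a per-channel
# scale migrates quantization difficulty between activations and weights.
# The exact deepcompressor/QoQ internals may differ.
import torch

def smooth_scale(act_absmax: torch.Tensor,
                 weight_absmax: torch.Tensor,
                 alpha: float, beta: float,
                 eps: float = 1e-5) -> torch.Tensor:
    """Per-channel scale s_j = max|X_j|^alpha / max|W_j|^beta.

    alpha weights the activation statistics, beta the weight statistics.
    alpha=0, beta=1 (the --smooth-proj setting above) scales purely by
    weight magnitudes; alpha=0.5, beta=0 (the --smooth-attn setting)
    scales purely by the square root of activation magnitudes.
    """
    s = act_absmax.clamp(min=eps).pow(alpha) / weight_absmax.clamp(min=eps).pow(beta)
    return s.clamp(min=eps)

# The scale is folded into the model: activations are divided by s and the
# matching weight columns are multiplied by s, so X @ W is unchanged while
# outliers move to whichever side quantizes better.
act_absmax = torch.rand(4096)      # calibration stat: per-channel max|X| (illustrative)
weight_absmax = torch.rand(4096)   # per-channel max|W| (illustrative)
s = smooth_scale(act_absmax, weight_absmax, alpha=0.5, beta=0.0)
```

If that reading is right, different models would need different alpha/beta depending on where their activation outliers sit, which is why I am asking how to choose them per model.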