Hardware: AVX2, 2 × 18 cores, 1 TB RAM, 24 GB VRAM
Model: DeepSeek-R1-Q4_K_M
ktransformers: 0.2.2rc1+cu128torch26avx2
Dual-node deployment
prefill: 25-35 tokens/s
decode: 4-5 tokens/s
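I am currently testing this low-cost setup in a production environment, and the biggest obstacle turns out to be prefill speed. The official benchmark figures are 54.21 (32 cores) → 74.362 (dual-socket, 2×32 cores) tokens/s, and even at that speed long-text Q&A still means a long wait. Unlike decode, prefill cannot stream any output while the user's input is being processed, so prefill speed has a large impact on user experience. I hope there is a way to substantially increase prefill speed on AVX.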
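To put the wait in concrete terms, here is a rough sketch that estimates how long a user waits before the first token appears. It assumes the wait is simply prompt length divided by prefill speed. The 8000-token prompt is a hypothetical example I picked, not a measurement. The speeds are the ones quoted above. The function name `prefill_wait_seconds` is only for illustration.

```python
def prefill_wait_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Estimate time to first token, assuming wait = prompt length / prefill speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


# Prefill speeds quoted in this post (tokens/s).
PREFILL_SPEEDS = {
    "this setup, low end": 25.0,
    "this setup, high end": 35.0,
    "official, 32 cores": 54.21,
    "official, dual-socket 2x32 cores": 74.362,
}

# Hypothetical long-text prompt length; adjust to your own workload.
PROMPT_TOKENS = 8000

for label, speed in PREFILL_SPEEDS.items():
    wait = prefill_wait_seconds(PROMPT_TOKENS, speed)
    print(f"{label}: {wait:.0f} s before the first token")
```

Under these assumptions, an 8000-token prompt takes over five minutes at 25 tokens/s and still well over a minute and a half at the official dual-socket figure, with nothing shown to the user in the meantime.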