Can more operations be moved to the GPU? #739
vonchenplus
started this conversation in
General
Replies: 1 comment
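Same question: when system memory is roughly sufficient, can token generation speed be improved by adding more VRAM? Does KT have a corresponding allocation mechanism?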
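Loading DeepseekV3-q4km (int4) currently uses a large amount of system memory and only a small amount of VRAM. Could more operations be moved to the GPU?
The current strategy cannot sustain high concurrency. Would it be better to have a mechanism for placing more parameters on the GPU to increase concurrency?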