I am currently using the ktransformers:0.2.1 Docker image and testing DeepSeek-V3 Q4_K_M with ktransformers.local_chat.
(Command used: python -m ktransformers.local_chat --gguf_path /models/gguf/DeepSeek-V3-Q4_K_M --model_path deepseek-ai/DeepSeek-V3)
The first prompt goes through successfully. However, if I accidentally press Enter in the middle of typing, or if an extra Enter follows the first prompt, the model responds with a completely unrelated C++ implementation of the Quicksort algorithm, which seems wrong. How can I fix this?
Currently, my execution environment is:
CPU: INTEL(R) XEON(R) SILVER 4509Y, 16 cores per socket, 125GB DRAM / GPU: NVIDIA [GeForce RTX 4090] 24GB.
It’s already quite slow, and having this irrelevant response follow up is highly inefficient.
============ Showing as the following:
Chat: Certainly! Below is a C++ implementation of the Quicksort algorithm:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

// Function to partition the array and return the pivot index
int partition(vector<int>& arr, int low, int high) {
    int pivot = arr[high]; // Choosing the last element as the pivot
    int i = low - 1; // Index of the smaller element
    for (int j = low; j < high; j++) {
        // If the current element is smaller than or equal to the pivot
        if (arr[j] <= pivot) {
            i++; // Increment the index of the smaller element
            swap(arr[i], arr[j]); // Swap the elements
        }
===============
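The symptom is consistent with an empty line being submitted to the model as if it were a prompt, although that has not been confirmed against the ktransformers source. A minimal sketch of a guard, assuming one wraps the input step in a custom chat loop: `read_prompt` is a hypothetical helper, not part of ktransformers, and it simply keeps asking until a non-blank line is entered.

```python
from typing import Callable


def read_prompt(read_line: Callable[[str], str] = input, prefix: str = "Chat: ") -> str:
    """Return the first non-blank line the user enters.

    Hypothetical helper (not a ktransformers API): blank or
    whitespace-only lines are discarded instead of being passed
    on to the model as a prompt.
    """
    while True:
        line = read_line(prefix)
        if line.strip():
            return line
```

In a custom loop, calling `read_prompt()` in place of a bare `input("Chat: ")` would mean a stray Enter only re-displays the prompt prefix rather than starting a generation.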
Additionally, is there any way to optimize performance in the above environment? Neither the GPU nor the CPU appears to be fully utilized, and generation is slow. If you have any suggestions, please let me know.