-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Top-K Sampling Support #59
Comments
Thanks for bringing this to our attention. The MLX core library does indeed support a top-k matrix operation, but the MLX LLM library does not support top-k sampling. Here are the supported generation/sampling options as of today https://github.com/ml-explore/mlx-examples/blob/dfa4dd6/llms/mlx_lm/utils.py#L200-L215 . Please track this issue ml-explore/mlx-examples#1167 for adding support in |
Looks like the upstream PR got merged, any chance we could have the sampler in the MLX engine? |
Shouldn't be difficult to add to our app now, I should be able to get this done soon. |
@mikemykhaylov FYI this is slightly delayed since we need to update the LM Studio UI to inform users about the limitation described here ml-explore/mlx-examples#1219 |
Hello,
I find that for some models, like Mistral Nemo, it is very beneficial to restrict the number of considered completions to improve the coherence of the output. However, compared to llama.cpp, MLX does not seem to support top K sampling in LM Studio. It does look like the underlying library supports that, so implementing that would be much appreciated.
llama.cpp models support Top-K

MLX models do not support Top-K

The text was updated successfully, but these errors were encountered: