feat: sync llama.cpp #110
Conversation
Hey there, I was also going to bump the val[2048] size to [4096] in PR #111 to support the newer DeepSeek R1 prompt format, which exceeds the old buffer size. I wanted to ask if there is a better method of loading the model metadata, as reserving 4096 bytes for what is often a 2-4 byte float/uint seems somewhat wasteful. Ultimately it's a very small optimization in the grand scheme of things, but it would be nice not to reserve memory unnecessarily.
Yes, agreed. I didn't spend time on this, as increasing the buffer was a quick (though not optimal) solution. If you have a way to optimize it, that would be nice.
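For reference, a minimal sketch of one way to avoid the fixed buffer, assuming the values in question come from `llama_model_meta_val_str_by_index` (which, as I understand it, reports the full string length snprintf-style); the helper name is hypothetical:

```cpp
#include <string>
#include <vector>

#include "llama.h"

// Hypothetical helper: read a metadata value sized to fit, instead of
// reserving a fixed 4096-byte buffer per value.
static std::string meta_val_by_index(const llama_model * model, int32_t i) {
    std::vector<char> buf(256); // small initial guess; most values are tiny
    // The API returns the full length of the value (like snprintf), or -1.
    int32_t len = llama_model_meta_val_str_by_index(model, i, buf.data(), buf.size());
    if (len < 0) {
        return ""; // no value at this index
    }
    if ((size_t) len >= buf.size()) {
        buf.resize((size_t) len + 1); // retry with room for the terminator
        llama_model_meta_val_str_by_index(model, i, buf.data(), buf.size());
    }
    return std::string(buf.data(), (size_t) len);
}
```

This keeps the per-value cost proportional to the actual data rather than a fixed worst case, at the price of a second call for the rare oversized value.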
@Vali-98 I am not getting consistent responses for DeepSeek R1. The output from llama.cpp:
> Hi
<think>
Okay, the user just said "Hi". That's pretty straightforward. I should respond in a friendly and welcoming way.
Maybe I can ask them how they're doing or if there's something specific they need help with.
Keeping it simple and open-ended seems like the best approach.
</think>
Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask!
> what is 1+1?
<think>
Alright, the user just asked "what is 1+1?" which is a basic math problem. I need to provide a clear and correct answer.
I should respond by stating the result of the addition and maybe add a fun fact to keep it light and engaging. That should cover it.
</think>
1+1 equals **2**. That's a fundamental fact in mathematics!
> how about 8/2?
<think>
Okay, the user just asked "how about 8/2?" which is a division problem. I should respond in a friendly and helpful manner.
I'll provide the answer and maybe add a fun fact to make it more interesting. That should cover it.
</think>
8 divided by 2 is **4**. That's a simple division fact! I hope that's helpful! 😊

By comparison, the output from PocketPal (using the same model and backend llama.cpp version) and from ChatterUI is not consistent with this (although I didn't change any settings in ChatterUI, so I'm not sure if I had to apply any). Are you getting good results with any of the DeepSeek-R1 distills?
Hey there, it's also not working on my end. I decided to test it as well; I had only actually tested the 8B distill, which works flawlessly.
If I just build the lib by
Good catch! Indeed, using
If that's the case, could this be an issue on the llama.cpp side?
Just wanted to give an update in case you're looking into this. The culprit seems to be

Perhaps we could investigate the compilation process with something like this:
and
With this config, I was able to run the setup successfully on the following devices:
All worked without any obvious issues. I am no expert in these compiler settings, but since what matters most for ggml/llama.cpp is i8mm and dotprod, we should be good? I'll run a few more tests, and if successful, we could use these settings. But this obviously won't help with the issue on iOS.
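For what it's worth, the compile-time side of this can be sanity-checked with the standard ACLE feature macros, which are what ggml's ARM kernels actually test for; a minimal sketch (nothing here is project-specific):

```cpp
// Sketch: these macros are defined by the compiler only when the matching
// -march extensions are enabled, which is what ggml's ARM paths key off.
#include <cstdio>

int main() {
#if defined(__ARM_NEON)
    std::puts("compiled with NEON");
#endif
#if defined(__ARM_FEATURE_DOTPROD)
    std::puts("compiled with dotprod");
#endif
#if defined(__ARM_FEATURE_MATMUL_INT8)
    std::puts("compiled with i8mm");
#endif
    return 0;
}
```

Building this with each candidate flag set and checking the output is a quick way to confirm which extensions a given configuration actually enables.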
Just as a quick check: IIRC many flags are now checked within llama.cpp for NEON/i8mm compatibility at runtime. Is it possible to now collapse all builds into a single one?
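As a reference point for the runtime side, ggml exposes query functions for these features; a minimal sketch, assuming the `ggml_cpu_has_*` API (i8mm is reported by `ggml_cpu_has_matmul_int8`):

```cpp
#include <cstdio>

#include "ggml.h" // ggml_cpu_has_* queries (declared in ggml-cpu.h on newer trees)

int main() {
    // Runtime detection: what the CPU supports, regardless of build flags.
    std::printf("neon: %d\n", ggml_cpu_has_neon());
    std::printf("i8mm: %d\n", ggml_cpu_has_matmul_int8());
    return 0;
}
```

The catch is that runtime detection only reports what the CPU supports; code compiled without the corresponding -march extension still can't emit those instructions, which may be why collapsing the builds doesn't work here, as the reply below suggests.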
Unfortunately, that won't work:
Thanks! I really appreciate the testing.
If the new build settings on Android don't break anything, we can use them. I have confirmed the problem doesn't happen on iOS.
This PR syncs llama.cpp (https://github.com/ggerganov/llama.cpp/releases/tag/b4518).
Closes #109