
feat: sync llama.cpp #110

Merged: 6 commits into mybigday:main on Jan 24, 2025

Conversation

a-ghorbani (Contributor) commented Jan 21, 2025

Vali-98 (Contributor) commented Jan 21, 2025

Hey there, I was also going to bump the val[2048] size to [4096] in PR #111, to support the newer DeepSeek R1 prompt format, which exceeds the old buffer size.

I wanted to ask if there is a better method of loading the model metadata, as reserving 4096 bytes for what is often a 2–4 byte float/uint seems somewhat wasteful. Ultimately it's a very small optimization in the grand scheme of things, but it would be nice not to reserve memory unnecessarily.
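
A minimal sketch of what a size-aware read could look like instead of a fixed buffer, assuming the upstream llama_model_meta_val_str API and its snprintf-style return value (the full length of the value, even when the buffer passed in is too small):

    // Sketch only: assumes llama_model_meta_val_str() keeps snprintf semantics,
    // i.e. it returns the full value length even when buf is too small.
    #include <cstdint>
    #include <string>
    #include <vector>
    #include "llama.h"

    static std::string get_meta_val(const llama_model * model, const char * key) {
        std::vector<char> buf(128); // small default instead of a fixed 4096-byte array
        int32_t len = llama_model_meta_val_str(model, key, buf.data(), buf.size());
        if (len < 0) {
            return ""; // key not present in the GGUF metadata
        }
        if ((size_t) len >= buf.size()) {
            buf.resize(len + 1); // grow only for the rare long values
            llama_model_meta_val_str(model, key, buf.data(), buf.size());
        }
        return std::string(buf.data(), len);
    }

With something like this, only genuinely long values (the chat template being the usual offender) would ever trigger a larger allocation.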

a-ghorbani (Contributor, Author)

Yes, agreed. I didn't spend time on this since increasing the buffer was a quick, though not optimal, solution. If you have a way to optimize it, that would be nice.

a-ghorbani (Contributor, Author)

@Vali-98 I am not getting consistent responses for DeepSeek R1. Testing on unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/DeepSeek-R1-Distill-Qwen-1.5B-Q8_0.gguf.

The output from the llama.cpp (aea8ddd5) CLI seems very sensible and consistent:

> Hi
<think>
Okay, the user just said "Hi". That's pretty straightforward. I should respond in a friendly and welcoming way.

Maybe I can ask them how they're doing or if there's something specific they need help with.

Keeping it simple and open-ended seems like the best approach.
</think>

Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask!

> what is 1+1?
<think>
Alright, the user just asked "what is 1+1?" which is a basic math problem. I need to provide a clear and correct answer.

I should respond by stating the result of the addition and maybe add a fun fact to keep it light and engaging. That should cover it.
</think>

1+1 equals **2**. That's a fundamental fact in mathematics!

> how about 8/2?
<think>
Okay, the user just asked "how about 8/2?" which is a division problem. I should respond in a friendly and helpful manner.

I'll provide the answer and maybe add a fun fact to make it more interesting. That should cover it.
</think>

8 divided by 2 is **4**. That's a simple division fact! I hope that's helpful! 😊

From PocketPal (using the same model and llama.cpp backend version):
[screenshot: Screenshot_20250121-154531 (1)]

From ChatterUI (although I didn't change any settings; I'm not sure if I needed to apply any):
[screenshot: Screenshot_20250122-155536]

Are you getting good results with any of the DeepSeek-R1 distills?

Vali-98 (Contributor) commented Jan 22, 2025

Hey there, it's also not working on my end.

I decided to test bartowski/DeepSeek-R1-Distill-Qwen-1.5B-GGUF to check whether this was a model conversion error on unsloth's part; unfortunately it isn't. Testing Q4_0 also resulted in gibberish output. I'll see if I have time to investigate this thoroughly later.

I had only actually tested the 8B distill, which works flawlessly.

a-ghorbani marked this pull request as ready for review on January 22, 2025, 21:00
jhen0409 (Member)

> I am not getting consistent responses for DeepSeek R1. Testing on unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF/DeepSeek-R1-Distill-Qwen-1.5B-Q8_0.gguf.

If I just build the lib with -march=armv8-a, the Q8 model works fine, so I guess the issue may be caused by some CPU features.

a-ghorbani (Contributor, Author)

> If I just build the lib with -march=armv8-a, the Q8 model works fine, so I guess the issue may be caused by some CPU features.

Good catch! Indeed, using -march=armv8-a works much better (see the screenshot below).

> I guess the issue may be caused by some CPU features

If that's the case, could this be an issue on the llama.cpp side?

[screenshot: Screenshot_20250123_092333]

a-ghorbani (Contributor, Author)

Just wanted to give an update in case you're looking into this. The culprit seems to be +fp16 for Android.

Perhaps we could set up the compilation with something like this:

    build_library("rnllama_v8" "-march=armv8-a")
    build_library("rnllama_v8_2" "-march=armv8.2-a")
    build_library("rnllama_v8_2_dotprod" "-march=armv8.2-a+dotprod")
    build_library("rnllama_v8_2_i8mm" "-march=armv8.2-a+i8mm")
    build_library("rnllama_v8_2_dotprod_i8mm" "-march=armv8.2-a+dotprod+i8mm")

and

      // Pick the most capable build the device supports, falling back to plain armv8-a.
      // hasDotProd / hasI8mm / hasFp16 come from CPU feature detection done before loading.
      if (hasDotProd && hasI8mm) {
        Log.d(NAME, "Loading librnllama_v8_2_dotprod_i8mm.so");
        System.loadLibrary("rnllama_v8_2_dotprod_i8mm");
      } else if (hasDotProd) {
        Log.d(NAME, "Loading librnllama_v8_2_dotprod.so");
        System.loadLibrary("rnllama_v8_2_dotprod");
      } else if (hasI8mm) {
        Log.d(NAME, "Loading librnllama_v8_2_i8mm.so");
        System.loadLibrary("rnllama_v8_2_i8mm");
      } else if (hasFp16) {
        Log.d(NAME, "Loading librnllama_v8_2.so");
        System.loadLibrary("rnllama_v8_2");
      } else {
        Log.d(NAME, "Loading default librnllama_v8.so");
        System.loadLibrary("rnllama_v8");
      }

With this config, I was able to run the setup successfully on the following devices:

  • OnePlus 6 (loads librnllama_v8_2.so)
  • Pixel 9 (loads librnllama_v8_2_dotprod_i8mm.so)
  • Emulator (loads librnllama_v8_2_dotprod.so)

All worked without any obvious issues.

I am no expert in these compiler settings, but since what matters most for ggml/llama.cpp are i8mm and dotprod, we should be good?

I'll run a few more tests, and if successful, we could use these settings.

But this obviously won't help with the issue on iOS.

Vali-98 (Contributor) commented Jan 23, 2025

Just as a quick check: IIRC, many features are now detected at runtime within llama.cpp, using lm_ggml_cpu_has_neon, lm_ggml_cpu_has_dotprod, and lm_ggml_cpu_has_matmul_int8 to check for NEON/i8mm compatibility.

Is it possible to now collapse all builds into rnllama_v8_2_dotprod_i8mm (-march=armv8.2-a+dotprod+i8mm) to cover all ARM SoCs, and see if older devices would still work?
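
For reference, a minimal sketch of that runtime query on the native side, using the upstream function names (this repo prefixes them with lm_, and the ggml-cpu.h header is an assumption about where they are declared):

    // Sketch: query ggml's runtime CPU-feature detection after the library loads.
    #include <cstdio>
    #include "ggml-cpu.h" // assumed location of the ggml_cpu_has_* declarations

    int main() {
        std::printf("neon:        %d\n", ggml_cpu_has_neon());
        std::printf("dotprod:     %d\n", ggml_cpu_has_dotprod());
        std::printf("int8 matmul: %d\n", ggml_cpu_has_matmul_int8());
        return 0;
    }

The limitation is that these checks only run after the .so has been loaded: they select code paths inside the binary but cannot guard against instructions that -march=armv8.2-a+dotprod+i8mm has already baked in, which would be consistent with the SIGILL in the reply below.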

a-ghorbani (Contributor, Author)

> Just as a quick check: IIRC, many features are now detected at runtime within llama.cpp, using lm_ggml_cpu_has_neon, lm_ggml_cpu_has_dotprod, and lm_ggml_cpu_has_matmul_int8 to check for NEON/i8mm compatibility.
>
> Is it possible to now collapse all builds into rnllama_v8_2_dotprod_i8mm (-march=armv8.2-a+dotprod+i8mm) to cover all ARM SoCs, and see if older devices would still work?

Unfortunately, that won't work:

2025-01-23 15:33:16.443 31626-25330 RNLLAMA_ANDROID_JNI     com.pocketpalai                      I  llama_init_from_model: graph splits = 1
2025-01-23 15:33:16.444 31626-25463 RNLLAMA_LOG_ANDROID     com.pocketpalai                      I  common_init_from_params: setting dry_penalty_last_n to ctx_size = 512
--------- beginning of crash
2025-01-23 15:33:16.444 31626-25463 RNLLAMA_LOG_ANDROID     com.pocketpalai                      W  common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
2025-01-23 15:33:16.471 31626-25464 libc                    com.pocketpalai                      A  Fatal signal 4 (SIGILL), code 1 (ILL_ILLOPC), fault addr 0x6ebdfeeb10 in tid 25464 (AsyncTask #1), pid 31626 (com.pocketpalai)
2025-01-23 15:33:16.612 31626-25318 110                     com.pocketpalai                      I   OptJank - total:108 frameGap:108 delta#0#2#1#0#0#107
2025-01-23 15:33:16.612 31626-25318 111                     com.pocketpalai                      I  OptJank - big and big
2025-01-23 15:33:16.953 31626-25318 110                     com.pocketpalai                      I   OptJank - total:290 frameGap:300 delta#266#13#12#0#2#8
2025-01-23 15:33:16.968 25471-25471 DEBUG                   crash_dump64                         A  pid: 31626, tid: 25464, name: AsyncTask #1  >>> com.pocketpalai <<<
2025-01-23 15:33:16.983 25471-25471 DEBUG                   crash_dump64                         A        #00 pc 00000000000ccb10  /data/app/~~3rSfwEXkaoqDDGLws5rnkA==/com.pocketpalai-eRbNUmyfFjybNlMO-a8uvg==/base.apk!librnllama_v8_2_dotprod_i8mm.so (offset 0x2670000) (BuildId: 9f41aa9d98b500a46cf72895ca9d3c69daf184d1)
2025-01-23 15:33:16.983 25471-25471 DEBUG                   crash_dump64                         A        #01 pc 00000000000aa6b4  /data/app/~~3rSfwEXkaoqDDGLws5rnkA==/com.pocketpalai-eRbNUmyfFjybNlMO-a8uvg==/base.apk!librnllama_v8_2_dotprod_i8mm.so (offset 0x2670000) (BuildId: 9f41aa9d98b500a46cf72895ca9d3c69daf184d1)
2025-01-23 15:33:16.983 25471-25471 DEBUG                   crash_dump64                         A        #02 pc 00000000000ba5f0  /data/app/~~3rSfwEXkaoqDDGLws5rnkA==/com.pocketpalai-eRbNUmyfFjybNlMO-a8uvg==/base.apk!librnllama_v8_2_dotprod_i8mm.so (offset 0x2670000) (BuildId: 9f41aa9d98b500a46cf72895ca9d3c69daf184d1)

jhen0409 (Member) left a comment

Thanks! I really appreciate the testing.

If the new build settings on Android don't break anything, we can use them. I have confirmed the problem doesn't happen on iOS.

jhen0409 merged commit 7e56a2b into mybigday:main on Jan 24, 2025
4 checks passed

Successfully merging this pull request may close these issues.

llama.cpp sync for SVE support for Q4_K_Ms
3 participants