The RKLLM software stack helps users quickly deploy AI models to Rockchip chips. The overall framework is as follows:
To use the RKNPU, users first run the RKLLM-Toolkit tool on a PC to convert the trained model into an RKLLM-format model, and then perform inference on the development board using the RKLLM C API.
- RKLLM-Toolkit is a software development kit for users to perform model conversion and quantization on a PC.
- RKLLM Runtime provides C/C++ programming interfaces for the Rockchip NPU platform, helping users deploy RKLLM models and accelerate the implementation of LLM applications (see the sketch after this list).
- The RKNPU kernel driver is responsible for interacting with the NPU hardware. It is open source and can be found in the Rockchip kernel code.
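
To illustrate the board-side half of this workflow, here is a minimal sketch of calling the RKLLM C API to run a model that was previously exported by RKLLM-Toolkit on a PC. The entry points (`rkllm_createDefaultParam`, `rkllm_init`, `rkllm_run`, `rkllm_destroy`) and struct fields follow the pattern described in the RKLLM Runtime documentation, but exact names and signatures can differ between SDK versions, so treat this as an assumption-laden outline and verify against the `rkllm.h` shipped with your SDK; the model path is hypothetical.

```c
#include <stdio.h>
#include "rkllm.h"   /* header shipped with the RKLLM Runtime */

/* Result callback: the runtime streams generated text here.
 * Field and enum names are assumptions based on the docs. */
void on_result(RKLLMResult *result, void *userdata, LLMCallState state) {
    if (state == LLM_RUN_NORMAL) {
        printf("%s", result->text);   /* print each generated chunk */
    } else if (state == LLM_RUN_FINISH) {
        printf("\n[done]\n");
    }
}

int main(void) {
    LLMHandle handle = NULL;

    /* Start from default parameters, then point at the model
     * exported by RKLLM-Toolkit on the PC. */
    RKLLMParam param = rkllm_createDefaultParam();
    param.model_path = "./qwen-1_8b.rkllm";  /* hypothetical path */
    param.max_context_len = 512;
    param.max_new_tokens = 256;

    if (rkllm_init(&handle, param, on_result) != 0) {
        fprintf(stderr, "rkllm_init failed\n");
        return 1;
    }

    /* Blocking call; generated text arrives through on_result. */
    rkllm_run(handle, "What can the RKNPU do?", NULL);

    rkllm_destroy(handle);
    return 0;
}
```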
Supported platforms:
- RK3588 Series
- RK3576 Series
Supported models:
- TinyLLAMA 1.1B
- Qwen 1.8B
- Qwen2 0.5B
- Phi-2 2.7B
- Phi-3 3.8B
- ChatGLM3 6B
- Gemma 2B
- InternLM2 1.8B
- MiniCPM 2B
- You can also download all packages, the Docker image, examples, docs, and platform tools from the RKLLM_SDK (fetch code: rkllm)
If you want to deploy additional AI models, we have introduced an SDK called RKNN-Toolkit2. For details, please refer to:
https://github.com/airockchip/rknn-toolkit2
Updates in the latest release:
- Optimize memory usage during model conversion
- Optimize memory usage during inference
- Improve prefill speed
- Reduce initialization time
- Improve quantization accuracy
- Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3
- Add server invocation
- Add inference interruption interface
- Add logprob and token_id to the return value (see the sketch after this list)
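
As a hedged sketch of the last two items, the callback below reads the logprob and token_id fields added to the result struct, and stops generation early through the interruption interface. The field names follow the changelog entry above; the function name `rkllm_abort` is an assumption, so check the exact name in your SDK's `rkllm.h`.

```c
#include <stdio.h>
#include "rkllm.h"

/* Global handle so the callback can interrupt inference; a real
 * application would pass it through the userdata pointer instead. */
static LLMHandle g_handle;
static int g_tokens_seen = 0;

void on_result(RKLLMResult *result, void *userdata, LLMCallState state) {
    if (state != LLM_RUN_NORMAL) return;

    /* The release adds token_id and logprob alongside the text chunk. */
    printf("token_id=%d logprob=%f text=%s\n",
           result->token_id, result->logprob, result->text);

    /* Stop generation after 32 tokens via the interruption
     * interface (function name is an assumption). */
    if (++g_tokens_seen >= 32) {
        rkllm_abort(g_handle);
    }
}
```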
For older versions, please refer to the CHANGELOG.