High-speed Large Language Model Serving for Local Deployment
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
Tool for testing different large language models without writing code.
Local LLM Inference Library
LLM chatbot example using OpenVINO with RAG (Retrieval Augmented Generation).
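For context, here is a minimal sketch of the generation step in such a chatbot using the openvino-genai Python package. This is not the linked example's actual code: the model directory, question, and hard-coded context are placeholder assumptions, and the retrieval half of RAG is omitted.

```python
# Minimal sketch: LLM generation with OpenVINO GenAI; the retrieval step of
# RAG is reduced to a hard-coded context string for brevity.
# Assumes `pip install openvino-genai` and a model exported to OpenVINO
# format in ./llm_model (placeholder path).
import openvino_genai

pipe = openvino_genai.LLMPipeline("./llm_model", "CPU")

# In a real RAG chatbot this context would come from a vector-store lookup.
context = "OpenVINO is an open-source toolkit for optimizing AI inference."
question = "What is OpenVINO?"
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(pipe.generate(prompt, max_new_tokens=128))
```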
Script that performs RAG and uses a local LLM for Q&A (a generic sketch of this pattern follows).
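A minimal sketch of that RAG-plus-local-LLM pattern, assuming sentence-transformers for retrieval and llama-cpp-python for generation; the documents, question, and GGUF model path are placeholders, not taken from the repository.

```python
# Minimal RAG + local LLM Q&A sketch.
# Assumes `pip install sentence-transformers llama-cpp-python numpy`
# and a GGUF model at ./model.gguf (placeholder path).
import numpy as np
from sentence_transformers import SentenceTransformer
from llama_cpp import Llama

docs = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "The Great Wall of China is over 21,000 km long.",
]

# Retrieve: embed the documents and the question, pick the closest document.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)
question = "When was the Eiffel Tower finished?"
q_vec = embedder.encode([question], normalize_embeddings=True)[0]
best_doc = docs[int(np.argmax(doc_vecs @ q_vec))]

# Generate: feed the retrieved context plus the question to a local LLM.
llm = Llama(model_path="./model.gguf", n_ctx=2048, verbose=False)
prompt = f"Context: {best_doc}\nQuestion: {question}\nAnswer:"
out = llm(prompt, max_tokens=128, stop=["\n"])
print(out["choices"][0]["text"].strip())
```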
Nexa SDK is a comprehensive toolkit supporting ONNX and GGML models. It provides text generation, image generation, vision-language model (VLM), automatic speech recognition (ASR), and text-to-speech (TTS) capabilities.
Script that takes a .wav audio file, performs speech-to-text with OpenAI's Whisper, and then uses Llama 3 to generate a summary and action points from the resulting transcript (sketched below).
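A minimal sketch of that transcribe-then-summarize pipeline, assuming the openai-whisper and llama-cpp-python packages; the audio filename, model path, and prompt wording are placeholder assumptions rather than the script's actual code.

```python
# Minimal sketch of the transcribe-then-summarize pipeline.
# Assumes `pip install openai-whisper llama-cpp-python` and a Llama 3
# GGUF file at ./llama3.gguf (placeholder path).
import whisper
from llama_cpp import Llama

# Speech-to-text: transcribe the .wav file with Whisper.
stt = whisper.load_model("base")
transcript = stt.transcribe("meeting.wav")["text"]

# Summarize: ask the local Llama 3 model for a summary and action points.
llm = Llama(model_path="./llama3.gguf", n_ctx=8192, verbose=False)
resp = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "Summarize the transcript and list action points."},
        {"role": "user", "content": transcript},
    ],
    max_tokens=512,
)
print(resp["choices"][0]["message"]["content"])
```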