- An end-to-end RAG pipeline with both text and audio input/output support and a fully customizable system architecture.
- Supports cross-lingual usage (any source language to any target language)
- Pluggable, modular architecture for any LLM, ASR, TTS, embedding, and translation technology
Demo video: `simple_chat_2.mp4`
This component hosts the heavy-computation services of the system (an illustrative client sketch follows the list).
- LLM Service - serves the large language model
- Embedding Service - serves sentence/document embeddings
- Translator Service - serves translation in all supported directions
- STT Service - serves Speech-to-Text
- TTS Service - serves Text-to-Speech
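For orientation, here is a minimal sketch of how a client might call the compute service over HTTP. The endpoint paths and payload fields (`/embed`, `/translate`, `texts`, `source_lang`, ...) are illustrative assumptions, not the documented API; check the `compute_service` code for the real routes.

```python
import requests

COMPUTE_URL = "http://127.0.0.1:8001"  # compute service address (see the access section below)

def embed(texts):
    # NOTE: the "/embed" route and the payload shape are assumptions for illustration.
    resp = requests.post(f"{COMPUTE_URL}/embed", json={"texts": texts}, timeout=30)
    resp.raise_for_status()
    return resp.json()

def translate(text, source_lang, target_lang):
    # NOTE: the "/translate" route and field names are assumptions for illustration.
    payload = {"text": text, "source_lang": source_lang, "target_lang": target_lang}
    resp = requests.post(f"{COMPUTE_URL}/translate", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()
```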
This is the full RAG pipeline, which answers a user query using the knowledge bases fed to the system (a conceptual sketch follows the list).
- Bot Service - the RAG pipeline
- DB Service - the RAG knowledge base (vector) store
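Conceptually, the Bot Service answers a query by embedding it, retrieving the closest knowledge-base entries from the DB Service, and prompting the LLM with them as context. The self-contained toy below illustrates that flow with a bag-of-words stand-in for the embedding model and a stub in place of the LLM call; the real pipeline delegates both to the Compute Service.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in embedding: bag-of-words counts (the real system calls the Embedding Service)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Stand-in for the DB Service's vector store.
knowledge_base = [
    "The compute service runs on port 8001.",
    "The bot backend runs on port 8002.",
]

def rag_answer(query, top_k=1):
    q = toy_embed(query)
    # Retrieve the most similar knowledge-base entries (the DB Service's job).
    ranked = sorted(knowledge_base, key=lambda doc: cosine(q, toy_embed(doc)), reverse=True)
    context = "\n".join(ranked[:top_k])
    # The real Bot Service would now send `context` and `query` to the LLM Service.
    return f"[LLM prompt] Context:\n{context}\nQuestion: {query}"

print(rag_answer("Which port does the bot backend use?"))
```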
The client frontend app through which the user interacts with the bot/system.
- You can create the conda environment named `rag_env` with the given `environment.yml` file.
```bash
conda env create -f environment.yml
```
The 3 services should be run as 3 separate processes (in separate terminals), respecting the dependency order below (a startup-check sketch follows the list).
- Compute Service is independent of the others
- Bot Backend depends on the Compute Service
- Bot Frontend depends on the Bot Backend
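Because of this ordering, it can help to wait until a dependency is reachable before starting the next service. A minimal sketch, assuming the services answer plain HTTP requests on their ports (no specific health route is documented here):

```python
import time
import requests

def wait_for(url, timeout=120):
    """Poll `url` until the server answers or `timeout` seconds pass."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            requests.get(url, timeout=2)  # any HTTP response means the server is up
            return True
        except requests.exceptions.ConnectionError:
            time.sleep(1)
    return False

# e.g. before starting the bot backend:
assert wait_for("http://127.0.0.1:8001"), "Compute Service is not reachable"
```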
You can start the services as follows.
- Start the compute service
Check the `.env` file and the yml files of each service. You may need to fill certain fields in the yml files. In the `.env` file, keep fields empty if a variable should be set as `False` (illustrated after the commands below).
```bash
conda activate rag_env
cd compute_service
python main.py
```
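The empty-means-`False` convention presumably relies on Python's truthiness: a variable left blank in `.env` arrives as an empty string, which is falsy. A minimal illustration (the variable name is hypothetical):

```python
import os

# A blank entry in .env (e.g. `SOME_FLAG=`) is read as "" — which is falsy.
some_flag = bool(os.getenv("SOME_FLAG", ""))
print(some_flag)  # False when the field is empty or absent
```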
- Start the bot backend
- Check the `.env` file. Keep fields empty if a variable should be set as `False`.
- For advanced PDF processing (i.e. table data extraction), we recommend using the unstructured API, i.e. setting `PDF_LOADER="Unstructured"` in `.env` (defaults to `"PyPDF"`); a loader-switch sketch follows this step.
```bash
conda activate rag_env
cd bot_backend
python main.py
```
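Below is a sketch of how such a `PDF_LOADER` switch might look internally; this is an illustrative assumption, not the actual `bot_backend` code.

```python
import os

def load_pdf(path: str):
    """Pick the PDF loader based on the PDF_LOADER setting (defaults to "PyPDF")."""
    if os.getenv("PDF_LOADER", "PyPDF") == "Unstructured":
        # unstructured does layout-aware parsing, which helps with table data
        from unstructured.partition.pdf import partition_pdf
        return partition_pdf(filename=path)
    from pypdf import PdfReader
    return [page.extract_text() for page in PdfReader(path).pages]
```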
- Start the frontend app
```bash
conda activate rag_env
cd bot_frontend
python app_v2.py
```
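The frontend is a Gradio app (see the roadmap item about replacing Gradio). For reference, here is a minimal sketch of the same pattern: a Gradio chat UI that forwards messages to the backend. The `/chat` route and the payload/response fields are assumptions; see `bot_frontend/app_v2.py` for how the real app talks to the backend.

```python
import gradio as gr
import requests

BACKEND_URL = "http://127.0.0.1:8002"  # bot backend (see the access section below)

def chat(message, history):
    # NOTE: the "/chat" route and field names are illustrative assumptions.
    resp = requests.post(f"{BACKEND_URL}/chat", json={"query": message}, timeout=60)
    resp.raise_for_status()
    return resp.json().get("answer", "")

# ChatInterface passes (message, history) to the callback and renders the returned reply.
gr.ChatInterface(chat).launch(server_port=7860)
```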
The services can be containerized using the following steps (note that `docker run --gpus all` requires the NVIDIA Container Toolkit on the host).
```bash
docker build -t rag_services .
docker run --gpus all -p 8001:8001 -p 8002:8002 -p 7860:7860 rag_services
```
You can access the services as follows.

When running natively:
- compute service: http://127.0.0.1:8001
- bot backend: http://127.0.0.1:8002
- client app: http://127.0.0.1:7860

When calling from inside a Docker container (e.g. with Docker Desktop):
- compute service: http://host.docker.internal:8001
- bot backend: http://host.docker.internal:8002
- client app: http://host.docker.internal:7860
- Complete Bot Backend
  - Basic RAG Flow
  - Session Management
  - RAG mode and LLM-only chat mode
  - Handle both text and voice input and output
  - Add knowledge to the vector DB through the API
  - Trace Responses
  - Tool Calling
  - Further Improvements
- Complete Compute Service
  - LLM Service
  - Embedding Service
    - Hugging Face
    - Sentence Transformers
    - OpenAI
  - Translation Service
    - Hugging Face
    - Google Translate API
  - ASR Service
    - Hugging Face
    - openai-whisper
  - TTS Service
    - Hugging Face
    - Coqui TTS
- Complete Frontend App
  - Basic chat interface
  - Add knowledge to RAG (e.g. file upload, URL fetch)
  - Get rid of Gradio
- Update Docker Image
- Generalize Multilingual Support
- Voice Streaming Capability
Follow our social media channels for the latest updates.