xphoenix-ai/rag-services


RAG Services - Any Language

  • An end-to-end RAG pipeline with both text and audio input/output support and a fully customizable system architecture.
  • Supports cross-lingual usage (any source language to any target language).
  • Pluggable, modular architecture for any LLM, ASR, TTS, embedding, and translation technology.
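The pluggable architecture described above can be sketched with simple interfaces: each compute component (LLM, ASR, TTS, embedding, translation) sits behind an abstract base class, so any backend can be swapped in. The class and method names below are illustrative assumptions, not the repository's actual API:

```python
from abc import ABC, abstractmethod

class Translator(ABC):
    """Any translation backend can be plugged in behind this interface."""
    @abstractmethod
    def translate(self, text: str, src: str, tgt: str) -> str: ...

class Embedder(ABC):
    """Any sentence/document embedding model can be plugged in."""
    @abstractmethod
    def embed(self, text: str) -> list[float]: ...

class IdentityTranslator(Translator):
    """Trivial stand-in, used here only to show how a backend plugs in."""
    def translate(self, text: str, src: str, tgt: str) -> str:
        return text

# The rest of the pipeline depends only on the interface, not the backend.
translator: Translator = IdentityTranslator()
print(translator.translate("hello", src="en", tgt="en"))
```

Swapping the translation technology then means providing another `Translator` subclass, with no changes to the calling code.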

System Overview

(system overview diagram)

System Architecture

(system architecture diagram)

Demo Interface

Watch our demo video: simple_chat_2.mp4

Components

Compute Service:

These are the heavy-computation services of the system.

  • LLM Service - hosts the LLM
  • Embedding Service - hosts the sentence/document embedding model
  • Translator Service - hosts the any-to-any translation model
  • STT Service - hosts the Speech-To-Text model
  • TTS Service - hosts the Text-To-Speech model

Bot Backend:

This is the full RAG pipeline, which answers user queries using the knowledge bases fed to the system.

  • Bot Service - The RAG pipeline
  • DB Service - The RAG knowledge base store
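Putting the components together, a cross-lingual query would pass through translation, retrieval, and generation before being translated back. The sketch below uses local stand-in functions in place of the real service calls; every function body here is an assumption for illustration, not the repo's implementation:

```python
# Hypothetical sketch of the cross-lingual RAG flow; each function stands in
# for a call to the corresponding service (Translator, DB, LLM).

def translate(text: str, src: str, tgt: str) -> str:
    # Stand-in for the Translator Service.
    return text if src == tgt else f"[{tgt}] {text}"

def retrieve(query: str, k: int = 2) -> list[str]:
    # Stand-in for the DB Service: return the k most relevant chunks.
    knowledge = ["RAG combines retrieval with generation.",
                 "Embeddings map text to vectors."]
    return knowledge[:k]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for the LLM Service: answer grounded in retrieved context.
    return f"Answer to '{query}' using {len(context)} chunk(s)."

def answer(query: str, user_lang: str, kb_lang: str = "en") -> str:
    q = translate(query, src=user_lang, tgt=kb_lang)  # query -> KB language
    ctx = retrieve(q)                                 # fetch knowledge
    a = generate(q, ctx)                              # grounded LLM answer
    return translate(a, src=kb_lang, tgt=user_lang)   # back to user language

print(answer("Was ist RAG?", user_lang="de"))
```

In the real system each stand-in would be an HTTP call to the corresponding service, but the data flow is the same.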

Client App:

The client frontend app through which the user interacts with the bot/system.

Getting Started

Setup the Environment

  • You can create the conda environment named rag_env with the given environment.yml file.
  conda env create -f environment.yml

Start the System

The three services should be run separately (in separate terminals).

  • Compute Service is independent of the others
  • Bot Backend depends on the Compute Service
  • Bot Frontend depends on the Bot Backend

You can start the services as follows:

  1. Start the compute service

Check the .env file and the yml files of each service. You may need to fill certain fields in the yml files. In the .env file, keep fields empty if a variable should be set to False.

conda activate rag_env
cd compute_service
python main.py
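The convention above, that leaving a .env field empty means False, can be read in code roughly as follows. This is a sketch; the variable names (`USE_GPU`, `VERBOSE`) are illustrative assumptions, not variables from the repo's .env file:

```python
import os

def env_flag(name: str) -> bool:
    """Treat unset, empty, or whitespace-only env variables as False,
    per the .env convention described in the README."""
    return bool(os.getenv(name, "").strip())

os.environ["USE_GPU"] = ""      # left empty in .env -> False
os.environ["VERBOSE"] = "True"  # any non-empty value -> True

print(env_flag("USE_GPU"))   # False
print(env_flag("VERBOSE"))   # True
```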
  2. Start the bot backend
  • Check the .env file. Keep fields empty if a variable should be set to False.
  • For advanced PDF processing (e.g. table data extraction) we recommend using the unstructured API, i.e. PDF_LOADER="Unstructured" in .env (defaults to "PyPDF").
conda activate rag_env
cd bot_backend
python main.py
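The PDF_LOADER fallback mentioned in step 2 might look roughly like this. The loader functions are stand-ins for the real PyPDF / unstructured integrations:

```python
import os

# Stand-ins for the real loader implementations.
def load_with_pypdf(path: str) -> str:
    return f"pypdf:{path}"

def load_with_unstructured(path: str) -> str:
    return f"unstructured:{path}"

LOADERS = {"PyPDF": load_with_pypdf, "Unstructured": load_with_unstructured}

def get_pdf_loader():
    # PDF_LOADER defaults to "PyPDF"; "Unstructured" enables table extraction.
    name = os.getenv("PDF_LOADER") or "PyPDF"
    return LOADERS[name]

os.environ.pop("PDF_LOADER", None)   # unset -> default loader
print(get_pdf_loader()("doc.pdf"))   # pypdf:doc.pdf
```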
  3. Start the frontend app
conda activate rag_env
cd bot_frontend
python app_v2.py

Dockerization

The services can be containerized using the following steps.

Build the Image:

docker build -t rag_services .

Run the Container:

docker run --gpus all -p 8001:8001 -p 8002:8002 -p 7860:7860 rag_services
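For convenience, the same run command can be expressed as a docker-compose file. This is a sketch of an equivalent configuration, not a file shipped with the repo, and the role assigned to each port is an assumption:

```yaml
# Sketch of a docker-compose equivalent of the `docker run` command above.
services:
  rag_services:
    build: .
    ports:
      - "8001:8001"   # e.g. compute service (port roles are assumptions)
      - "8002:8002"   # e.g. bot backend
      - "7860:7860"   # frontend (7860 is Gradio's default port)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

With this file in place, `docker compose up` replaces the manual build and run steps.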

You can access the services as follows:

Linux:

Windows (127.0.0.1 may not work in Windows):

Roadmap

  • Complete Bot Backend
    • Basic RAG Flow
    • Session Management
    • RAG mode and LLM-only chat mode
    • Handle both text and voice input and output
    • Add knowledge to vector db through API
    • Trace Responses
    • Tool Calling
    • Further Improvements
  • Complete Compute Service
  • Complete Frontend APP
    • Basic chat interface
    • Add knowledge to RAG (i.e. File Upload, URL fetch)
    • Get rid of Gradio
  • Update Docker Image
  • Generalize Multilingual Support
  • Voice Streaming Capability

Contributors

Contact Us

xphoenixai@gmail.com

Social Media

Follow our social media channels for the latest updates.
