This is a RAG (Retrieval-Augmented Generation) model that leverages Qdrant as a vector store and Google Gemini for intelligent document retrieval and context-aware response generation. It efficiently processes PDF documents to provide detailed answers to user queries based on the extracted context.


RAG-Qdrant-pipeline

Overview

A powerful implementation of a Retrieval-Augmented Generation (RAG) model leveraging Qdrant as a vector store and Google Gemini for generative AI. This project enables intelligent document retrieval and response generation based on the context extracted from various PDF documents. 📄

Architecture: see the architecture diagram in the repository.

Workflow

  • 📥 Load and process PDF documents from a specified directory.
  • ✂️ Split documents into manageable chunks for efficient processing.
  • 🔗 Use SentenceTransformers for generating embeddings.
  • 💾 Store embeddings in Qdrant for fast retrieval.
  • 🤖 Leverage Google Gemini for advanced question answering.
  • 🔍 Implement hybrid search combining vector similarity and keyword matching.
  • 📝 Generate detailed, context-aware responses to user queries.
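The chunking step above can be sketched in plain Python. The project likely uses a LangChain text splitter; the fixed-size splitter below, with illustrative sizes, is a minimal stand-in showing the overlap idea:

```python
def split_into_chunks(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks for embedding.

    Overlap preserves context that would otherwise be cut
    at chunk boundaries.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk repeats the last `overlap` characters of its predecessor, so a sentence straddling a boundary still appears whole in at least one chunk.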

Technologies Used

  • LangChain: Framework for working with LLMs and document loaders.
  • Qdrant: Vector database for efficient storage and retrieval of embeddings.
  • Google Gemini: Generative AI model for producing intelligent responses.
  • Sentence Transformers: Used for creating document embeddings.
  • PyMuPDF: For loading and processing PDF files.

Installation

To set up the project locally, follow these steps:

  1. Clone the repository:

    git clone https://github.com/AtharvaKulkarniIT/rag-qdrant-pipeline.git
    cd rag-qdrant-pipeline
  2. Install the required packages:

    pip install langchain PyMuPDF
    pip install langchain_google_genai
    pip install sentence-transformers qdrant-client
  3. Set up your API keys:

    Add your Qdrant and Gemini API keys as environment variables, or replace the placeholders in the code directly.
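For example, the keys can be exported as environment variables before launching the notebook. The variable names below are assumptions; match whatever names the code actually reads:

```shell
# Placeholders only - substitute your real endpoint and keys.
export QDRANT_URL="https://your-cluster.qdrant.io"
export QDRANT_API_KEY="your-qdrant-key"
export GOOGLE_API_KEY="your-gemini-key"
```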

Usage

  1. Load Documents: Place your PDF documents in the specified data folder.
  2. Run the Notebook: Execute the notebook cells to start the RAG pipeline.

Example

To get a response about the rivers in Maharashtra, you can use the following input:

    input_text = "Describe the rivers in Maharashtra"
    response = get_gemini_response(input_text)
    print(response)
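`get_gemini_response` is defined in the notebook; a RAG helper like this typically retrieves the top-matching chunks and stuffs them into a prompt before calling Gemini. A minimal sketch of that prompt-assembly step (the function name and template here are hypothetical, not the notebook's exact code):

```python
def build_prompt(query, retrieved_chunks):
    """Assemble a context-stuffed prompt for the generative model."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

Constraining the model to the supplied context is what keeps answers grounded in the PDF content rather than the model's general knowledge.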

Hybrid Search

The hybrid search combines results from Qdrant's vector similarity search with keyword matches over the original documents, ranking the merged results with Reciprocal Rank Fusion (RRF). 🔄
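The RRF step can be sketched as follows. This is a generic RRF implementation rather than the notebook's exact code; `k=60` is the smoothing constant from the original RRF paper:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of document ids using RRF.

    rankings: list of ranked lists, best result first in each.
    Each document scores 1 / (k + rank) per list it appears in;
    the fused order sorts by total score, descending.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only uses ranks, it needs no score normalization between the vector and keyword retrievers, which is why it suits hybrid search.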

Contributing

🤝 Contributions are welcome! If you have suggestions for improvements or want to add new features, feel free to create an issue or submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
