RAG-Qdrant-pipeline

Overview

An implementation of a Retrieval-Augmented Generation (RAG) pipeline that uses Qdrant as the vector store and Google Gemini as the generative model. The project retrieves relevant passages from a collection of PDF documents and generates responses grounded in that retrieved context. 📄

Workflow

📥 Load and process PDF documents from a specified directory.
✂️ Split documents into manageable chunks for efficient processing.
🔗 Use SentenceTransformers for generating embeddings.
💾 Store embeddings in Qdrant for fast retrieval.
🤖 Leverage Google Gemini for advanced question answering.
🔍 Run a hybrid search that combines vector similarity and keyword matching.
📝 Generate detailed, context-aware responses to user queries.
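
The chunking step in the workflow above can be illustrated with a minimal sketch. This is not the notebook's splitter (it may well use a LangChain text splitter instead); the function name `split_into_chunks` and the `chunk_size` and `overlap` parameters are hypothetical and chosen only for illustration. It shows the general idea of cutting text into overlapping windows so that a sentence falling on a boundary still appears whole in at least one chunk.

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of up to `chunk_size` characters.

    Consecutive windows share `overlap` characters. Hypothetical helper,
    not taken from the notebook.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(split_into_chunks("abcdefghij", chunk_size=4, overlap=1))
    # prints ['abcd', 'defg', 'ghij']
```

Each chunk would then be embedded and stored in Qdrant; larger chunks carry more context per hit, while smaller ones make retrieval more precise.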

Technologies Used

LangChain: Framework for working with LLMs and document loaders.
Qdrant: Vector database for efficient storage and retrieval of embeddings.
Google Gemini: Generative AI model for producing intelligent responses.
Sentence Transformers: Used for creating document embeddings.
PyMuPDF: For loading and processing PDF files.

Installation

To set up the project locally, follow these steps:

Clone the repository:

git clone https://github.com/AtharvaKulkarniIT/rag-qdrant-pipeline.git
cd rag-qdrant-pipeline

Install the required packages:

pip install langchain PyMuPDF
pip install langchain_google_genai
pip install sentence-transformers
pip install qdrant-client

Set up your API keys:

Add your Qdrant and Gemini API keys as environment variables, or set them directly in the notebook.
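
As a minimal sketch of the environment-variable option, the helper below reads both keys and fails early if either is missing. The variable names `QDRANT_API_KEY` and `GEMINI_API_KEY` and the function `load_api_keys` are assumptions made for illustration; use whichever names the notebook actually expects.

```python
import os


def load_api_keys(env=os.environ) -> dict[str, str]:
    """Return the Qdrant and Gemini API keys found in `env`.

    The variable names below are hypothetical, not taken from the notebook.
    """
    names = ("QDRANT_API_KEY", "GEMINI_API_KEY")
    # Treat unset and empty values alike as missing.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

Calling `load_api_keys()` at the top of the notebook surfaces a missing key before any documents are loaded or embedded.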

Usage

Load Documents: Place your PDF documents in the data folder that the notebook reads from.
Run the Notebook: Execute rag_qdrant_pipeline.ipynb to run the RAG pipeline.

Example

To get a response about the rivers in Maharashtra, you can use the following input:

input_text = "Describe the rivers in Maharashtra"
response = get_gemini_response(input_text)
print(response)

Hybrid Search

Hybrid search combines the results of Qdrant's vector similarity search with the results of a keyword search over the original documents. The two result lists are then merged and ranked using Reciprocal Rank Fusion (RRF). 🔄
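
To make the fusion step concrete, here is a minimal sketch of RRF, assuming each search returns an ordered list of document IDs. Each document scores the sum of 1 / (k + rank) over every list it appears in, with ranks starting at 1. The function name `reciprocal_rank_fusion`, the sample IDs, and the default `k = 60` (a commonly used constant) are illustrative assumptions, not values taken from the notebook.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranked list.

    Hypothetical helper: score(doc) = sum over lists of 1 / (k + rank).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]   # e.g. IDs from the vector similarity search
    keyword_hits = ["b", "d", "a"]  # e.g. IDs from the keyword search
    fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
    print([doc_id for doc_id, _ in fused])
    # prints ['b', 'a', 'd', 'c']
```

Because RRF uses only rank positions, the vector and keyword scores never need to be normalized to a common scale; a document that ranks well in both lists rises to the top.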

Contributing

🤝 Contributions are welcome! If you have suggestions for improvements or want to add new features, feel free to create an issue or submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
