RAG with Docling and LlamaIndex for Personal Documents

This repository demonstrates how to use Docling and LlamaIndex to build a RAG (Retrieval-Augmented Generation) system with personal documents in PDF format. The goal is to allow language models to access and process information stored locally in personal documents to answer questions accurately and efficiently.

Repository Features

RAG for personal documents: The code allows you to ask questions directly about the content of personal PDF documents.
Docling: Used for document processing and analysis.
LlamaIndex: Responsible for indexing and building an efficient data retrieval pipeline.

Repository Files

README.md: This documentation file.
run_first_prepare_data.ipynb: Jupyter Notebook that prepares the data for the RAG system. Run it first.
run_second_qa.ipynb: Jupyter Notebook that implements question answering (QA) over the prepared data. Run it second.

Code Features

Loading PDFs: The code uses Docling to load and process PDF files.
Content Indexing: Documents are processed and indexed using LlamaIndex.
Query and Response Generation: You can ask questions about the content of the documents and get answers grounded in that content.
Simple Interface: Implemented in Jupyter Notebook to facilitate execution and understanding of the workflow.
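
The features above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the notebooks: the function names `markdown_path_for`, `convert_pdfs_to_markdown`, and `build_query_engine` are hypothetical, and the sketch assumes the PDFs are converted to Markdown before indexing and that LlamaIndex runs with its default embedding model and LLM (which typically need an API key unless `Settings` is configured otherwise).

```python
from pathlib import Path


def markdown_path_for(pdf_path, md_dir):
    """Return the Markdown output path for a PDF (hypothetical naming scheme)."""
    return Path(md_dir) / f"{Path(pdf_path).stem}.md"


def convert_pdfs_to_markdown(pdf_dir, md_dir):
    """Convert every PDF in pdf_dir to a Markdown file in md_dir using Docling."""
    # Imported lazily: Docling is a heavy dependency and slow to load.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    Path(md_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        result = converter.convert(pdf_path)
        md_path = markdown_path_for(pdf_path, md_dir)
        md_path.write_text(result.document.export_to_markdown(), encoding="utf-8")
        written.append(md_path)
    return written


def build_query_engine(md_dir):
    """Index the Markdown files with LlamaIndex and return a query engine."""
    # Imported lazily for the same reason as above.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    documents = SimpleDirectoryReader(input_dir=str(md_dir)).load_data()
    # Assumption: an in-memory index with LlamaIndex's default models.
    index = VectorStoreIndex.from_documents(documents)
    return index.as_query_engine()
```

Under these assumptions, calling `convert_pdfs_to_markdown("input/pdfs", "input/mds")` followed by `build_query_engine("input/mds").query("your question")` would return a response object whose text is available through `str(...)`.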

Prerequisites

Make sure you have the following items installed in your environment:

Python: Version 3.12 or higher.
Miniconda:
- Download the Miniconda installer for your operating system from the official website.
- After downloading, follow the installation instructions available on the official website.
VSCode:
- Download and install Visual Studio Code.
- Install the recommended extensions:
  - Python Extension: For Python support.
  - Jupyter Extension: For running the notebooks.

How to Set Up the Environment

Clone this repository:

git clone https://github.com/homerokzam/rag-docling-llamaindex.git
cd rag-docling-llamaindex

Create and activate the virtual environment using Miniconda:

conda create -n venv-rag-docling-llamaindex python=3.12.7
conda activate venv-rag-docling-llamaindex

Install the Jupyter kernel and dependencies:

pip install ipykernel
pip install -r requirements.txt
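
To confirm the installation, a short check can be run with the activated environment's Python. This is only a sketch: the helper names are hypothetical, and the module names `docling` and `llama_index` are assumptions about what requirements.txt installs.

```python
import sys
from importlib.util import find_spec


def python_is_recent(minimum=(3, 12)):
    """Return True if the running interpreter is at least the given version."""
    return sys.version_info[:2] >= tuple(minimum)


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]
```

For example, `missing_modules(["ipykernel", "docling", "llama_index"])` returns an empty list when all three are installed, and `python_is_recent()` returns False if the environment was created with a Python older than 3.12.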

Open the repository in VSCode:
```
code .
```
Ensure that the Python and Jupyter extensions are installed in VSCode.
In each notebook, select the kernel of the virtual environment you created (venv-rag-docling-llamaindex).
Create the directories: database, input/pdfs, and input/mds.
Copy your PDF files to the directory: input/pdfs.
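
The last two steps can also be scripted. The sketch below is one possible way to do it with the standard library; the function name `prepare_workspace` and its parameters are hypothetical, and it assumes the three directories belong in the repository root.

```python
import shutil
from pathlib import Path

# Directories named in the setup steps, relative to the repository root (assumed).
REQUIRED_DIRS = ("database", "input/pdfs", "input/mds")


def prepare_workspace(repo_root, pdf_source=None):
    """Create the required directories and optionally copy PDFs into input/pdfs."""
    root = Path(repo_root)
    for rel in REQUIRED_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)

    copied = []
    if pdf_source is not None:
        target = root / "input" / "pdfs"
        # Only files ending in .pdf are copied; the match is case-sensitive on Linux.
        for pdf in sorted(Path(pdf_source).glob("*.pdf")):
            copied.append(Path(shutil.copy2(pdf, target)))
    return copied
```

Calling `prepare_workspace(".")` from the repository root only creates the directories; passing a second argument such as a folder of personal PDFs also copies them and returns the copied paths.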
