This project addresses a common problem with technical documentation and large laboratory notes: how do I easily retrieve the information I need from a big corpus of documentation? The solution is Retrieval-Augmented Generation (RAG). RAG-LLM-LOCAL is an open-source project that uses LangChain and Ollama to perform RAG through an embedding model and uses an LLM to answer the user's specific question, leveraging the knowledge in the documentation retrieved by RAG.
This script is useful for material such as:
- technical documentation
- laboratory diary
- knowledge base system (like Obsidian)
which are saved as Markdown files in a nested directory tree (directories with subdirectories). Files that are not Markdown are ignored by the program.
Why Markdown only? Markdown is the format used by Obsidian and by documentation in most repositories, and it is an open, plain-text format.
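As an illustration of this behaviour, the sketch below shows one way to collect Markdown files recursively from a nested directory; the function name is hypothetical and not part of the project's code.

```python
# Minimal sketch (not the project's actual loader): recursively collect
# Markdown files from a nested documentation directory with pathlib.
from pathlib import Path

def find_markdown_files(docs_directory: str) -> list[Path]:
    """Return every .md file found under docs_directory, including subdirectories."""
    return sorted(Path(docs_directory).rglob("*.md"))

# Every non-Markdown file under the directory is simply ignored.
# print(find_markdown_files("docs/"))
```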
In config.py you can set the following parameters (an example config follows the list):
- DOCS_DIRECTORY (str): directory containing your documentation.
- DATABASE_DIRECTORY (str): directory where the vector database is stored.
- DATABASE_CREATION (bool): if True, the vector database is created: fetch the documents, split them into chunks, and embed them. If False, just load the existing vector database with the embeddings, ready for querying.
- EMBEDDING_MODEL (str): the Ollama model used for embeddings.
- LLM_MODEL (str): the Ollama model used for the LLM task.
- RAG_LLM (bool): choose between RAG+LLM and plain information retrieval.
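For example, a config.py for a full RAG+LLM setup might look like the following (the paths and model names are illustrative values, not defaults of the project):

```python
# config.py - example values only; adapt the paths and models to your setup.
DOCS_DIRECTORY = "docs/"               # directory containing your Markdown documentation
DATABASE_DIRECTORY = "chroma_db/"      # directory where the vector database is stored
DATABASE_CREATION = True               # True: build the vector database; False: reuse the existing one
EMBEDDING_MODEL = "nomic-embed-text"   # Ollama model used for embeddings
LLM_MODEL = "qwen:7b"                  # Ollama model used to answer questions
RAG_LLM = True                         # True: RAG+LLM answers; False: retrieval only
```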
- Run RAG with an LLM locally, to keep your data private, for free (no API key needed), and offline.
- Using the command line and as few packages as possible is a design choice that keeps the program easily portable and maintainable by the community.
- Choose your embedding and LLM models from those available in Ollama.
- Automatic, correct fetching of Markdown files. In some other RAG pipelines for Markdown, files are not fetched correctly: titles are stripped and the data is split incorrectly. In this project, Markdown headers are preserved and the split is based on them, under the assumption that content is organised into sections (see the sketch after this list).
- Prompts engineered for the RAG-LLM interplay.
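A minimal sketch of this kind of header-aware splitting, using LangChain's MarkdownHeaderTextSplitter; the header labels and sample text are illustrative, not the project's exact configuration.

```python
# Sketch of header-aware Markdown splitting; each chunk keeps its header metadata.
from langchain_text_splitters import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("#", "h1"),
    ("##", "h2"),
    ("###", "h3"),
    ("####", "h4"),
]

splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)

sample = "# Setup\nInstall the dependencies.\n## Ollama\nPull the models you need."
for chunk in splitter.split_text(sample):
    print(chunk.metadata, chunk.page_content)
```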
A virtual environment is preferred, but optional.
- First of all, install the packages required to run the application:
pip install -r requirements.txt
- Then, make sure Ollama is installed (it is literally a one-line command in the terminal).
- Start the local server on your machine that runs the neural networks:
ollama serve &
The models I suggest installing (using `ollama pull <model>`) are Qwen 7B for the LLM and nomic-embed-text for the embeddings.
- You are now ready to query your documentation! Run the program:
python main_rag.py
You should see something like:
$ python main_rag.py
Welcome to information retrieval LLM! To exit, type `exit`
Ask something
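Under the hood, the query loop amounts to embedding-based retrieval followed by an LLM answer over the retrieved context. The sketch below illustrates the idea with LangChain's Chroma and Ollama wrappers; it is a simplified assumption about the flow, with illustrative paths and model names, not the actual code of main_rag.py.

```python
# Illustrative sketch of a RAG query loop (not the project's actual main_rag.py).
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma

embeddings = OllamaEmbeddings(model="nomic-embed-text")   # assumed embedding model
db = Chroma(persist_directory="chroma_db/", embedding_function=embeddings)
llm = Ollama(model="qwen:7b")                             # assumed LLM model

query = "How do I configure the database directory?"
docs = db.similarity_search(query, k=4)                   # retrieve the most relevant chunks
context = "\n\n".join(doc.page_content for doc in docs)
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
print(answer)
```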
As the splitting is made with the assumption that ideas are divided into sections (with headers from # to ####), the data you feed in should be structured with headers and subheaders. This is not only good for the RAG-LLM, but also for the documentation itself!
Please feel free to contact me for any queries/collaborations!
- Support Python files and Jupyter notebooks as documentation sources, for codebase queries.
- Feel free to propose new features!