A conversational AI assistant that answers questions about its creator using retrieval-augmented generation (RAG). The bot processes PDF documents containing personal and professional information and builds a knowledge base from them, so it can give accurate, context-aware answers about the portfolio owner.
- PDF document processing and vectorization
- Context-aware question answering
- Chat history management
- Professional response formatting
- Integration with OpenAI and Pinecone services
- Python 3.8+
- OpenAI API key
- Pinecone account and API key
- PDF documents containing portfolio information
- Clone the repository:
git clone <your-repository-url>
cd personal-portfolio-helper
- Install required dependencies:
pip install -r requirements.txt
- Create a .env file in the root directory with the following variables:
OPENAI_API_KEY=your_openai_api_key
INDEX_NAME=your_pinecone_index_name
PINECONE_API_KEY=your_pinecone_api_key
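Before running either script, it can help to confirm that all three variables are actually set. The following is a minimal sketch using only the standard library. The helper name missing_settings is hypothetical and not part of this repository, and the sketch assumes the variables have already been loaded into the process environment (for example by dotenv).

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = ("OPENAI_API_KEY", "INDEX_NAME", "PINECONE_API_KEY")


def missing_settings(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the real process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with a sample mapping instead of the real environment.
sample = {"OPENAI_API_KEY": "sk-example", "INDEX_NAME": "portfolio"}
print(missing_settings(sample))  # ['PINECONE_API_KEY']
```

Calling missing_settings() with no argument checks the real environment, which makes it usable as a guard at the top of a script.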
personal-portfolio-helper/
├── bot.py # Main chatbot implementation
├── reader.py # PDF processing and vector database setup
├── requirements.txt # Project dependencies
├── data/ # Directory for PDF documents
│ └── rag-training-doc.pdf
└── README.md
reader.py handles document processing and vector database setup:
- Loads PDF documents using PyPDFLoader
- Splits text into manageable chunks
- Creates embeddings using OpenAI
- Stores vectors in Pinecone database
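The chunking step in the list above can be illustrated without any dependencies. This is a plain-Python stand-in for a text splitter, not the code in reader.py. The function name split_into_chunks and the default sizes (1000 characters with a 200-character overlap) are assumptions chosen for illustration. The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap.

```python
def split_into_chunks(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character chunks that overlap.

    Illustrative sketch only; sizes are assumed, not taken from the project.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new chunk starts after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_into_chunks("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk would then be embedded and stored as one vector, so the chunk size trades retrieval precision against how much context each hit carries.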
bot.py implements the conversational interface:
- Manages chat history
- Processes user queries with context awareness
- Retrieves relevant information from vector storage
- Generates appropriate responses using OpenAI's GPT model
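One way to picture how chat history and retrieved context come together before the model is called is a single prompt-building function. The sketch below is an assumption about the general shape of such a step, not the implementation in bot.py. The name build_prompt, the prompt wording, and the max_turns limit are all hypothetical.

```python
def build_prompt(question, context_chunks, chat_history, max_turns=3):
    """Combine retrieved context, recent chat turns, and the new question.

    chat_history is a list of (user_message, assistant_message) pairs.
    Hypothetical helper; the wording of the prompt is an assumption.
    """
    # Keep only the most recent turns so the prompt stays short.
    recent = chat_history[-max_turns:] if max_turns > 0 else []

    history_lines = []
    for user_msg, bot_msg in recent:
        history_lines.append(f"User: {user_msg}")
        history_lines.append(f"Assistant: {bot_msg}")

    sections = [
        "Answer using only the context below.",
        "Context:\n" + "\n---\n".join(context_chunks),
    ]
    if history_lines:
        sections.append("Conversation so far:\n" + "\n".join(history_lines))
    sections.append(f"Question: {question}")
    return "\n\n".join(sections)
```

Including the recent turns is what lets a follow-up such as "and where was that?" be resolved against the earlier exchange.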
- First, process your PDF documents to create the vector database:
python reader.py
- Start the chatbot:
python bot.py
- Interact with the bot by typing questions. Type 'exit' to end the session.
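The interactive session described above amounts to a read-answer-print loop that stops on 'exit'. The following is a minimal sketch of that pattern, not the code in bot.py. The name chat_loop and its parameters are hypothetical, and answer_fn stands in for whatever function produces a reply from the question and the history so far.

```python
def chat_loop(answer_fn, read=input, write=print):
    """Run a question/answer loop until the user types 'exit'.

    answer_fn(question, history) -> str is supplied by the caller.
    read and write are injectable so the loop can be driven without a terminal.
    """
    history = []  # list of (question, answer) pairs
    while True:
        try:
            question = read("You: ").strip()
        except EOFError:
            break  # input stream closed, treat like 'exit'
        if question.lower() == "exit":
            break
        if not question:
            continue  # ignore empty lines
        answer = answer_fn(question, history)
        history.append((question, answer))
        write(f"Bot: {answer}")
    return history
```

For a quick trial without any API keys, chat_loop(lambda q, h: "You asked: " + q) echoes each question back until 'exit' is typed.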
The project uses several key technologies:
- LangChain for RAG implementation
- OpenAI's embeddings and chat models
- Pinecone for vector storage
- Python's dotenv for environment management
The system follows a two-step process:
- Document Processing: Converting PDF content into searchable vectors
- Interactive QA: Using chat history and context-aware retrieval for accurate responses
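To make "searchable vectors" concrete, here is a toy version of the retrieval idea using only the standard library. It is a sketch under strong simplifying assumptions: word counts stand in for OpenAI embeddings, an in-memory list stands in for the Pinecone index, and the function names embed, cosine, and top_k are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (not a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "python and langchain experience",
    "enjoys hiking on weekends",
    "degree in computer science",
]
print(top_k("python experience", chunks, k=1))
# ['python and langchain experience']
```

A real embedding model captures meaning rather than exact word overlap, but the ranking step works the same way: embed the query, score it against the stored vectors, and pass the best matches to the chat model as context.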
Feel free to submit issues and enhancement requests!
Albert Derevski