❄️ Lexis - Research Assistant 📘 📑

Project X is an advanced research assistant designed to streamline and enhance the research process by leveraging state-of-the-art AI technologies. With hybrid retrieval, multi-modal support, and advanced citation capabilities, Project X empowers researchers to efficiently analyze, retrieve, and cite information from diverse data sources.

❄️ Core Features

1. Robust Retrieval Augmented Generation (RAG) 🤖

Hybrid Retrieval: Combines full-text and vector-based retrieval with re-ranking to ensure the highest quality of results.
Multi-File Search: Search data across multiple documents, not limited to a single file.
AI Agent Assistance: If local data is insufficient, a mode with AI agents can perform searches on the web.
Multi-Agent Architecture: Powered by Autogen and Mistral AI, enabling dynamic and context-aware retrieval-augmented generation (RAG). The agents are designed to collaborate seamlessly by performing advanced tasks:
- Web Search: Agents can autonomously search the web to gather real-time, relevant information for user queries.
- Research Paper Retrieval: By querying Arxiv databases and repositories, agents can locate and summarize research papers or articles pertinent to the topic of interest.
- Knowledge Aggregation: Through intelligent filtering and integration of diverse data sources, the agents ensure high-quality and accurate responses tailored to the user's needs.

2. Multi-Modal Question Answering (QA) 🗨️

Document Parsing: Perform QA on documents containing text, figures, and tables.
Multi-Document Support: Seamlessly analyze and extract insights from multiple documents simultaneously.

3. Advanced Citations 📖

Detailed Citations with Previews:
- Ensure correctness with detailed citations directly in the in-browser PDF viewer.
- View relevant scores and highlights for context.
- Receive warnings if retrieval pipelines return low-relevance articles.
Video Citations (Future Feature):
- Cite specific quotes from videos with accurate timestamps and proper format.
- Save time on manual video referencing, ensuring precision in both time and content.

4. Interactive Mind Maps ✏️

AI-Powered Generation: Create mind maps automatically from any topic or text using Mistral AI.
Interactive Editing:
- Expand nodes to explore related concepts
- Delete nodes and their connections
- Visual node selection and highlighting
Dynamic Visualization: Interactive graph layout with physics simulation for optimal readability.

5. Data Visualization 💹

Automatically generate visualizations from data files using Python and matplotlib.
Interactive plots and charts to support research findings and presentations.

❄️ Technical Architecture

*️⃣ Frontend Stack

Streamlit: Modern Python web framework for interactive UI components
Streamlit-Agraph: Interactive graph visualization for mind maps
Custom CSS: Responsive design with modern animations and styling

*️⃣ Backend Stack

Mistral AI: Large language model for intelligent responses and mind map generation
Snowflake Cortex Search Service: Enterprise data platform for efficient RAG implementation including pre-processing, labelling, storing documents, and searching relevant text chunks.
Trulens: Evaluation and monitoring framework for AI applications
Python 3.8+: Core programming language

*️⃣ Agentic RAG with Multi-Agents Pipeline

User Proxy: Initial query handling
Intent Classifier: A semantic router to identify user's intent, whether reading uploaded documents, searching for papers, searching for real-time data, or general query.
Specialized Agents:
- Document Reading Agent: extracts and retrieves relevant information from uploaded documents using LAYOUT mode, embeddings, and search algorithms.
- Web Search Agent: fetches real-time data via search engine APIs or scraping, with result ranking for relevance.
- Articles Research Agent: searches academic databases for papers relevant to the query.
- Writer Agent: aggregates context from all agents and generates a cohesive, factually accurate response. Prioritizes data based on agent reliability and relevance.
Critic Agent: Response refinement based on user's query, retrieved context information, writer agent's response to reduce hallucinations from LLMs.

*️⃣ Evaluation and Tracking for LLM Experiments

Utilizing Trulens for tracking and evaluating performance across three core metrics:

Answer Relevance: Ensures responses are coherent, accurate, and reasoned.
Context Relevance: Validates the quality of retrieved information, ensuring it aligns with the query.
Groundedness: Functions as hallucination detection, verifying all responses against source documents.

Our development process included creating multiple RAG versions:

Agent-Based Version: Combines multiple AI agents (Document Reading, Article Research, Web Search) with Trulens' context filter guardrails. This approach enhanced context quality, reduced hallucinations, and balanced latency.
Agent-Free Version: Offers simplicity with low latency by retrieving relevant context solely from uploaded documents. Although limited in accessing real-time data, it maintains high context relevance.

Our agent_v3's iterative refinements, including Trulens context filtering at threshold 0.5 and enhanced processing through multiple AI agents, make it a more robust and efficient RAG system compared to Agent_v1. It achieves better groundedness and higher answer quality.

❄️ Installation and Usage

Prerequisites

Python 3.8 or higher
Required Python libraries (see requirements.txt)
Mistral AI API key, OPENAI API key
Snowflake account configuration

Installation

Clone the repository:

git clone https://github.com/stephanienguyen2020/project-x.git
cd project-x

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate # On Windows use: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Set up your secrets: Create .streamlit/secrets.toml following the structure in secrets.toml.sample
Run the application:

streamlit run app.py

The application will be available at http://localhost:8501

❄️ Project Structure

project-x/
├── assistance                # Multi-agents
│   ├── ...
│   ├── writer_agent.py       # Main researcher agent
│
├── components/               # UI components
│   ├── chatbot.py            # Chat interface
│   ├── info_panel.py         # Information panel
│   ├── ...
│   ├── mindmap.py            # Mind map visualization
│   └── settings.py           # Settings interface
│
├── services/                 # Backend services
│   └── rag_agents.py         # RAG version with agents
│   └── rag_no_agents.py      # RAG Version without agents 
├── ...
├── utils/                    # Utility functions
│   └── ...
│   └── trulens_feedbacks.py  # Trulensfeedback metrics
│   └── trulens_utils.py      # Trulens utilities
│
├── prompts/                  # System prompts
├── app.py                    # Main Streamlit application
└── requirements.txt          # Dependencies

Troubleshooting

Connection Issues
- Verify your Mistral AI API key is correct
- Check Snowflake credentials and network access
- Ensure all required environment variables are set
Visualization Problems
- Make sure matplotlib and streamlit-agraph are properly installed
- Check browser console for any JavaScript errors
- Try clearing browser cache and Streamlit cache
Performance Issues
- Adjust Snowflake warehouse size if needed
- Consider reducing chunk size for large documents
- Monitor memory usage for large mind maps

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

🎗️ Support

For support, please:

Check the Issues page
Review existing documentation
Create a new issue if needed

❄️ Acknowledgments

Mistral AI for their powerful language model
Snowflake for enterprise data platform capabilities
Streamlit team for the excellent web framework
All contributors who have helped shape this project

Name		Name	Last commit message	Last commit date
Latest commit History 79 Commits
.devcontainer		.devcontainer
assistance		assistance
components		components
prompts		prompts
services		services
static		static
utils		utils
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
__init__.py		__init__.py
app.py		app.py
config.py		config.py
requirements.txt		requirements.txt
secrets.toml.sample		secrets.toml.sample
terminal		terminal
test.ipynb		test.ipynb
test.py		test.py
test_questions.csv		test_questions.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

❄️ Lexis - Research Assistant 📘 📑

❄️ Core Features

1. Robust Retrieval Augmented Generation (RAG) 🤖

2. Multi-Modal Question Answering (QA) 🗨️

3. Advanced Citations 📖

4. Interactive Mind Maps ✏️

5. Data Visualization 💹

❄️ Technical Architecture

*️⃣ Frontend Stack

*️⃣ Backend Stack

*️⃣ Agentic RAG with Multi-Agents Pipeline

*️⃣ Evaluation and Tracking for LLM Experiments

❄️ Installation and Usage

Prerequisites

Installation

❄️ Project Structure

Troubleshooting

Contributing

License

🎗️ Support

❄️ Acknowledgments

About

Releases

Packages

Contributors 2

Languages

stephanienguyen2020/lexis

Folders and files

Latest commit

History

Repository files navigation

❄️ Lexis - Research Assistant 📘 📑

❄️ Core Features

1. Robust Retrieval Augmented Generation (RAG) 🤖

2. Multi-Modal Question Answering (QA) 🗨️

3. Advanced Citations 📖

4. Interactive Mind Maps ✏️

5. Data Visualization 💹

❄️ Technical Architecture

*️⃣ Frontend Stack

*️⃣ Backend Stack

*️⃣ Agentic RAG with Multi-Agents Pipeline

*️⃣ Evaluation and Tracking for LLM Experiments

❄️ Installation and Usage

Prerequisites

Installation

❄️ Project Structure

Troubleshooting

Contributing

License

🎗️ Support

❄️ Acknowledgments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages