Simba - Your Knowledge Management System

Connect your knowledge to any RAG system

📖 Overview

Simba is an open-source, portable Knowledge Management System (KMS) designed specifically for seamless integration with Retrieval-Augmented Generation (RAG) systems. With its intuitive UI, modular architecture, and powerful SDK, Simba simplifies knowledge management, allowing developers to focus on building advanced AI solutions.

🚀 Features

  • 🔌 Powerful SDK: Comprehensive Python SDK for easy integration.
  • 🧩 Modular Architecture: Flexible integration of vector stores, embedding models, chunkers, and parsers.
  • 🖥️ Modern UI: User-friendly interface for managing document chunks.
  • 🔗 Seamless Integration: Effortlessly connects with any RAG-based system.
  • 👨‍💻 Developer-Centric: Simplifies complex knowledge management tasks.
  • 📦 Open Source & Extensible: Community-driven with extensive customization options.

🎥 Demo

Watch the demo

🛠️ Getting Started

📋 Prerequisites

Ensure you have the following installed:

🔌 Quickstart Simba SDK Usage

pip install simba-client

Leverage Simba's SDK for powerful programmatic access:

from simba_sdk import SimbaClient

client = SimbaClient(api_url="http://localhost:8000")  # requires simba-core to be installed and the Simba server to be running

document = client.documents.create(file_path="path/to/your/document.pdf")
document_id = document[0]["id"]

parsing_result = client.parser.parse_document(document_id, parser="docling", sync=True)

retrieval_results = client.retriever.retrieve(query="your-query")

for result in retrieval_results["documents"]:
    print(f"Content: {result['page_content']}")
    print(f"Metadata: {result['metadata']['source']}")
    print("====" * 10)

Explore more in the Simba SDK documentation.
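
The response shape used in the loop above (a `documents` list whose entries carry `page_content` and `metadata["source"]`) can be wrapped in a small display helper. This is a minimal sketch that assumes that shape holds; `format_results` and `max_chars` are hypothetical names and not part of the Simba SDK:

```python
from typing import Any


def format_results(retrieval_results: dict[str, Any], max_chars: int = 200) -> list[str]:
    """Turn a retrieval response into printable lines.

    Assumes the response shape shown in the quickstart:
    {"documents": [{"page_content": str, "metadata": {"source": str}}, ...]}
    """
    lines = []
    for rank, doc in enumerate(retrieval_results.get("documents", []), start=1):
        # Fall back to "unknown" if a document has no metadata or no source.
        source = doc.get("metadata", {}).get("source", "unknown")
        # Collapse whitespace so each chunk prints on a single line.
        content = " ".join(doc.get("page_content", "").split())
        if len(content) > max_chars:
            content = content[:max_chars].rstrip() + "..."
        lines.append(f"{rank}. [{source}] {content}")
    return lines


# Hand-written sample in the assumed shape; this is not real Simba output.
sample = {
    "documents": [
        {"page_content": "Simba is a portable KMS.", "metadata": {"source": "intro.pdf"}},
        {"page_content": "Chunks are stored\nin a vector store.", "metadata": {}},
    ]
}
for line in format_results(sample):
    print(line)
```

In practice you would pass the value returned by `client.retriever.retrieve(...)` instead of `sample`.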

📦 Installation

Install Simba core:

pip install simba-core

Or clone and set up the repository:

git clone https://github.com/GitHamza0206/simba.git
cd simba
poetry config virtualenvs.in-project true
poetry install
source .venv/bin/activate

🔑 Configuration

Create a .env file:

OPENAI_API_KEY=your_openai_api_key
REDIS_HOST=localhost
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/1
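
Before starting the server, it can help to confirm that the `.env` file defines every key listed above. The sketch below is a standalone check, not part of Simba; `parse_env`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical names, and treating all four keys as required is an assumption based on this example file:

```python
# Keys taken from the example .env above; whether Simba requires
# all of them is an assumption.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "REDIS_HOST",
    "CELERY_BROKER_URL",
    "CELERY_RESULT_BACKEND",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so URLs in the value stay intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


example = """\
OPENAI_API_KEY=your_openai_api_key
REDIS_HOST=localhost
CELERY_BROKER_URL=redis://localhost:6379/0
"""
print(missing_keys(parse_env(example)))  # prints ['CELERY_RESULT_BACKEND']
```

To check a real file, read it with `pathlib.Path(".env").read_text()` and pass the text to `parse_env`.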

Configure config.yaml:

# config.yaml

project:
  name: "Simba"
  version: "1.0.0"
  api_version: "/api/v1"

paths:
  base_dir: null  # Will be set programmatically
  faiss_index_dir: "vector_stores/faiss_index"
  vector_store_dir: "vector_stores"

llm:
  provider: "openai"
  model_name: "gpt-4o-mini"
  temperature: 0.0
  max_tokens: null
  streaming: true
  additional_params: {}

embedding:
  provider: "huggingface"
  model_name: "BAAI/bge-base-en-v1.5"
  device: "mps"  # Use "cpu" for container compatibility
  additional_params: {}

vector_store:
  provider: "faiss"
  collection_name: "simba_collection"

  additional_params: {}

chunking:
  chunk_size: 512
  chunk_overlap: 200

retrieval:
  method: "hybrid" # Options: default, semantic, keyword, hybrid, ensemble, reranked
  k: 5
  # Method-specific parameters
  params:
    # Semantic retrieval parameters
    score_threshold: 0.5
    
    # Hybrid retrieval parameters
    prioritize_semantic: true
    
    # Ensemble retrieval parameters
    weights: [0.7, 0.3]  # Weights for semantic and keyword retrievers
    
    # Reranking parameters
    reranker_model: colbert
    reranker_threshold: 0.7

# Database configuration
database:
  provider: litedb # Options: litedb, sqlite
  additional_params: {}

celery:
  broker_url: ${CELERY_BROKER_URL:-redis://redis:6379/0}
  result_backend: ${CELERY_RESULT_BACKEND:-redis://redis:6379/1}
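
The `celery` entries above use the shell-style `${VAR:-default}` form: the environment variable wins when it is set and non-empty, otherwise the text after `:-` is used. How Simba resolves these placeholders internally is not documented here, so the following is only an illustrative sketch of those semantics; `expand` is a hypothetical helper:

```python
import re

# Matches ${NAME} and ${NAME:-default}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand(value: str, env: dict[str, str]) -> str:
    """Resolve ${NAME:-default} placeholders against the given environment."""

    def _sub(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if current:  # ":-" treats unset and empty variables the same way
            return current
        return default if default is not None else ""

    return _PLACEHOLDER.sub(_sub, value)


template = "${CELERY_BROKER_URL:-redis://redis:6379/0}"

# Variable unset: the default after ":-" is used.
print(expand(template, {}))
# Variable set (as in the .env example): the environment value wins.
print(expand(template, {"CELERY_BROKER_URL": "redis://localhost:6379/0"}))
```

Note that the defaults point at a host named `redis`, while the `.env` example uses `localhost`, so the `.env` values take precedence whenever they are loaded into the environment.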

🚀 Running Simba

Start the server, frontend, and parsers:

simba server
simba front
simba parsers
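
Scripts that use the SDK right after `simba server` may need to wait until the API is accepting connections. A minimal sketch using only the standard library follows; `wait_for_port` is a hypothetical helper, and port 8000 is assumed from the `api_url` in the SDK quickstart:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Port 8000 is an assumption taken from the quickstart api_url.
    ready = wait_for_port("localhost", 8000, timeout=2.0)
    print("server reachable" if ready else "server not reachable")
```

This only checks that the port is open; it does not verify that the API itself is healthy.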

🐳 Docker Deployment

Deploy Simba using Docker:

  • CPU:
DEVICE=cpu make build
DEVICE=cpu make up
  • NVIDIA GPU:
DEVICE=cuda make build
DEVICE=cuda make up
  • Apple Silicon:
DEVICE=cpu make build
DEVICE=cpu make up
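
The three cases above reduce to one rule: use `cuda` when an NVIDIA GPU is available and `cpu` otherwise, including on Apple Silicon. A small sketch that prints the matching commands; `choose_device` is a hypothetical helper, and detecting the GPU by looking for `nvidia-smi` on the PATH is a heuristic assumption:

```python
import shutil


def choose_device(has_nvidia: bool) -> str:
    """Map hardware to the DEVICE values listed above."""
    return "cuda" if has_nvidia else "cpu"


# Heuristic: treat the presence of nvidia-smi as "NVIDIA GPU available".
has_nvidia = shutil.which("nvidia-smi") is not None
device = choose_device(has_nvidia)

# Print the commands instead of running them, so the sketch has no side effects.
for target in ("build", "up"):
    print(f"DEVICE={device} make {target}")
```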

🏁 Roadmap

  • 💻 pip install simba-core
  • 🔧 pip install simba-sdk
  • 🌐 www.simba-docs.com
  • 🔒 Auth & access management
  • 🕸️ Web scraping
  • ☁️ Cloud integrations (Azure/AWS/GCP)
  • 📚 Additional parsers and chunkers
  • 🎨 Enhanced UX/UI

🤝 Contributing

We welcome contributions! Follow these steps:

  • Fork the repository
  • Create a feature or bugfix branch
  • Commit clearly documented changes
  • Submit a pull request

💬 Support & Contact

For support or inquiries, open an issue on GitHub or contact Hamza Zerouali.