Connect your knowledge to any RAG system
Simba is an open-source, portable Knowledge Management System (KMS) designed specifically for seamless integration with Retrieval-Augmented Generation (RAG) systems. With its intuitive UI, modular architecture, and powerful SDK, Simba simplifies knowledge management, allowing developers to focus on building advanced AI solutions.
- 🔌 Powerful SDK: Comprehensive Python SDK for easy integration.
- 🧩 Modular Architecture: Flexible integration of vector stores, embedding models, chunkers, and parsers.
- 🖥️ Modern UI: User-friendly interface for managing document chunks.
- 🔗 Seamless Integration: Effortlessly connects with any RAG-based system.
- 👨‍💻 Developer-Centric: Simplifies complex knowledge management tasks.
- 📦 Open Source & Extensible: Community-driven with extensive customization options.
Install the Simba SDK client:
pip install simba-client
Leverage Simba's SDK for powerful programmatic access:
from simba_sdk import SimbaClient
client = SimbaClient(api_url="http://localhost:8000")  # requires simba-core installed and the Simba server running (see below)
document = client.documents.create(file_path="path/to/your/document.pdf")
document_id = document[0]["id"]
parsing_result = client.parser.parse_document(document_id, parser="docling", sync=True)
retrieval_results = client.retriever.retrieve(query="your-query")
for result in retrieval_results["documents"]:
    print(f"Content: {result['page_content']}")
    print(f"Metadata: {result['metadata']['source']}")
    print("====" * 10)
Explore more in the Simba SDK documentation.
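To connect the retrieval step above to any RAG pipeline, the results only need to be folded into a prompt for your LLM of choice. A minimal sketch (the `build_prompt` helper is illustrative, not part of the SDK; the result shape mirrors the loop above):

```python
def build_prompt(query, results):
    """Assemble a RAG prompt from Simba retrieval results.

    `results` is a list of dicts with "page_content" and "metadata"
    keys, as iterated in the retrieval example above.
    """
    context = "\n\n".join(
        f"[{r['metadata'].get('source', 'unknown')}]\n{r['page_content']}"
        for r in results
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Example with a mocked retrieval result:
results = [
    {"page_content": "Simba is a portable KMS.", "metadata": {"source": "intro.pdf"}},
]
prompt = build_prompt("What is Simba?", results)
```

The resulting string can be passed to whichever LLM client your stack uses.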
Install Simba core:
pip install simba-core
Or clone and set up the repository:
git clone https://github.com/GitHamza0206/simba.git
cd simba
poetry config virtualenvs.in-project true
poetry install
source .venv/bin/activate
Create a `.env` file:
OPENAI_API_KEY=your_openai_api_key
REDIS_HOST=localhost
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/1
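If you want to load such a file without adding a dependency like python-dotenv, a minimal stdlib sketch is enough (the `load_env_file` name is illustrative; existing environment variables take precedence):

```python
import os

def load_env_file(path=".env"):
    """Load KEY=VALUE pairs from a .env-style file into os.environ.

    Blank lines and '#' comments are skipped; variables already set
    in the environment are left untouched (setdefault).
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```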
Configure `config.yaml`:
# config.yaml
project:
  name: "Simba"
  version: "1.0.0"
  api_version: "/api/v1"

paths:
  base_dir: null  # Will be set programmatically
  faiss_index_dir: "vector_stores/faiss_index"
  vector_store_dir: "vector_stores"

llm:
  provider: "openai"
  model_name: "gpt-4o-mini"
  temperature: 0.0
  max_tokens: null
  streaming: true
  additional_params: {}

embedding:
  provider: "huggingface"
  model_name: "BAAI/bge-base-en-v1.5"
  device: "mps"  # Use "cpu" for container compatibility
  additional_params: {}

vector_store:
  provider: "faiss"
  collection_name: "simba_collection"
  additional_params: {}

chunking:
  chunk_size: 512
  chunk_overlap: 200

retrieval:
  method: "hybrid"  # Options: default, semantic, keyword, hybrid, ensemble, reranked
  k: 5
  # Method-specific parameters
  params:
    # Semantic retrieval parameters
    score_threshold: 0.5
    # Hybrid retrieval parameters
    prioritize_semantic: true
    # Ensemble retrieval parameters
    weights: [0.7, 0.3]  # Weights for semantic and keyword retrievers
    # Reranking parameters
    reranker_model: colbert
    reranker_threshold: 0.7

# Database configuration
database:
  provider: litedb  # Options: litedb, sqlite
  additional_params: {}

celery:
  broker_url: ${CELERY_BROKER_URL:-redis://redis:6379/0}
  result_backend: ${CELERY_RESULT_BACKEND:-redis://redis:6379/1}
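Note that the `${VAR:-default}` values in the celery section use shell-style substitution, which a plain YAML loader will not expand. A minimal sketch of that expansion for a single value (the `expand_env` helper is illustrative, not part of Simba):

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}
_ENV_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def expand_env(value):
    """Expand environment references in a string, using the
    ':-' default when the variable is unset."""
    return _ENV_PATTERN.sub(
        lambda m: os.environ.get(m.group(1), m.group(2) or ""),
        value,
    )

# With CELERY_BROKER_URL unset, this falls back to the default:
expand_env("${CELERY_BROKER_URL:-redis://redis:6379/0}")
```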
Start the server, frontend, and parsers:
simba server
simba front
simba parsers
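Once `simba server` is running, the API answers at the URL used in the SDK example above (http://localhost:8000 by default). A small polling helper can gate scripts on server readiness (the `wait_for_server` function is an illustrative sketch, not part of Simba):

```python
import time
import urllib.error
import urllib.request

def wait_for_server(url, timeout=30.0, interval=1.0):
    """Poll `url` until it responds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # server responded, even if with an error status
        except (urllib.error.URLError, OSError):
            time.sleep(interval)
    return False
```

For example, `wait_for_server("http://localhost:8000")` before the first `client.documents.create(...)` call avoids races in startup scripts.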
Deploy Simba using Docker:
- CPU:
DEVICE=cpu make build
DEVICE=cpu make up
- NVIDIA GPU:
DEVICE=cuda make build
DEVICE=cuda make up
- Apple Silicon:
DEVICE=cpu make build
DEVICE=cpu make up
- 💻 pip install simba-core
- 🔧 pip install simba-sdk
- 🌐 www.simba-docs.com
- 🔒 Auth & access management
- 🕸️ Web scraping
- ☁️ Cloud integrations (Azure/AWS/GCP)
- 📚 Additional parsers and chunkers
- 🎨 Enhanced UX/UI
We welcome contributions! Follow these steps:
- Fork the repository
- Create a feature or bugfix branch
- Commit clearly documented changes
- Submit a pull request
For support or inquiries, open an issue on GitHub or contact Hamza Zerouali.