diff --git a/config.yaml b/config.yaml index 156abc7..05d3267 100644 --- a/config.yaml +++ b/config.yaml @@ -12,7 +12,7 @@ paths: llm: provider: "openai" - model_name: "gpt-4o-mini" + model_name: "gpt-4o" temperature: 0.0 max_tokens: null streaming: true diff --git a/docs/api-reference/overview.mdx b/docs/api-reference/overview.mdx index 984ca4f..4c156b1 100644 --- a/docs/api-reference/overview.mdx +++ b/docs/api-reference/overview.mdx @@ -2,97 +2,3 @@ title: 'API Reference Overview' description: 'Complete reference for Simba API endpoints' --- - -# Simba API Reference - -Simba provides a REST API that allows you to interact with all aspects of the system. This reference outlines the key endpoints and usage patterns. - -## API Basics - -### Base URL - -``` -http://localhost:8000 -``` - -### Authentication - -When authentication is enabled, include an API key in the `Authorization` header: - -``` -Authorization: Bearer YOUR_API_KEY -``` - -### Response Format - -All API responses are in JSON with a consistent structure: - -```json -{ - "status": "success", - "data": {}, - "message": "Operation successful" -} -``` - -Error responses: - -```json -{ - "status": "error", - "message": "Error description", - "error_code": "ERROR_CODE" -} -``` - -## API Categories - - - - Manage document uploads and processing - - - Work with document chunks and their metadata - - - Perform semantic searches and knowledge retrieval - - - -## Key Endpoints - -| Endpoint | Method | Description | -|-------------------------------|--------|-----------------------------------| -| `/health` | GET | Check service health | -| `/api/v1/documents` | GET | List all documents | -| `/api/v1/documents` | POST | Upload new document(s) | -| `/api/v1/documents/{id}` | GET | Get document details | -| `/api/v1/chunks` | GET | List document chunks | -| `/api/v1/retrieval/search` | POST | Semantic search in knowledge base | - -## Example Requests - -### Document Upload - -```bash -curl -X POST 
http://localhost:8000/api/v1/documents \ - -H "Authorization: Bearer YOUR_API_KEY" \ - -F "file=@/path/to/document.pdf" \ - -F "metadata={\"tags\":[\"report\",\"2023\"]}" -``` - -### Semantic Search - -```bash -curl -X POST http://localhost:8000/api/v1/retrieval/search \ - -H "Authorization: Bearer YOUR_API_KEY" \ - -H "Content-Type: application/json" \ - -d '{ - "query": "What is retrieval-augmented generation?", - "top_k": 5 - }' -``` - -## SDK Alternative - -While you can use the REST API directly, the [Simba SDK](/sdk/overview) provides a more convenient way to interact with Simba in Python applications. \ No newline at end of file diff --git a/docs/assets/demo.gif b/docs/assets/demo.gif new file mode 100644 index 0000000..00e9ee2 Binary files /dev/null and b/docs/assets/demo.gif differ diff --git a/docs/assets/logo.png b/docs/assets/logo.png new file mode 100644 index 0000000..48d3813 Binary files /dev/null and b/docs/assets/logo.png differ diff --git a/docs/configuration.mdx b/docs/configuration.mdx index 879813e..711519f 100644 --- a/docs/configuration.mdx +++ b/docs/configuration.mdx @@ -3,227 +3,3 @@ title: 'Configuration' description: 'Learn how to configure Simba for your specific needs' --- -# Configuring Simba - -Simba is designed to be highly configurable, allowing you to adapt it to your specific requirements. This guide covers all the configuration options available. - -## Configuration Methods - -Simba can be configured using: - -1. **Environment Variables**: For simple configuration and deployment environments -2. **Configuration Files**: For more complex setups with multiple options -3. **Programmatic Configuration**: Via the SDK for runtime configuration - -## Environment Variables - - -Environment variables take precedence over configuration files when both are present. 
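As a quick, hypothetical illustration of this precedence rule (the variable names come from the tables in this guide; the values are made up):

```shell
# config.yaml may set port: 8000 and log_level: INFO,
# but exported environment variables win when both are present:
export SIMBA_PORT=9000
export SIMBA_LOG_LEVEL=DEBUG

# Any Simba process started from this shell now binds to port 9000
# and logs at DEBUG, regardless of what config.yaml says.
echo "SIMBA_PORT=${SIMBA_PORT} SIMBA_LOG_LEVEL=${SIMBA_LOG_LEVEL}"
```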
- - -### Core Settings - -| Variable | Description | Default | Required | -|----------|-------------|---------|----------| -| `SIMBA_HOST` | Host to bind the server to | `0.0.0.0` | No | -| `SIMBA_PORT` | Port to bind the server to | `8000` | No | -| `SIMBA_LOG_LEVEL` | Logging level (DEBUG, INFO, WARNING, ERROR) | `INFO` | No | -| `SIMBA_ENVIRONMENT` | Environment (development, production) | `development` | No | - -### Database Configuration - -| Variable | Description | Default | Required | -|----------|-------------|---------|----------| -| `SIMBA_DB_URL` | Database connection URL | `sqlite:///simba.db` | No | -| `SIMBA_DB_POOL_SIZE` | Database connection pool size | `5` | No | -| `SIMBA_DB_MAX_OVERFLOW` | Maximum connections overflow | `10` | No | - -### Redis Configuration - -| Variable | Description | Default | Required | -|----------|-------------|---------|----------| -| `REDIS_URL` | Redis connection URL | `redis://localhost:6379/0` | Yes | -| `REDIS_PASSWORD` | Redis password | None | No | -| `REDIS_USE_SSL` | Whether to use SSL for Redis | `false` | No | - -### Vector Store Configuration - -| Variable | Description | Default | Required | -|----------|-------------|---------|----------| -| `VECTOR_STORE_TYPE` | Vector store type (faiss, chroma, pinecone) | `faiss` | No | -| `VECTOR_STORE_PATH` | Path to store vector files | `./vector_stores` | No | -| `PINECONE_API_KEY` | Pinecone API key (if using Pinecone) | None | Only for Pinecone | -| `PINECONE_ENVIRONMENT` | Pinecone environment (if using Pinecone) | None | Only for Pinecone | - -### Embedding Configuration - -| Variable | Description | Default | Required | -|----------|-------------|---------|----------| -| `EMBEDDING_MODEL` | Embedding model to use | `all-MiniLM-L6-v2` | No | -| `EMBEDDING_DIMENSION` | Embedding dimension | `384` | No | -| `HF_TOKEN` | HuggingFace token for private models | None | No | - -## Configuration File - -Simba uses a YAML configuration file (`config.yaml`) for more 
complex settings. This file should be placed in the root directory of your Simba installation. - -Here's a sample configuration file with all available options: - -```yaml -# config.yaml -server: - host: 0.0.0.0 - port: 8000 - log_level: INFO - environment: development - workers: 4 - -database: - url: sqlite:///simba.db - pool_size: 5 - max_overflow: 10 - echo: false - -redis: - url: redis://localhost:6379/0 - password: null - use_ssl: false - -vector_store: - type: faiss - path: ./vector_stores - pinecone: - api_key: null - environment: null - index_name: simba - chroma: - path: ./chroma_db - -embeddings: - model: all-MiniLM-L6-v2 - dimension: 384 - hf_token: null - -chunking: - chunk_size: 1000 - chunk_overlap: 200 - -parsing: - default_parsers: - - pdf - - docx - - txt - - md - - html - custom_parsers: [] -``` - -## Programmatic Configuration - -You can also configure some aspects of Simba programmatically using the SDK: - -```python -from simba_sdk import SimbaClient - -# Configure the client -client = SimbaClient( - api_url="http://localhost:8000", - timeout=30, - max_retries=3 -) - -# Configure vector store at runtime -client.vector_store.configure( - type="pinecone", - api_key="your-api-key", - environment="production", - index_name="my-index" -) - -# Configure embedding model -client.embeddings.configure( - model="text-embedding-ada-002", - provider="openai", - api_key="your-openai-api-key" -) -``` - -## Advanced Configuration - -### Custom Chunking Strategies - -You can configure custom chunking strategies by modifying the `chunking` section in your configuration file: - -```yaml -chunking: - strategies: - - name: fine - chunk_size: 500 - chunk_overlap: 100 - - name: coarse - chunk_size: 2000 - chunk_overlap: 300 - default_strategy: fine -``` - -### Custom Parsers - -To add custom document parsers, update the `parsing` section: - -```yaml -parsing: - custom_parsers: - - module: my_package.my_parser - class: MyCustomParser - extensions: - - .custom - - 
.special -``` - -### Authentication Configuration - -For production deployments, you can configure authentication: - -```yaml -auth: - enabled: true - secret_key: your-secret-key - token_expiration: 86400 # 24 hours in seconds - providers: - - type: basic - - type: oauth2 - config: - provider: github - client_id: your-client-id - client_secret: your-client-secret -``` - -## Environment-Specific Configuration - -You can use different configuration files for different environments: - -```bash -# Development environment -simba --config config.dev.yaml - -# Production environment -simba --config config.prod.yaml -``` - -## Verifying Configuration - -To verify your configuration: - -```bash -simba --check-config -``` - -This will validate your configuration and report any issues without starting the server. - -## Next Steps - -With Simba properly configured, you can now: - -- [Upload your first documents](/examples/document-ingestion) -- [Learn about vector stores](/core-concepts/vector-stores) -- [Configure custom embedding models](/core-concepts/embeddings) \ No newline at end of file diff --git a/docs/examples/chainlit-app.mdx b/docs/examples/chainlit-app.mdx index 8c9adc9..90fde7d 100644 --- a/docs/examples/chainlit-app.mdx +++ b/docs/examples/chainlit-app.mdx @@ -2,324 +2,3 @@ title: 'Chainlit Integration' description: 'Building a chat interface with Simba and Chainlit' --- - -# Building a Chat Interface with Simba and Chainlit - -This example demonstrates how to create an interactive chat interface using Simba for knowledge retrieval and Chainlit for the user interface. - -## Prerequisites - -- Simba installed and running -- Python 3.9+ -- Required packages: `simba-client`, `chainlit`, `openai` - -## Project Setup - -1. Create a new directory for your project: - -```bash -mkdir simba-chainlit-app -cd simba-chainlit-app -``` - -2. Install the required packages: - -```bash -pip install simba-client chainlit openai -``` - -3. 
Create a basic Chainlit app structure: - -``` -simba-chainlit-app/ -├── app.py -├── chainlit.md -└── .env -``` - -## Basic Implementation - -Here's the `app.py` file with a basic implementation: - -```python -import os -import chainlit as cl -from chainlit.playground.providers import ChatOpenAI -from simba_sdk import SimbaClient -from openai import OpenAI - -# Initialize clients -simba_client = SimbaClient(api_url="http://localhost:8000") -openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY")) - -@cl.on_chat_start -async def start(): - """Setup the chat session""" - # Set up the chatbot settings - await cl.Message(content=""" - # Welcome to Simba Knowledge Assistant - I can answer questions based on your documents. How can I help you today? - """).send() - - # Store the chat history - cl.user_session.set("messages", [ - {"role": "system", "content": "You are a helpful assistant. Answer based on the context provided."} - ]) - -@cl.on_message -async def main(message: cl.Message): - """Process each user message""" - # Get user query - query = message.content - - # Show processing steps - await cl.Message(content=f"Searching for information about: '{query}'...").send() - - # Search using Simba - with cl.Step("Retrieving relevant documents...") as step: - results = simba_client.retrieval.retrieve( - query=query, - top_k=3 - ) - - # Display the retrieved chunks - for i, chunk in enumerate(results): - step.output = cl.Text( - content=chunk["content"], - name=f"Chunk {i+1} (Score: {chunk['score']:.4f})" - ) - - # Prepare context for LLM - context = "\n\n".join([r["content"] for r in results]) - prompt = f"""Answer the following question based on the provided context. If the answer is not in the context, say "I don't have enough information to answer that question." 
- -Context: -{context} - -Question: {query} -""" - - # Get chat history - messages = cl.user_session.get("messages") - messages.append({"role": "user", "content": prompt}) - - # Generate response using OpenAI - with cl.Step("Generating answer...") as step: - response = openai_client.chat.completions.create( - model="gpt-3.5-turbo", - messages=messages, - temperature=0.2, - ) - answer = response.choices[0].message.content - step.output = answer - - # Update chat history - messages.append({"role": "assistant", "content": answer}) - cl.user_session.set("messages", messages) - - # Send the final answer - await cl.Message(content=answer).send() -``` - -Create a `chainlit.md` file for the welcome screen: - -```markdown -# Simba Knowledge Assistant - -Welcome to the Simba Knowledge Assistant! This app demonstrates integrating Simba's knowledge retrieval capabilities with Chainlit's conversational interface. - -## Features - -- Ask questions about your documents -- View relevant document chunks -- Get AI-generated answers based on your knowledge base - -## How to Use - -1. Type your question in the chat input -2. The system will retrieve relevant information from your documents -3. An AI will generate an answer based on the retrieved information -``` - -Create a `.env` file for your OpenAI API key: - -``` -OPENAI_API_KEY=your-openai-api-key -``` - -## Running the App - -Start the Chainlit app on a port that does not collide with the Simba API: - -```bash -chainlit run app.py --port 8001 -``` - -Your app will be available at http://localhost:8001 (Chainlit's default port, 8000, is already used by the Simba API in this example). - -## Adding File Upload Support - -To allow users to upload documents directly to Simba through the Chainlit interface, enhance your `app.py`: - -```python -@cl.on_chat_start -async def start(): - """Setup the chat session""" - # Set up the chatbot settings - await cl.Message(content=""" - # Welcome to Simba Knowledge Assistant - I can answer questions based on your documents. You can upload new documents or ask me questions. 
- """).send() - - # Store the chat history - cl.user_session.set("messages", [ - {"role": "system", "content": "You are a helpful assistant. Answer based on the context provided."} - ]) - - # Store uploaded document IDs - cl.user_session.set("document_ids", []) - -@cl.on_file_upload -async def handle_file_upload(file: cl.File): - """Process uploaded files""" - # Save file temporarily - temp_path = f"/tmp/{file.name}" - with open(temp_path, "wb") as f: - f.write(await file.get_bytes()) - - # Upload to Simba - try: - response = simba_client.documents.create(file_path=temp_path) - document_id = response[0]["id"] - - # Store the document ID - document_ids = cl.user_session.get("document_ids") - document_ids.append(document_id) - cl.user_session.set("document_ids", document_ids) - - # Notify user - await cl.Message( - content=f"✅ Document '{file.name}' uploaded successfully! You can now ask questions about it." - ).send() - - except Exception as e: - await cl.Message( - content=f"❌ Error uploading document: {str(e)}" - ).send() - - # Clean up - os.remove(temp_path) -``` - -## Adding Document Listing - -Add a function to display available documents: - -```python -@cl.action_callback("List Documents") -async def list_documents(): - """List available documents""" - # Get all documents from Simba - try: - documents = simba_client.documents.list() - - if not documents: - await cl.Message(content="No documents found in the knowledge base.").send() - return - - # Create a markdown table of documents - table = "| ID | Filename | Status | Chunks |\n| --- | --- | --- | --- |\n" - for doc in documents: - table += f"| {doc['id']} | {doc['filename']} | {doc['status']} | {doc.get('chunks_count', 'N/A')} |\n" - - await cl.Message(content=f"## Available Documents\n\n{table}").send() - - except Exception as e: - await cl.Message(content=f"Error listing documents: {str(e)}").send() - -# Add actions to the UI -@cl.on_chat_start -async def setup_actions(): - """Setup chat actions""" - 
actions = [cl.Action(name="List Documents", value="list", description="Show available documents")] - # Actions only appear in the UI when attached to a message and sent - await cl.Message(content="Pick an action to get started:", actions=actions).send() -``` - -## Adding Memory Management - -Add conversation memory to make the chat more context-aware: - -```python -from langchain.memory import ConversationBufferMemory - -@cl.on_chat_start -async def start(): - """Setup the chat session""" - # Initialize memory - memory = ConversationBufferMemory(return_messages=True) - cl.user_session.set("memory", memory) - - # Welcome message - await cl.Message(content=""" - # Welcome to Simba Knowledge Assistant - I can answer questions based on your documents. My memory is enabled, so I'll remember our conversation context. - """).send() - -@cl.on_message -async def main(message: cl.Message): - """Process each user message""" - # Get user query and memory - query = message.content - memory = cl.user_session.get("memory") - - # Get chat context - chat_history = memory.chat_memory.messages - - # Search using Simba and generate response (same as before) - # ... - - # Update memory - memory.chat_memory.add_user_message(query) - memory.chat_memory.add_ai_message(answer) - cl.user_session.set("memory", memory) -``` - -## Deployment Considerations - -When deploying your Chainlit app with Simba: - -1. Configure environment variables for production: - ``` - SIMBA_API_URL=https://your-simba-instance.com - OPENAI_API_KEY=your-openai-api-key - ``` - -2. For Docker deployment, create a Dockerfile: - ```dockerfile - FROM python:3.10-slim - - WORKDIR /app - - COPY requirements.txt . - RUN pip install --no-cache-dir -r requirements.txt - - COPY . . - - CMD ["chainlit", "run", "app.py", "--host", "0.0.0.0", "--port", "8000"] - ``` - -3. 
Create a `requirements.txt` file (including `langchain`, which the memory example above imports): - ``` - simba-client - chainlit - openai - langchain - ``` - -## Best Practices - -When building Chainlit apps with Simba: - -- **Provide context**: Show users which documents were used to answer their questions -- **Handle errors gracefully**: Display user-friendly messages when Simba or OpenAI encounters issues -- **Add feedback mechanisms**: Allow users to rate answer quality -- **Implement authentication**: Protect sensitive documents with user authentication -- **Monitor usage**: Track token usage and popular queries to optimize costs \ No newline at end of file diff --git a/docs/examples/notebook-example.mdx b/docs/examples/notebook-example.mdx index 94dfeed..f5bccac 100644 --- a/docs/examples/notebook-example.mdx +++ b/docs/examples/notebook-example.mdx @@ -2,291 +2,3 @@ title: 'Jupyter Notebook Example' description: 'Using Simba in Jupyter notebooks for interactive document analysis' --- - -# Using Simba in Jupyter Notebooks - -This example demonstrates how to use Simba in Jupyter notebooks for interactive document analysis and retrieval. 
- -## Prerequisites - -- Simba installed and running -- Jupyter notebook environment -- Required Python packages: `simba-client`, `pandas`, `matplotlib`, `ipywidgets` - -## Setup - -First, install the necessary packages: - -```bash -pip install simba-client pandas matplotlib ipywidgets -``` - -## Basic Integration - -Here's a complete notebook example showing Simba integration: - -```python -# Import required libraries -import pandas as pd -import matplotlib.pyplot as plt -import ipywidgets as widgets -from IPython.display import display, Markdown -from simba_sdk import SimbaClient - -# Initialize Simba client -client = SimbaClient(api_url="http://localhost:8000") - -# Function to upload a document -def upload_document(file_path): - response = client.documents.create(file_path=file_path) - return response[0]["id"] - -# Function to retrieve information -def search_knowledge(query, top_k=5): - results = client.retrieval.retrieve( - query=query, - top_k=top_k - ) - return results - -# Upload a document -document_id = upload_document("sample-data.pdf") -print(f"Document uploaded with ID: {document_id}") - -# Create a simple search interface -query_input = widgets.Text( - value='', - placeholder='Enter your query', - description='Query:', - disabled=False, - layout=widgets.Layout(width='80%') -) - -top_k_slider = widgets.IntSlider( - value=3, - min=1, - max=10, - step=1, - description='Results:', - disabled=False, - continuous_update=False, - orientation='horizontal', - readout=True, - readout_format='d' -) - -output_area = widgets.Output() - -def on_search_clicked(b): - with output_area: - output_area.clear_output() - print("Searching...") - results = search_knowledge(query_input.value, top_k_slider.value) - - # Display results - for i, chunk in enumerate(results): - display(Markdown(f"### Result {i+1} (Score: {chunk['score']:.4f})")) - display(Markdown(f"```\n{chunk['content']}\n```")) - print("-" * 80) - - # Create a simple visualization of relevance scores - scores 
= [r['score'] for r in results] - plt.figure(figsize=(10, 4)) - plt.bar(range(len(scores)), scores) - plt.xlabel('Result Index') - plt.ylabel('Relevance Score') - plt.title('Relevance Scores for Query Results') - plt.show() - -search_button = widgets.Button(description="Search") -search_button.on_click(on_search_clicked) - -# Display the search interface -display(widgets.HBox([query_input, search_button])) -display(top_k_slider) -display(output_area) -``` - -## Document Analysis Example - -This example shows how to analyze document chunks and their metadata: - -```python -# Get all chunks for a document -chunks = client.chunks.list(document_id=document_id) - -# Convert to pandas DataFrame for analysis -import pandas as pd - -# Extract metadata and content -chunk_data = [] -for chunk in chunks: - chunk_info = { - 'id': chunk['id'], - 'content_length': len(chunk['content']), - 'page': chunk.get('metadata', {}).get('page', 'N/A'), - } - chunk_info.update(chunk.get('metadata', {})) - chunk_data.append(chunk_info) - -# Create DataFrame -df = pd.DataFrame(chunk_data) - -# Display basic statistics -display(Markdown("## Document Chunk Statistics")) -display(df.describe()) - -# Visualize content length distribution -plt.figure(figsize=(10, 6)) -plt.hist(df['content_length'], bins=20) -plt.xlabel('Chunk Content Length') -plt.ylabel('Frequency') -plt.title('Distribution of Chunk Lengths') -plt.show() - -# Visualize chunks by page -if 'page' in df.columns: - page_counts = df['page'].value_counts().sort_index() - plt.figure(figsize=(12, 6)) - page_counts.plot(kind='bar') - plt.xlabel('Page Number') - plt.ylabel('Number of Chunks') - plt.title('Chunks per Page') - plt.show() -``` - -## Interactive Metadata Filtering - -This example demonstrates interactive filtering of document chunks by metadata: - -```python -# Get all chunks with metadata -chunks = client.chunks.list(document_id=document_id) -df = pd.DataFrame([{**c.get('metadata', {}), 'id': c['id'], 'content': c['content']} 
for c in chunks]) - -# Create dropdown for metadata fields -available_fields = [col for col in df.columns if col not in ['id', 'content']] -field_dropdown = widgets.Dropdown( - options=available_fields, - description='Filter by:', - disabled=False, -) - -# Create text input for filter value -filter_text = widgets.Text( - value='', - placeholder='Filter value', - description='Value:', - disabled=False -) - -filter_output = widgets.Output() - -def on_filter_clicked(b): - with filter_output: - filter_output.clear_output() - field = field_dropdown.value - value = filter_text.value - - if not field or not value: - display(Markdown("Please select a field and enter a filter value")) - return - - # Filter the DataFrame - filtered_df = df[df[field].astype(str).str.contains(value, case=False)] - display(Markdown(f"## Found {len(filtered_df)} chunks matching '{value}' in '{field}'")) - - # Display the first 5 results - for i, (_, row) in enumerate(filtered_df.head(5).iterrows()): - display(Markdown(f"### Result {i+1}")) - display(Markdown(f"**ID:** {row['id']}")) - display(Markdown(f"**{field}:** {row[field]}")) - display(Markdown(f"**Content:**\n```\n{row['content'][:300]}...\n```")) - print("-" * 80) - -filter_button = widgets.Button(description="Apply Filter") -filter_button.on_click(on_filter_clicked) - -# Display the filter interface -display(widgets.HBox([field_dropdown, filter_text, filter_button])) -display(filter_output) -``` - -## Exporting Results - -This example shows how to export search results to different formats: - -```python -# Function to search and export results -def search_and_export(query, export_format='csv'): - results = client.retrieval.retrieve(query=query, top_k=10) - - # Convert to DataFrame - results_df = pd.DataFrame([ - { - 'chunk_id': r['id'], - 'content': r['content'], - 'score': r['score'], - **r.get('metadata', {}) - } for r in results - ]) - - # Export based on format - if export_format == 'csv': - results_df.to_csv('search_results.csv', 
index=False) - return 'Exported to search_results.csv' - elif export_format == 'excel': - results_df.to_excel('search_results.xlsx', index=False) - return 'Exported to search_results.xlsx' - elif export_format == 'json': - results_df.to_json('search_results.json', orient='records') - return 'Exported to search_results.json' - else: - return results_df - -# Create export interface -export_query = widgets.Text( - value='', - placeholder='Enter your query', - description='Query:', - disabled=False, - layout=widgets.Layout(width='70%') -) - -format_dropdown = widgets.Dropdown( - options=['csv', 'excel', 'json', 'dataframe'], - value='csv', - description='Format:', - disabled=False, -) - -export_output = widgets.Output() - -def on_export_clicked(b): - with export_output: - export_output.clear_output() - print(f"Searching for '{export_query.value}' and exporting as {format_dropdown.value}...") - result = search_and_export(export_query.value, format_dropdown.value) - - if isinstance(result, pd.DataFrame): - display(result) - else: - print(result) - -export_button = widgets.Button(description="Search & Export") -export_button.on_click(on_export_clicked) - -# Display the export interface -display(widgets.HBox([export_query, format_dropdown, export_button])) -display(export_output) -``` - -## Best Practices - -When using Simba in Jupyter notebooks: - -- **Cache results** for repeated queries to reduce API calls -- **Use visualizations** to better understand your document structure -- **Create interactive widgets** for easier exploration -- **Export results** for further analysis in other tools -- **Use markdown cells** to document your analysis process \ No newline at end of file diff --git a/docs/examples/react-frontend.mdx b/docs/examples/react-frontend.mdx index 2450205..ac551e2 100644 --- a/docs/examples/react-frontend.mdx +++ b/docs/examples/react-frontend.mdx @@ -2,796 +2,3 @@ title: 'React Frontend Integration' description: 'Building a document search interface with 
Simba and React' --- - -# Building a Document Search Interface with React - -This example demonstrates how to create a React-based web interface that connects to Simba for document search and retrieval. - -## Prerequisites - -- Simba running on your server -- Node.js and npm installed -- Basic knowledge of React - -## Project Setup - -1. Create a new React app: - -```bash -npx create-react-app simba-search-interface -cd simba-search-interface -``` - -2. Install necessary dependencies: - -```bash -npm install axios react-router-dom @chakra-ui/react @emotion/react @emotion/styled framer-motion react-icons -``` - -## Application Structure - -Let's create a simple document search interface with the following features: -- Document search with relevance scores -- Document viewing and uploading -- Result filtering by metadata - -## API Service - -First, create a service to interact with the Simba API: - -```jsx -// src/services/simbaService.js -import axios from 'axios'; - -const API_URL = process.env.REACT_APP_SIMBA_API_URL || 'http://localhost:8000'; - -const simbaApi = axios.create({ - baseURL: API_URL, - headers: { - 'Content-Type': 'application/json', - }, -}); - -// Add auth token if available -simbaApi.interceptors.request.use((config) => { - const token = localStorage.getItem('simba_token'); - if (token) { - config.headers.Authorization = `Bearer ${token}`; - } - return config; -}); - -// Search documents -export const searchDocuments = async (query, topK = 5, filters = {}) => { - try { - const response = await simbaApi.post('/api/v1/retrieval/search', { - query, - top_k: topK, - filters - }); - return response.data; - } catch (error) { - console.error('Error searching documents:', error); - throw error; - } -}; - -// Get all documents -export const getDocuments = async () => { - try { - const response = await simbaApi.get('/api/v1/documents'); - return response.data; - } catch (error) { - console.error('Error fetching documents:', error); - throw error; - } -}; - 
-// Upload a document -export const uploadDocument = async (file, metadata = {}) => { - try { - const formData = new FormData(); - formData.append('file', file); - - if (Object.keys(metadata).length > 0) { - formData.append('metadata', JSON.stringify(metadata)); - } - - const response = await simbaApi.post('/api/v1/documents', formData, { - headers: { - 'Content-Type': 'multipart/form-data', - }, - }); - - return response.data; - } catch (error) { - console.error('Error uploading document:', error); - throw error; - } -}; - -// Get document chunks -export const getDocumentChunks = async (documentId) => { - try { - const response = await simbaApi.get(`/api/v1/chunks?document_id=${documentId}`); - return response.data; - } catch (error) { - console.error('Error fetching chunks:', error); - throw error; - } -}; -``` - -## Search Component - -Create a component for searching documents: - -```jsx -// src/components/Search.jsx -import React, { useState } from 'react'; -import { - Box, - Input, - Button, - VStack, - Text, - Heading, - Container, - SimpleGrid, - Spinner, - Badge, - Select, - HStack, - useToast, -} from '@chakra-ui/react'; -import { searchDocuments } from '../services/simbaService'; - -function Search() { - const [query, setQuery] = useState(''); - const [results, setResults] = useState([]); - const [isLoading, setIsLoading] = useState(false); - const [topK, setTopK] = useState(5); - const toast = useToast(); - - const handleSearch = async () => { - if (!query.trim()) { - toast({ - title: 'Query is empty', - description: 'Please enter a search query', - status: 'warning', - duration: 3000, - isClosable: true, - }); - return; - } - - setIsLoading(true); - try { - const response = await searchDocuments(query, topK); - setResults(response.data || []); - } catch (error) { - toast({ - title: 'Search error', - description: error.message || 'Failed to search documents', - status: 'error', - duration: 5000, - isClosable: true, - }); - } finally { - 
setIsLoading(false); - } - }; - - const handleKeyPress = (e) => { - if (e.key === 'Enter') { - handleSearch(); - } - }; - - return ( - - - - Simba Document Search - - - - setQuery(e.target.value)} - onKeyPress={handleKeyPress} - size="lg" - flex="1" - /> - - - - - {isLoading ? ( - - - Searching documents... - - ) : ( - <> - {results.length > 0 ? ( - - - Found {results.length} results - - - {results.map((result, index) => ( - - - - Score: {result.score.toFixed(4)} - - {result.metadata && result.metadata.source && ( - - {result.metadata.source} - - )} - {result.metadata && result.metadata.page && ( - - Page {result.metadata.page} - - )} - - - Result {index + 1} - - {result.content} - - ))} - - - ) : query !== '' && ( - - No results found. Try a different query. - - )} - - )} - - - ); -} - -export default Search; -``` - -## Document Upload Component - -Create a component for uploading documents: - -```jsx -// src/components/DocumentUpload.jsx -import React, { useState } from 'react'; -import { - Box, - Button, - FormControl, - FormLabel, - Input, - VStack, - Container, - Heading, - Text, - useToast, - Progress, - Textarea, - HStack, - IconButton, -} from '@chakra-ui/react'; -import { AddIcon, CloseIcon } from '@chakra-ui/icons'; -import { uploadDocument } from '../services/simbaService'; - -function DocumentUpload() { - const [file, setFile] = useState(null); - const [isUploading, setIsUploading] = useState(false); - const [uploadProgress, setUploadProgress] = useState(0); - const [metadata, setMetadata] = useState([{ key: '', value: '' }]); - const toast = useToast(); - - const handleFileChange = (e) => { - if (e.target.files.length > 0) { - setFile(e.target.files[0]); - } - }; - - const handleMetadataChange = (index, field, value) => { - const newMetadata = [...metadata]; - newMetadata[index][field] = value; - setMetadata(newMetadata); - }; - - const addMetadataField = () => { - setMetadata([...metadata, { key: '', value: '' }]); - }; - - const removeMetadataField = 
(index) => { - const newMetadata = [...metadata]; - newMetadata.splice(index, 1); - setMetadata(newMetadata); - }; - - const handleUpload = async () => { - if (!file) { - toast({ - title: 'No file selected', - description: 'Please select a file to upload', - status: 'warning', - duration: 3000, - isClosable: true, - }); - return; - } - - setIsUploading(true); - setUploadProgress(0); - - // Simulate progress (in a real app, you might get this from the upload) - const progressInterval = setInterval(() => { - setUploadProgress((prev) => { - if (prev >= 90) { - clearInterval(progressInterval); - return 90; - } - return prev + 10; - }); - }, 500); - - try { - // Convert metadata array to object - const metadataObj = {}; - metadata.forEach((item) => { - if (item.key && item.value) { - metadataObj[item.key] = item.value; - } - }); - - const response = await uploadDocument(file, metadataObj); - - clearInterval(progressInterval); - setUploadProgress(100); - - toast({ - title: 'Upload Successful', - description: `Document ${file.name} has been uploaded successfully!`, - status: 'success', - duration: 5000, - isClosable: true, - }); - - // Reset form - setFile(null); - setMetadata([{ key: '', value: '' }]); - - // Reset file input - document.getElementById('file-upload').value = ''; - - } catch (error) { - clearInterval(progressInterval); - toast({ - title: 'Upload Failed', - description: error.message || 'Failed to upload document', - status: 'error', - duration: 5000, - isClosable: true, - }); - } finally { - setIsUploading(false); - } - }; - - return ( - - - - Upload Document - - - - - - Select Document - - {file && ( - - Selected: {file.name} ({(file.size / 1024).toFixed(2)} KB) - - )} - - - - Metadata (Optional) - - - Add key-value pairs to help organize and search your document. 
- - - {metadata.map((item, index) => ( - - - - handleMetadataChange(index, 'key', e.target.value) - } - disabled={isUploading} - /> - - - - handleMetadataChange(index, 'value', e.target.value) - } - disabled={isUploading} - /> - - {metadata.length > 1 && ( - } - onClick={() => removeMetadataField(index)} - aria-label="Remove field" - size="sm" - disabled={isUploading} - /> - )} - - ))} - - - - {isUploading && ( - - Uploading: {uploadProgress}% - - - )} - - - - - - - ); -} - -export default DocumentUpload; -``` - -## Document List Component - -Create a component to list and manage documents: - -```jsx -// src/components/DocumentList.jsx -import React, { useState, useEffect } from 'react'; -import { - Box, - Container, - Heading, - Table, - Thead, - Tbody, - Tr, - Th, - Td, - Button, - Badge, - Spinner, - Text, - useToast, - HStack, - IconButton, -} from '@chakra-ui/react'; -import { DeleteIcon, ViewIcon } from '@chakra-ui/icons'; -import { getDocuments } from '../services/simbaService'; - -function DocumentList() { - const [documents, setDocuments] = useState([]); - const [isLoading, setIsLoading] = useState(true); - const toast = useToast(); - - useEffect(() => { - fetchDocuments(); - }, []); - - const fetchDocuments = async () => { - setIsLoading(true); - try { - const response = await getDocuments(); - setDocuments(response.data || []); - } catch (error) { - toast({ - title: 'Error fetching documents', - description: error.message || 'Failed to load documents', - status: 'error', - duration: 5000, - isClosable: true, - }); - } finally { - setIsLoading(false); - } - }; - - const getStatusColor = (status) => { - switch (status.toLowerCase()) { - case 'completed': - return 'green'; - case 'processing': - return 'blue'; - case 'failed': - return 'red'; - default: - return 'gray'; - } - }; - - return ( - - - - Documents - - - - - {isLoading ? ( - - - Loading documents... - - ) : ( - <> - {documents.length > 0 ? 
( - - - - - - - - - - - - - {documents.map((doc) => ( - - - - - - - - ))} - -
FilenameStatusChunksUpload DateActions
{doc.filename} - - {doc.status} - - {doc.chunks_count || 'N/A'} - {new Date(doc.created_at).toLocaleDateString()} - - - } - aria-label="View document" - size="sm" - colorScheme="blue" - /> - } - aria-label="Delete document" - size="sm" - colorScheme="red" - /> - -
-
- ) : ( - - No documents found. Upload some documents to get started. - - )} - - )} -
- ); -} - -export default DocumentList; -``` - -## App Component and Routing - -Set up the main App component with routing: - -```jsx -// src/App.jsx -import React from 'react'; -import { ChakraProvider, Box, Flex } from '@chakra-ui/react'; -import { BrowserRouter as Router, Routes, Route, Link } from 'react-router-dom'; -import Search from './components/Search'; -import DocumentUpload from './components/DocumentUpload'; -import DocumentList from './components/DocumentList'; - -function Navbar() { - return ( - - - - Simba Document Search - - - - - Search - - - - - Documents - - - - - Upload - - - - - - ); -} - -function App() { - return ( - - - - - - - } /> - } /> - } /> - - - - - - ); -} - -export default App; -``` - -## Environment Configuration - -Create a `.env` file in the project root: - -``` -REACT_APP_SIMBA_API_URL=http://localhost:8000 -``` - -## Running the Application - -Start the React development server: - -```bash -npm start -``` - -Your application will be available at http://localhost:3000. - -## Deployment Considerations - -1. Build the production version: - -```bash -npm run build -``` - -2. For Docker deployment, create a Dockerfile: - -```dockerfile -FROM node:16-alpine as build -WORKDIR /app -COPY package*.json ./ -RUN npm install -COPY . . -RUN npm run build - -FROM nginx:alpine -COPY --from=build /app/build /usr/share/nginx/html -COPY nginx.conf /etc/nginx/conf.d/default.conf -EXPOSE 80 -CMD ["nginx", "-g", "daemon off;"] -``` - -3. 
Create a simple NGINX configuration (nginx.conf) for client-side routing: - -``` -server { - listen 80; - - location / { - root /usr/share/nginx/html; - index index.html index.htm; - try_files $uri $uri/ /index.html; - } - - # Proxy API requests to Simba backend - location /api/ { - proxy_pass http://simba-backend:8000; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - } -} -``` - -## Best Practices - -When building React applications with Simba: - -- **Implement authentication**: Add JWT-based authentication for production use -- **Add error boundaries**: Handle API failures gracefully -- **Implement pagination**: For document lists and search results -- **Consider server-side rendering**: For better SEO and initial load performance -- **Use React Query or similar**: For efficient data fetching and caching -- **Add detailed document viewers**: For better document exploration -- **Implement advanced filtering**: By metadata, date ranges, etc. \ No newline at end of file diff --git a/docs/examples/streamlit-app.mdx b/docs/examples/streamlit-app.mdx index 95c11e8..6bd73a8 100644 --- a/docs/examples/streamlit-app.mdx +++ b/docs/examples/streamlit-app.mdx @@ -6,9 +6,3 @@ description: 'Learn how to create a Streamlit app with Simba' # Streamlit App Example This guide demonstrates how to create a Streamlit app with Simba. - -## Prerequisites - -- Simba installed and running -- Simba SDK installed: `pip install simba-client` -- Documents to upload diff --git a/docs/getting-started.mdx b/docs/getting-started.mdx index 9691786..e783346 100644 --- a/docs/getting-started.mdx +++ b/docs/getting-started.mdx @@ -1,136 +1,97 @@ --- -title: 'Getting Started with Simba' -description: 'Learn how to set up and start using Simba in your project' +title: Overview --- -## Prerequisites +Simba is an open-source, portable Knowledge Management System (KMS) specifically designed to integrate seamlessly with Retrieval-Augmented Generation (RAG) systems. 
It provides a comprehensive solution for managing, processing, and retrieving knowledge from various document sources to enhance AI applications with contextual information. -Before you begin, make sure you have the following installed: +## Key Features -- **Python 3.11+** -- **Redis 7.0+** -- **Node.js 20+** (for the frontend) -- **Git** -- **Poetry** (for Python dependency management) +* **🔌 Powerful SDK:** Comprehensive Python SDK (`simba-sdk`) for easy integration -## Quick Installation +* **🧩 Modular Architecture:** Flexible integration of vector stores, embedding models, chunkers, and parsers -### Option 1: Install with pip +* **🖥️ Modern UI:** User-friendly interface for managing document chunks and monitoring system performance -The simplest way to get started with Simba is to install the client SDK: +* **🔗 Seamless Integration:** Effortlessly connects with any RAG-based system -```bash -pip install simba-core -``` +* **👨‍💻 Developer-Centric:** Simplifies complex knowledge management tasks -### Option 2: Clone the Repository +* **📦 Open Source & Extensible:** Community-driven with extensive customization options -For a complete installation, including the backend and frontend: +## System Architecture Overview -```bash -git clone https://github.com/GitHamza0206/simba.git -cd simba -poetry install -``` +Simba employs a modular architecture with these key components: -## Running Simba +* **Document Processors:** Parse and extract content from various document formats -### Backend Service +* **Chunkers:** Divide documents into semantically meaningful segments -To start the Simba backend service: +* **Embedding Models:** Convert text into vector representations -```bash -# In a new terminal, start Simba -simba server -``` +* **Vector Stores:** Index and store embeddings for efficient retrieval -By default, the backend will be available at http://localhost:8000. 
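Once the backend is up at http://localhost:8000, a quick way to confirm it is responding is the documented `/health` endpoint. The sketch below is a stand-alone, illustrative check (not part of the SDK); the `is_healthy` helper assumes the JSON response envelope shown in the API reference (`{"status": "success", ...}`):

```python
# Illustrative health check against a local Simba server.
# Assumes the response envelope documented in the API reference
# ({"status": "success", ...}); adjust if your deployment differs.
import json
import urllib.request

def is_healthy(payload: dict) -> bool:
    """Interpret a Simba API response envelope."""
    return payload.get("status") == "success"

def check_server(base_url: str = "http://localhost:8000") -> bool:
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return is_healthy(json.load(resp))
    except OSError:
        return False  # server not running or unreachable

if __name__ == "__main__":
    print("Simba healthy:", check_server())
```

The same check is available through the SDK as `client.health()`, shown below; the raw-HTTP version is useful when debugging whether the server itself is reachable.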
+* **Retrieval Engine:** Find relevant information using various retrieval strategies -### Frontend (Optional) +* **API Layer:** Expose functionality through a RESTful interface -If you want to use the UI: +* **SDK:** Provide programmatic access to all functionality -```bash -cd frontend -npm install -npm run dev -``` -The frontend will be available at http://localhost:5173. +## Demo -### Running Parsing Algorithms +![Watch the demo](/assets/demo.gif) -Simba leverages advanced parsing algorithms, such as Docling, to transform unstructured documents into structured formats suitable for efficient retrieval and analysis. To run the parsing process, execute: +## Who is Simba for? -```bash -simba parse -``` +Simba is ideal for: +* **AI Engineers:** Building RAG applications that require contextual knowledge -## Basic Usage +* **Developers:** Creating context-aware applications with minimal boilerplate -### Connecting to Simba +* **Organizations:** Seeking to leverage their internal knowledge for AI applications -Here's a simple example of connecting to Simba using the SDK: +## Deployment Options -```python -from simba_sdk import SimbaClient +Simba offers two primary deployment models to suit different organizational needs: -# Initialize the client -client = SimbaClient(api_url="http://localhost:8000") + + + Get started using Simba through our Cloud -# Check connection -status = client.health() -print(f"Simba status: {status}") -``` + offering, free of charge. + Perfect for fast serverless deployment. + -### Adding a Document + + Host your own full-featured Simba system. Ideal for on premise use cases. + Complete control over your data & infra. 
+ + -To add a document to your knowledge base: +### Cloud-Hosted Solution -```python -# Upload a file -document = client.documents.create(file_path="path/to/your/document.pdf") -document_id = document[0]["id"] -print(f"Document uploaded with ID: {document_id}") +The Simba Cloud offering provides a fully-managed service where you can: -# List all documents -all_docs = client.documents.list() -print(f"Total documents: {len(all_docs)}") -``` +* Start using Simba immediately without infrastructure setup -### Retrieving Knowledge +* Access your knowledge base from anywhere -To retrieve information from your knowledge base: +* Scale resources automatically based on your needs -```python -# Basic retrieval -results = client.retrieval.retrieve( - query="What is Simba?", - top_k=3 # Number of chunks to retrieve -) +* Benefit from automatic updates and maintenance -# Display results -for chunk in results: - print(f"Score: {chunk['score']}") - print(f"Content: {chunk['content']}") - print("---") -``` +### Self-Hosted Solution -## Next Steps +For organizations requiring complete control over their infrastructure and data: -Now that you have Simba up and running, here are some next steps: +* Deploy Simba in your own environment (on-premises or cloud VPC) -- Learn more about [configuring Simba](/configuration) -- Explore [vector stores](/core-concepts/vector-stores) for optimized retrieval -- Understand [embeddings](/core-concepts/embeddings) and how they work -- Customize the [chunking process](/core-concepts/chunking) for your specific needs -- Check out our [examples](/examples/document-ingestion) for more advanced usage +* Maintain full control over your sensitive data -## Troubleshooting +* Customize and extend functionality as needed -If you encounter any issues during setup: +* Integrate with existing internal systems -- Ensure Redis is running and accessible -- Check that all prerequisites are installed -- Verify port availability for both backend (8000) and frontend 
(5173) -- See our [community support](/community/support) for more help \ No newline at end of file +Both deployment options provide access to the same powerful Simba SDK, allowing you to programmatically interact with your knowledge base. + +Let's get started and explore how Simba can empower your RAG projects! \ No newline at end of file diff --git a/docs/installation.mdx b/docs/installation.mdx deleted file mode 100644 index dbff8c2..0000000 --- a/docs/installation.mdx +++ /dev/null @@ -1,191 +0,0 @@ ---- -title: 'Installation' -description: 'Detailed installation instructions for Simba' ---- - -# Installing Simba - -This guide provides detailed instructions for installing Simba in various environments. Choose the method that works best for your needs. - -## Installation Methods - - - - ### Python Package Installation - - If you only need to use the Simba client in your existing projects, you can install it via pip: - - ```bash - pip install simba-client - ``` - - This will install the Simba SDK, allowing you to connect to a running Simba instance. - - To verify your installation: - - ```python - from simba_sdk import SimbaClient - - # This should print the installed version - print(SimbaClient.__version__) - ``` - - - ### Clone the Repository - - For a complete installation including the backend and frontend: - - ```bash - git clone https://github.com/GitHamza0206/simba.git - cd simba - ``` - - ### Backend Installation - - Simba uses Poetry for dependency management: - - ```bash - # Install Poetry if not already installed - curl -sSL https://install.python-poetry.org | python3 - - - # Install dependencies - poetry install - ``` - - ### Frontend Installation - - ```bash - cd frontend - npm install - ``` - - This will set up both the backend and frontend components of Simba. 
- - - ### Using Docker Compose - - Simba provides a Docker Compose setup for easy deployment: - - ```bash - # Clone the repository - git clone https://github.com/GitHamza0206/simba.git - cd simba - - # Start services with Docker Compose - docker-compose up -d - ``` - - This will start: - - The Simba backend API - - Redis for caching and task queue - - The Simba frontend UI - - All services will be properly configured to work together. - - ### Using Individual Containers - - You can also run individual components: - - ```bash - # Run just the backend - docker run -p 8000:8000 -e REDIS_URL=redis://redis:6379 simba/backend - - # Run just the frontend - docker run -p 5173:5173 -e API_URL=http://localhost:8000 simba/frontend - ``` - - - -## System Requirements - -### Minimum Requirements - -- **CPU**: 2 cores -- **RAM**: 4 GB -- **Disk Space**: 1 GB -- **Python**: 3.11+ -- **Redis**: 7.0+ -- **Node.js** (for frontend): 20+ - -### Recommended Requirements - -- **CPU**: 4+ cores -- **RAM**: 8+ GB -- **Disk Space**: 10+ GB (depending on your document volume) -- **Python**: 3.11+ -- **Redis**: 7.0+ -- **Node.js** (for frontend): 20+ - -## Dependencies - -Simba has the following key dependencies: - - - - - **FastAPI**: Web framework for the backend API - - **Redis**: For caching and task queues - - **SQLAlchemy**: ORM for database interactions - - **Celery**: Distributed task queue for background processing - - **Pydantic**: Data validation and settings management - - - - **FAISS**: Facebook AI Similarity Search for efficient vector storage - - **Chroma**: ChromaDB integration for document embeddings - - **Pinecone** (optional): For cloud-based vector storage - - **Milvus** (optional): For distributed vector search - - - - **Sentence Transformers**: For text embeddings - - **PyTorch** (optional): For custom embedding models - - **HuggingFace Transformers** (optional): For text processing - - - - **React**: UI library - - **TypeScript**: For type-safe JavaScript - - **Vite**: 
Frontend build tool - - **Tailwind CSS**: Utility-first CSS framework - - - -## Troubleshooting - -### Common Installation Issues - -#### Poetry Installation Fails - -```bash -# Try installing with pip instead -pip install poetry -``` - -#### Redis Connection Issues - -```bash -# Check if Redis is running -redis-cli ping -# Should return PONG -``` - -#### Backend Startup Issues - -```bash -# Check environment variables -cp .env.example .env -# Edit .env with your configuration -``` - -#### Frontend Build Issues - -```bash -# Clear npm cache -npm cache clean --force -npm install -``` - -## Next Steps - -Once you have Simba installed, proceed to: - -1. [Configure your installation](/configuration) -2. [Set up your first document collection](/examples/document-ingestion) -3. [Connect your application to Simba](/sdk/client) \ No newline at end of file diff --git a/docs/introduction.mdx b/docs/introduction.mdx deleted file mode 100644 index 858b25b..0000000 --- a/docs/introduction.mdx +++ /dev/null @@ -1,65 +0,0 @@ ---- -title: 'Simba - Advanced Knowledge Management for RAG Systems' -description: 'The most advanced AI retrieval system for seamless RAG integration' ---- - -Simba Logo - -# Simba - -**The most advanced AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a modular architecture and powerful SDK.** - -Simba is an all-in-one solution for Knowledge Management specifically designed for seamless integration with Retrieval-Augmented Generation (RAG) systems. With its production-ready features including multimodal content ingestion, hybrid search capabilities, and comprehensive user/document management, Simba empowers developers to build sophisticated AI applications with ease. - -## Key Features - - - - Flexible integration of vector stores, embedding models, chunkers, and parsers to adapt to your specific needs. - - - Process various document formats seamlessly through an intuitive ingestion pipeline. 
- - - Comprehensive Python SDK for effortless integration with your existing applications and workflows. - - - Combine semantic and keyword search techniques for enhanced retrieval accuracy. - - - User-friendly interface for managing document chunks and knowledge sources with ease. - - - Tailor responses and retrieval methods to your specific technical environment and requirements. - - - -## How Simba Works - -Simba streamlines the entire knowledge management lifecycle through its advanced architecture: - -1. **Document Ingestion**: Upload and process various document formats through a unified pipeline -2. **Content Processing**: Automatically parse, chunk, and embed content for optimal retrieval -3. **Vector Storage**: Efficiently store and index knowledge using state-of-the-art vector databases -4. **Intelligent Retrieval**: Powerful API with hybrid search capabilities for precise information access -5. **Multi-Source Integration**: Combine data from multiple sources to ensure reliability and comprehensive responses - -## Security and Compliance - -Simba employs robust validation and control mechanisms to prevent errors and adhere to quality and security standards. This guarantees that the information provided is both relevant and compliant with enterprise requirements, making it suitable for sensitive business applications. 
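As a concrete illustration of step 2 in the lifecycle above (content processing), documents are typically split into overlapping chunks before embedding, so that context straddling a boundary remains retrievable from either neighboring chunk. The sketch below is purely illustrative — Simba's chunkers are pluggable and more sophisticated — using sizes in the spirit of the sample configuration (512 characters with 200 overlap):

```python
# Toy fixed-size chunker with overlap -- illustrative only, not Simba's
# actual implementation. Overlap preserves context across chunk boundaries.
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 200) -> list[str]:
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("a" * 1000)
print(len(chunks), [len(c) for c in chunks])  # 3 chunks; consecutive pairs share 200 chars
```

Production chunkers usually split on semantic boundaries (sentences, sections) rather than raw character offsets, which is why Simba exposes chunking as a pluggable component.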
- -## Use Cases - -- **Enterprise Knowledge Bases**: Organize and access company documentation with precision -- **AI Chatbots**: Power conversational interfaces with accurate, contextual knowledge -- **Research Platforms**: Manage and retrieve research papers and findings efficiently -- **Customer Support**: Provide accurate, verifiable information from knowledge bases -- **Technical Documentation**: Create searchable, interconnected documentation for complex systems - -## Flexibility and Scalability - -Simba's modular design allows for seamless integration of new functionalities or extensions as your needs evolve, offering a solid foundation for continuous improvements and rapid adaptation to emerging challenges. - -## Get Started - -Ready to supercharge your AI applications with Simba? Check out our [Getting Started](/getting-started) guide to begin your journey! \ No newline at end of file diff --git a/docs/mint.json b/docs/mint.json index 3643096..e36f9c6 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -33,17 +33,22 @@ }, { "name": "Examples", - "icon": "list-magnifying-glass", + "icon": "web-awesome", "url": "examples" } ], "navigation": [ { - "group": "Introduction", + "group": "Getting Started", "pages": [ - "introduction", - "getting-started", - "installation", + "overview", + { + "group": "Quickstart", + "pages": [ + "quickstart/cloud", + "quickstart/self-hosted" + ] + }, "configuration" ] }, @@ -74,4 +79,4 @@ "twitter": "https://twitter.com/zerou_hamza", "github": "https://github.com/GitHamza0206/simba" } -} \ No newline at end of file +} diff --git a/docs/overview.mdx b/docs/overview.mdx new file mode 100644 index 0000000..093d02e --- /dev/null +++ b/docs/overview.mdx @@ -0,0 +1,73 @@ +--- +title: Overview +--- + +Simba is an open-source, portable Knowledge Management System (KMS) specifically designed to integrate seamlessly with Retrieval-Augmented Generation (RAG) systems. 
It provides a comprehensive solution for managing, processing, and retrieving knowledge from various document sources to enhance AI applications with contextual information. + +## Key Features + +* **🔌 SDK:** Comprehensive Python SDK (`simba-sdk`) for easy integration + +* **🧩 Modular Architecture:** Flexible integration of `vector stores`, `embedding models`, `chunkers`, and `parsers` + +* **🖥️ Modern UI:** User-friendly interface for managing document chunks and monitoring system performance + +* **🔗 Seamless Integration:** Effortlessly connects with any RAG-based system + +* **👨‍💻 Developer-Centric:** Simplifies complex knowledge management tasks + +* **📦 Open Source & Extensible:** Community-driven with extensive customization options + +## System Architecture Overview + +Simba employs a modular architecture with these key components: + +* **Document Parsers:** Parse and extract content from various document formats + +* **Chunkers:** Divide documents into semantically meaningful segments + +* **Embedding Models:** Convert text into vector representations + +* **Vector Stores:** Index and store embeddings for efficient retrieval + +* **Retrieval Engine:** Find relevant information using various retrieval strategies + +* **API Layer:** Expose functionality through a RESTful interface + +* **SDK:** Provide programmatic access to all functionality + +## Demo + +![Watch the demo](/assets/demo.gif) + +## Who is Simba for? + +Simba is ideal for: + +* **AI Engineers:** Building RAG applications that require contextual knowledge + +* **Developers:** Creating context-aware applications with minimal boilerplate + +* **Organizations:** Seeking to leverage their internal knowledge for AI applications + +## Deployment Options + +Simba offers two primary deployment models to suit different organizational needs: + + + + Get started using Simba through our Cloud offering, free of charge. + Perfect for fast serverless deployment. 
+ + + + Host your own full-featured Simba system. Ideal for on premise use cases. + Complete control over your data & infra. + + + +Both deployment options provide access to the same Simba SDK, allowing you to programmatically interact with your knowledge base. + +Choose the system that best aligns with your requirements and proceed with the documentation. + +Let's get started and explore how Simba can empower your RAG projects! \ No newline at end of file diff --git a/docs/quickstart.mdx b/docs/quickstart.mdx new file mode 100644 index 0000000..43cfc52 --- /dev/null +++ b/docs/quickstart.mdx @@ -0,0 +1,242 @@ +--- +title: "Quickstart" +description: "This guide provides detailed instructions for installing Simba in various environments. Choose the method that works best for your needs." +--- + +## Installation Methods + + + + ### Python Package Installation + + If you only need to use the Simba client in your existing projects, you can install it via pip: + + ```bash + pip install simba-client + ``` + + This will install the Simba SDK, allowing you to connect to a running Simba instance. 
+ + To verify your installation: + + ```python + from simba_sdk import SimbaClient + + # This should print the installed version + print(SimbaClient.__version__) + ``` + + #### Example Usage + + ```python + from simba_sdk import SimbaClient + + client = SimbaClient(api_url="http://simba.cloud.api:8000") + document = client.documents.create(file_path="path/to/your/document.pdf") + document_id = document[0]["id"] + + parsing_result = client.parser.parse_document(document_id, parser="docling", sync=True) + + retrieval_results = client.retriever.retrieve(query="your-query") + + for result in retrieval_results["documents"]: + print(f"Content: {result['page_content']}") + print(f"Metadata: {result['metadata']['source']}") + print("====" * 10) + ``` + + + + ### Clone the Repository + + For a complete installation including the backend and frontend: + + ```bash + git clone https://github.com/GitHamza0206/simba.git + cd simba + ``` + + ### Backend Installation + + Simba uses Poetry for dependency management: + + ```bash + # Install Poetry if not already installed + curl -sSL https://install.python-poetry.org | python3 - + ``` + + ```bash + # Install dependencies + poetry config virtualenvs.in-project true + poetry install + source .venv/bin/activate + ``` + + This installs the backend dependencies into a local virtual environment and activates it.
+ + To run the backend server: + + ```bash + simba server + ``` + + To run the frontend server: + + ```bash + simba front + ``` + + To run the parsers: + + ```bash + simba parsers + ``` + + + + ### Using Makefile + + Simba provides a Makefile for easy deployment: + + ```bash + # Clone the repository + git clone https://github.com/GitHamza0206/simba.git + cd simba + ``` + + For CPU: + + ```bash + # Build the Docker image + DEVICE=cpu make build + # Start the Docker container + DEVICE=cpu make up + ``` + + For NVIDIA GPU: + + ```bash + # Build the Docker image + DEVICE=cuda make build + # Start the Docker container + DEVICE=cuda make up + ``` + + For Apple Silicon: + + ```bash + # Build the Docker image + DEVICE=cpu make build + # Start the Docker container + DEVICE=cpu make up + ``` + + This will start: + + * The Simba backend API + + * Redis for caching and task queue + + * The Simba frontend UI + + All services will be properly configured to work together. + + To stop the services: + + ```bash + make down + ``` + + You can find more information about Docker setup here: [Docker Setup](/docs/docker-setup) + + + +## System Requirements + +### Minimum Requirements + +* **CPU**: 2 cores + +* **RAM**: 4 GB + +* **Disk Space**: 1 GB + +* **Python**: 3.11+ + +* **Redis**: 7.0+ + +* **Node.js** (for frontend): 20+ + +### Recommended Requirements + +* **CPU**: 4+ cores + +* **RAM**: 8+ GB + +* **Disk Space**: 10+ GB (depending on your document volume) + +* **Python**: 3.11+ + +* **Redis**: 7.0+ + +* **Node.js** (for frontend): 20+ + +## Dependencies + +Simba has the following key dependencies: + + + + * **FastAPI**: Web framework for the backend API + + * **Ollama**: For running LLM inference (optional) + + * **Redis**: For caching and task queues + + * **PostgreSQL**: For database interactions + + * **Celery**: Distributed task queue for background processing + + * **Pydantic**: Data validation and settings management + + + + * **FAISS**: Facebook AI Similarity Search for
efficient vector storage + + * **Chroma**: ChromaDB integration for document embeddings + + * **Pinecone** (optional): For cloud-based vector storage + + * **Milvus** (optional): For distributed vector search + + + + * **OpenAI**: For text embeddings + + * **HuggingFace Transformers** (optional): For text processing + + + + * **React**: UI library + + * **TypeScript**: For type-safe JavaScript + + * **Vite**: Frontend build tool + + * **Tailwind CSS**: Utility-first CSS framework + + + +## Troubleshooting + +To be added. + +## Next Steps + +Once you have Simba installed, proceed to: + +1. [Configure your installation](/docs/configuration) + +2. [Set up your first document collection](/docs/examples/document-ingestion) + +3. [Connect your application to Simba](/docs/sdk/client) \ No newline at end of file diff --git a/docs/quickstart/cloud.mdx b/docs/quickstart/cloud.mdx new file mode 100644 index 0000000..f803d82 --- /dev/null +++ b/docs/quickstart/cloud.mdx @@ -0,0 +1,65 @@ +--- +title: "Cloud" +description: "Getting started with Simba Cloud using the SDK" +--- + +*** + + + This page is under construction and will be available soon. + + Disclaimer: the documentation below is not yet functional. + + + + + Create an account with [Simba Cloud](https://github.com/GitHamza0206/simba). It's free! + + + If you want to deploy locally, please refer to the [Self-Hosted](/quickstart/self-hosted) guide. + + + + + The SDK is currently available for Python only; you can install it via pip: + + ```bash + pip install simba-client + ``` + + + + After signing in to Simba Cloud, click `create new API KEY` + + + Make sure to create a `.env` file: + + ```bash + SIMBA_API_KEY="sb-..."
+ ``` + + + + ```python + from simba_sdk import SimbaClient + + client = SimbaClient(api_url="http://simba.cloud.api:8000") + document = client.documents.create(file_path="path/to/your/document.pdf") + document_id = document[0]["id"] + + parsing_result = client.parser.parse_document(document_id, parser="docling", sync=True) + + retrieval_results = client.retriever.retrieve(query="your-query") + + for result in retrieval_results["documents"]: + print(f"Content: {result['page_content']}") + print(f"Metadata: {result['metadata']['source']}") + print("====" * 10) + ``` + + + +## \ No newline at end of file diff --git a/docs/quickstart/self-hosted.mdx b/docs/quickstart/self-hosted.mdx new file mode 100644 index 0000000..32b6a52 --- /dev/null +++ b/docs/quickstart/self-hosted.mdx @@ -0,0 +1,593 @@ +--- +title: "Self-Hosted" +description: "Getting started with Simba installed on your local system" +--- + +*** + +This guide will walk you through installing and running Simba on your local system using pip, Git, or Docker. + +Choose the method that suits you best: if you just want to use the SDK, we recommend the pip installation; if you want more control over the source code, we recommend installing the full system from source; and if you prefer a prebuilt solution, we recommend Docker.
+ +## Installation Methods + + + + + + + `simba-core` is the PyPI package that contains the server logic and API; it must be running before you can use the SDK. + + ```bash + pip install simba-core + ``` + + + To install the dependencies faster, we recommend using `uv`: + + ```bash + pip install uv + uv pip install simba-core + ``` + + + + + The `config.yaml` file is one of the most important files in this setup: it configures the embedding model, vector store type, retrieval strategy, database, Celery worker for parsing, and the LLM you're using. + + Go to your project root and create `config.yaml`; you can use the example below as a starting point: + + ```yaml + project: + name: "Simba" + version: "1.0.0" + api_version: "/api/v1" + + paths: + base_dir: null # Will be set programmatically + faiss_index_dir: "vector_stores/faiss_index" + vector_store_dir: "vector_stores" + + llm: + provider: "openai" # OPTIONS: ollama, openai + model_name: "gpt-4o-mini" + temperature: 0.0 + max_tokens: null + streaming: true + additional_params: {} + + embedding: + provider: "huggingface" + model_name: "BAAI/bge-base-en-v1.5" + device: "cpu" # OPTIONS: cpu, cuda, mps + additional_params: {} + + vector_store: + provider: "faiss" + collection_name: "simba_collection" + + additional_params: {} + + chunking: + chunk_size: 512 + chunk_overlap: 200 + + retrieval: + method: "hybrid" # OPTIONS: default, semantic, keyword, hybrid, ensemble, reranked + k: 5 + # Method-specific parameters + params: + # Semantic retrieval parameters + score_threshold: 0.5 + + # Hybrid retrieval parameters + prioritize_semantic: true + + # Ensemble retrieval parameters + weights: [0.7, 0.3] # Weights for semantic and keyword retrievers + + # Reranking parameters + reranker_model: colbert + reranker_threshold: 0.7 + + # Database configuration + database: + provider: litedb # Options: litedb, sqlite + additional_params: {} + + celery: + broker_url: ${CELERY_BROKER_URL:-redis://redis:6379/0} +
+   result_backend: ${CELERY_RESULT_BACKEND:-redis://redis:6379/1}
+ ```
+
+
+ The config file must be in the directory from which you run Simba; otherwise it will not be picked up.
+
+
+
+
+ If you want to use OpenAI or Mistral AI, log chatbot traces with LangSmith, or use Ollama, declare the corresponding variables in your `.env`:
+
+ ```
+ OPENAI_API_KEY=your_openai_api_key # (optional)
+ MISTRAL_API_KEY=your_mistral_api_key # (optional)
+ LANGCHAIN_TRACING_V2=true # (optional)
+ LANGCHAIN_API_KEY=your_langchain_api_key # (optional)
+ REDIS_HOST=localhost
+ CELERY_BROKER_URL=redis://localhost:6379/0
+ CELERY_RESULT_BACKEND=redis://localhost:6379/1
+ ```
+
+
+
+ Now that you have your `.env` and `config.yaml`, you can run the following command:
+
+ ```
+ simba server
+ ```
+
+ This will start the server at http://localhost:8000. You will see log messages in the console:
+
+ ```
+ Starting Simba server...
+ INFO: Started server process [62940]
+ INFO: Waiting for application startup.
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - ==================================================
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Starting SIMBA Application
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - ==================================================
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Project Name: Simba
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Version: 1.0.0
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - LLM Provider: openai
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - LLM Model: gpt-4o
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Embedding Provider: huggingface
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Embedding Model: BAAI/bge-base-en-v1.5
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Embedding Device: mps
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Vector Store Provider: faiss
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Database Provider: litedb
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Retrieval Method: hybrid
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Retrieval Top-K: 5
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Base Directory: /Users/mac/Documents/simba
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Upload Directory: /Users/mac/Documents/simba/uploads
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Vector Store Directory: /Users/mac/Documents/simba/vector_stores
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - ==================================================
+ INFO: Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
+ ```
+
+
+
+ You can now install the SDK and start using Simba in local mode:
+
+ ```
+ pip install simba-client
+ ```
+
+
+
+ ```python
+ from simba_sdk import SimbaClient
+
+ client = SimbaClient(api_url="http://localhost:8000")
+ document = client.documents.create(file_path="path/to/your/document.pdf")
+ document_id = document[0]["id"]
+
+ parsing_result = client.parser.parse_document(document_id, parser="docling", sync=True)
+
+ retrieval_results = client.retriever.retrieve(query="your-query")
+
+ for result in retrieval_results["documents"]:
+     print(f"Content: {result['page_content']}")
+     print(f"Metadata: {result['metadata']['source']}")
+     print("====" * 10)
+ ```
+
+
+
+
+
+
+
+ For a complete installation, including the backend and the frontend:
+
+ ```
+ git clone https://github.com/GitHamza0206/simba.git
+ cd simba
+ ```
+
+
+
+ Simba uses Poetry for dependency management.
+
+
+ ```bash MacOS/Linux
+ curl -sSL https://install.python-poetry.org | python3 -
+ ```
+
+ ```bash
+ pip install poetry
+ ```
+
+
+ Then install the virtual environment and activate it.
+
+
+
+ The `config.yaml` file is one of the most important files in this setup: it configures the embedding model, the vector store type, the retrieval strategy, the database, the Celery worker used for parsing, and the LLM you're using.
+
+ Go to your project root and create a `config.yaml`; you can use the one below as a starting point.
+
+ ```yaml
+ project:
+   name: "Simba"
+   version: "1.0.0"
+   api_version: "/api/v1"
+
+ paths:
+   base_dir: null  # Will be set programmatically
+   faiss_index_dir: "vector_stores/faiss_index"
+   vector_store_dir: "vector_stores"
+
+ llm:
+   provider: "openai"  # OPTIONS: ollama, openai
+   model_name: "gpt-4o-mini"
+   temperature: 0.0
+   max_tokens: null
+   streaming: true
+   additional_params: {}
+
+ embedding:
+   provider: "huggingface"
+   model_name: "BAAI/bge-base-en-v1.5"
+   device: "cpu"  # OPTIONS: cpu, cuda, mps
+   additional_params: {}
+
+ vector_store:
+   provider: "faiss"
+   collection_name: "simba_collection"
+   additional_params: {}
+
+ chunking:
+   chunk_size: 512
+   chunk_overlap: 200
+
+ retrieval:
+   method: "hybrid"  # OPTIONS: default, semantic, keyword, hybrid, ensemble, reranked
+   k: 5
+   # Method-specific parameters
+   params:
+     # Semantic retrieval parameters
+     score_threshold: 0.5
+
+     # Hybrid retrieval parameters
+     prioritize_semantic: true
+
+     # Ensemble retrieval parameters
+     weights: [0.7, 0.3]  # Weights for semantic and keyword retrievers
+
+     # Reranking parameters
+     reranker_model: colbert
+     reranker_threshold: 0.7
+
+ # Database configuration
+ database:
+   provider: litedb  # Options: litedb, sqlite
+   additional_params: {}
+
+ celery:
+   broker_url: ${CELERY_BROKER_URL:-redis://redis:6379/0}
+   result_backend: ${CELERY_RESULT_BACKEND:-redis://redis:6379/1}
+ ```
+
+
+
+ If you want to use OpenAI or Mistral AI, log chatbot traces with LangSmith, or use Ollama, declare the corresponding variables in your `.env`:
+
+ ```
+ OPENAI_API_KEY=your_openai_api_key # (optional)
+ MISTRAL_API_KEY=your_mistral_api_key # (optional)
+ LANGCHAIN_TRACING_V2=true # (optional)
+ LANGCHAIN_API_KEY=your_langchain_api_key # (optional)
+ REDIS_HOST=localhost
+ CELERY_BROKER_URL=redis://localhost:6379/0
+ CELERY_RESULT_BACKEND=redis://localhost:6379/1
+ ```
+
+
+
+ ```
+ simba server
+ ```
+
+ This will start the server at http://localhost:8000. You will see log messages in the console:
+
+ ```
+ Starting Simba server...
+ INFO: Started server process [62940]
+ INFO: Waiting for application startup.
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - ==================================================
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Starting SIMBA Application
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - ==================================================
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Project Name: Simba
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Version: 1.0.0
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - LLM Provider: openai
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - LLM Model: gpt-4o
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Embedding Provider: huggingface
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Embedding Model: BAAI/bge-base-en-v1.5
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Embedding Device: mps
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Vector Store Provider: faiss
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Database Provider: litedb
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Retrieval Method: hybrid
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Retrieval Top-K: 5
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Base Directory: /Users/mac/Documents/simba
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Upload Directory: /Users/mac/Documents/simba/uploads
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - Vector Store Directory: /Users/mac/Documents/simba/vector_stores
+ 2025-03-12 16:42:50 - simba.__main__ - INFO - ==================================================
+ INFO: Application startup complete.
+ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
+ ```
+
+
+
+ You can run the frontend with:
+
+ ```
+ simba front
+ ```
+
+ Or navigate to `/frontend` and run:
+
+ ```
+ cd frontend
+ npm install
+ npm run dev
+ ```
+
+ Then you should see your local instance at http://localhost:5173.
+
+
+
+ If you want to enable document parsers, start the Celery worker; this is required to run the Docling parser. Celery requires Redis. To start Redis, open a terminal and run:
+
+ ```
+ redis-server
+ ```
+
+ Once Redis is running, open a new terminal and run:
+
+ ```
+ simba parsers
+ ```
+
+
+
+
+
+ ### Docker setup
+
+ We use a Makefile to build Simba; this is the easiest setup.
+
+
+
+ ```
+ git clone https://github.com/GitHamza0206/simba.git
+ cd simba
+ ```
+
+
+
+ The `config.yaml` file is one of the most important files in this setup: it configures the embedding model, the vector store type, the retrieval strategy, the database, the Celery worker used for parsing, and the LLM you're using.
+
+ Go to your project root and create a `config.yaml`; you can use the one below as a starting point.
+
+ ```yaml
+ project:
+   name: "Simba"
+   version: "1.0.0"
+   api_version: "/api/v1"
+
+ paths:
+   base_dir: null  # Will be set programmatically
+   faiss_index_dir: "vector_stores/faiss_index"
+   vector_store_dir: "vector_stores"
+
+ llm:
+   provider: "openai"  # OPTIONS: ollama, openai
+   model_name: "gpt-4o-mini"
+   temperature: 0.0
+   max_tokens: null
+   streaming: true
+   additional_params: {}
+
+ embedding:
+   provider: "huggingface"
+   model_name: "BAAI/bge-base-en-v1.5"
+   device: "cpu"  # OPTIONS: cpu, cuda, mps
+   additional_params: {}
+
+ vector_store:
+   provider: "faiss"
+   collection_name: "simba_collection"
+   additional_params: {}
+
+ chunking:
+   chunk_size: 512
+   chunk_overlap: 200
+
+ retrieval:
+   method: "hybrid"  # OPTIONS: default, semantic, keyword, hybrid, ensemble, reranked
+   k: 5
+   # Method-specific parameters
+   params:
+     # Semantic retrieval parameters
+     score_threshold: 0.5
+
+     # Hybrid retrieval parameters
+     prioritize_semantic: true
+
+     # Ensemble retrieval parameters
+     weights: [0.7, 0.3]  # Weights for semantic and keyword retrievers
+
+     # Reranking parameters
+     reranker_model: colbert
+     reranker_threshold: 0.7
+
+ # Database configuration
+ database:
+   provider: litedb  # Options: litedb, sqlite
+   additional_params: {}
+
+ celery:
+   broker_url: ${CELERY_BROKER_URL:-redis://redis:6379/0}
+   result_backend: ${CELERY_RESULT_BACKEND:-redis://redis:6379/1}
+ ```
+
+
+
+ If you want to use OpenAI or Mistral AI, log chatbot traces with LangSmith, or use Ollama, declare the corresponding variables in your `.env`:
+
+ ```
+ OPENAI_API_KEY=your_openai_api_key # (optional)
+ MISTRAL_API_KEY=your_mistral_api_key # (optional)
+ LANGCHAIN_TRACING_V2=true # (optional)
+ LANGCHAIN_API_KEY=your_langchain_api_key # (optional)
+ REDIS_HOST=localhost
+ CELERY_BROKER_URL=redis://localhost:6379/0
+ CELERY_RESULT_BACKEND=redis://localhost:6379/1
+ ```
+
+
+
+
+ ```bash cpu
+ # Build the Docker image
+ DEVICE=cpu make build
+ # Start the Docker container
+ DEVICE=cpu make up
+ ```
+
+ ```bash cuda (Nvidia)
+ # Build the Docker image
+ DEVICE=cuda make build
+ # Start the Docker container
+ DEVICE=cuda make up
+ ```
+
+ ```bash mps (Apple Silicon)
+ # Docker containers cannot access Apple's MPS backend, so the CPU image is used
+ # Build the Docker image
+ DEVICE=cpu make build
+ # Start the Docker container
+ DEVICE=cpu make up
+ ```
+
+
+
+ To include Ollama in the stack, set `ENABLE_OLLAMA=True` when building and starting:
+
+ ```bash cpu
+ # Build the Docker image
+ ENABLE_OLLAMA=True DEVICE=cpu make build
+ # Start the Docker container
+ ENABLE_OLLAMA=True DEVICE=cpu make up
+ ```
+
+ ```bash cuda (Nvidia)
+ # Build the Docker image
+ ENABLE_OLLAMA=True DEVICE=cuda make build
+ # Start the Docker container
+ ENABLE_OLLAMA=True DEVICE=cuda make up
+ ```
+
+ ```bash mps (Apple Silicon)
+ # Docker containers cannot access Apple's MPS backend, so the CPU image is used
+ # Build the Docker image
+ ENABLE_OLLAMA=True DEVICE=cpu make build
+ # Start the Docker container
+ ENABLE_OLLAMA=True DEVICE=cpu make up
+ ```
+
+
+
+ This will start:
+ * The Simba backend API + + * Redis for caching and task queue + + * Celery workers for parsing tasks + + * The Simba frontend UI + + All services will be properly configured to work together. + + To stop the services: + + ```bash + make down + ``` + + You can find more information about Docker setup here: [Docker Setup](/docs/docker-setup) + + + +## Dependencies + +Simba has the following key dependencies: + + + + * **FastAPI**: Web framework for the backend API + + * **Ollama**: For running the LLM inference (optional) + + * **Redis**: For caching and task queues + + * **PostgreSQL**: For database interactions + + * **Celery**: Distributed task queue for background processing + + * **Pydantic**: Data validation and settings management + + + + * **FAISS**: Facebook AI Similarity Search for efficient vector storage + + * **Chroma**: ChromaDB integration for document embeddings + + * **Pinecone** (optional): For cloud-based vector storage + + * **Milvus** (optional): For distributed vector search + + + + * **OpenAI**: For text embeddings + + * **HuggingFace Transformers** (optional): For text processing + + + + * **React**: UI library + + * **TypeScript**: For type-safe JavaScript + + * **Vite**: Frontend build tool + + * **Tailwind CSS**: Utility-first CSS framework + + + +## Troubleshooting + +to be added... + +## Next Steps + +Once you have Simba installed, proceed to: + +1. [Configure your installation](/docs/configuration) + +2. [Set up your first document collection](/docs/examples/document-ingestion) + +3. 
[Connect your application to Simba](/docs/sdk/client) \ No newline at end of file diff --git a/docs/sdk/overview.mdx b/docs/sdk/overview.mdx index a65f414..dee6363 100644 --- a/docs/sdk/overview.mdx +++ b/docs/sdk/overview.mdx @@ -3,92 +3,3 @@ title: 'Simba SDK Overview' description: 'Introduction to the Simba SDK and its capabilities' --- -# Simba SDK Overview - -The Simba SDK is a Python client library that allows developers to easily integrate Simba's knowledge management capabilities into their applications. - -## Installation - -```bash -pip install simba-client -``` - -## Quick Start - -```python -from simba_sdk import SimbaClient - -client = SimbaClient(api_url="http://localhost:8000") # you need to install simba-core and run simba server first - -document = client.documents.create(file_path="path/to/your/document.pdf") -document_id = document[0]["id"] - -parsing_result = client.parser.parse_document(document_id, parser="docling", sync=True) - -retrieval_results = client.retriever.retrieve( - query="your-query", - method="default", - k=3, -) - -for result in retrieval_results["documents"]: - print(f"Content: {result['page_content']}") - print(f"Metadata: {result['metadata']['source']}") - print("====" * 10) -``` - -## Key Features - - - - Simple, Pythonic interface designed for developer productivity - - - Comprehensive type annotations for better IDE support - - - Both synchronous and asynchronous operation modes - - - Detailed error information with custom exception types - - - -## Configuration Options - -```python -client = SimbaClient( - api_url="http://localhost:8000", - api_key="your-api-key", # Optional for authenticated setups - timeout=30, # Request timeout in seconds - max_retries=3, # Number of retry attempts - verify_ssl=True # Verify SSL certificates -) -``` - -## Available Modules - -| Module | Description | -|--------|-------------| -| `client.documents` | Document management | -| `client.chunks` | Chunk operations | -| `client.vector_stores` 
| Vector store configuration | -| `client.embeddings` | Embedding model settings | -| `client.retrieval` | Semantic search and retrieval | - -## Error Handling - -```python -from simba_sdk import SimbaClient -from simba_sdk.exceptions import SimbaApiError - -client = SimbaClient(api_url="http://localhost:8000") - -try: - results = client.retrieval.retrieve(query="What is RAG?") -except SimbaApiError as e: - print(f"API Error: {e.message}") - print(f"Status Code: {e.status_code}") -``` - -For more detailed examples, check out our [document ingestion example](/examples/document-ingestion). \ No newline at end of file diff --git a/frontend/src/components/DocumentManagement/PreviewModal.tsx b/frontend/src/components/DocumentManagement/PreviewModal.tsx index 3168456..5b6cbb0 100644 --- a/frontend/src/components/DocumentManagement/PreviewModal.tsx +++ b/frontend/src/components/DocumentManagement/PreviewModal.tsx @@ -3,6 +3,9 @@ import ReactMarkdown from 'react-markdown'; import rehypeRaw from 'rehype-raw'; import rehypeSanitize from 'rehype-sanitize'; import remarkGfm from 'remark-gfm'; +import remarkMath from 'remark-math'; +import rehypeKatex from 'rehype-katex'; +import 'katex/dist/katex.min.css'; // Import KaTeX CSS import { useState, useEffect, useRef } from 'react'; import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from "@/components/ui/select"; import { Button } from "@/components/ui/button"; @@ -32,6 +35,22 @@ const imageStyles = ` } `; +// Add enhanced KaTeX styles +const mathStyles = ` + .katex { + font-size: 1.1em !important; + line-height: 1.5 !important; + } + .katex-display { + margin: 1em 0 !important; + overflow-x: auto !important; + overflow-y: hidden !important; + } + .math-inline { + padding: 0 0.15em !important; + } +`; + const PreviewModal: React.FC = ({ isOpen, onClose, @@ -419,21 +438,74 @@ const ChunkContent = ({ content }: { content: string }) => { return
Invalid content
; } - // CRITICAL FIX: Simply use dangerouslySetInnerHTML to render content directly - // This bypasses ReactMarkdown completely which may be causing rendering issues + // Check if content contains LaTeX-style math that would benefit from KaTeX + const hasMathContent = /\$.*?\$|\${2}.*?\${2}/g.test(content); + + // Check if content contains image markdown syntax that needs special handling + const hasImageSyntax = /!\[(.*?)\]\((data:image\/[^)]+)\)/g.test(content); + + // If we detect image syntax, use the original rendering method which worked for images + if (hasImageSyntax) { + // For content with images, process it using our basic formatter + const processedContent = content + // Manually format superscript notation for math/citations + .replace(/\$\{\s*\}\^{([^}]+)}\$/g, '$1') + // Handle other LaTeX-style formatting that might appear + .replace(/\$\^{([^}]+)}\$/g, '$1') + .replace(/\$_{([^}]+)}\$/g, '$1'); + + return ( + <> + +
') + // Add line breaks for better readability + .replace(/\n/g, '
') + }} + /> + + ); + } + + // For complex math content, use the full KaTeX renderer + if (hasMathContent) { + return ( + <> + + + + {content} + + + ); + } + + // For regular content without math or images, use normal markdown + // But still apply the simple formatting to handle basic superscripts + const processedContent = content + // Manually format superscript notation for math/citations in case KaTeX isn't working + .replace(/\$\{\s*\}\^{([^}]+)}\$/g, '$1') + .replace(/\$\^{([^}]+)}\$/g, '$1') + .replace(/\$_{([^}]+)}\$/g, '$1'); + return ( <> -
') - // Add line breaks for better readability - .replace(/\n/g, '
') - }} - /> + remarkPlugins={[remarkGfm]} + rehypePlugins={[rehypeRaw, rehypeSanitize]} + > + {processedContent} + ); }; diff --git a/simba/api/ingestion_routes.py b/simba/api/ingestion_routes.py index 98d27ef..74dea28 100644 --- a/simba/api/ingestion_routes.py +++ b/simba/api/ingestion_routes.py @@ -103,8 +103,14 @@ async def delete_document(uids: List[str]): # Delete documents from vector store for uid in uids: simbadoc = db.get_document(uid) - if simbadoc.metadata.enabled: - store.delete_documents([doc.id for doc in simbadoc.documents]) + if simbadoc and simbadoc.metadata.enabled: + try: + store.delete_documents([doc.id for doc in simbadoc.documents]) + except Exception as e: + # Log the error but continue with deletion + logger.warning( + f"Error deleting document {uid} from vector store: {str(e)}. Continuing with database deletion." + ) # Delete documents from database db.delete_documents(uids)