[Question]: Configuration of the VM instance to host librechat in the cloud for a corporate setting #5541
-
Hi, thanks for your question. A GPU would only be useful if you are running local models on the same server. Try at least 4 GB of RAM and 2 vCPUs for that many users.
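If LibreChat runs under Docker on that VM, the same budget can also be pinned at the container level. A minimal sketch, assuming the default `api` service name from LibreChat's docker-compose.yml; the figures simply mirror the suggestion above and are not a benchmarked sizing:

```yaml
# docker-compose.override.yml -- illustrative sketch only, not a benchmarked sizing.
services:
  api:                     # default LibreChat service name in docker-compose.yml
    deploy:
      resources:
        limits:
          cpus: "2"        # mirrors the 2 vCPU suggestion above
          memory: 4G       # mirrors the 4 GB RAM suggestion above
```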
-
Hi Danny, Thanks for the quick reply!
-
Hi, the file upload issue was resolved once I introduced timeout-related changes in the nginx.conf file; the client_max_body_size attribute alone was not sufficient. Adding timeout settings alongside that directive is what fixed it.
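The exact values from that nginx.conf are not reproduced here, so the snippet below is only a hedged sketch of the kind of timeout-related directives being described; the numbers and the `librechat_upstream` name are illustrative placeholders:

```nginx
# Illustrative sketch: larger upload bodies plus longer proxy timeouts.
http {
    client_max_body_size 25M;                  # body-size limit alone was not enough

    server {
        listen 80;

        location / {
            proxy_pass http://librechat_upstream;  # placeholder upstream/backend name
            proxy_connect_timeout 300s;            # allow slow backends while uploads are processed
            proxy_send_timeout    300s;
            proxy_read_timeout    300s;
            send_timeout          300s;
        }
    }
}
```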
-
Hi Danny, I would appreciate your help in verifying whether my understanding of the application's flow/architecture, with a self-hosted RAG and embeddings model, is correct:
LibreChat Architecture Flow
1. File Upload and Initial Handling
2. File Processing and Embedding (offline-ish, but triggered by upload)
3. User Interaction (real-time chat)
4. RAG Pipeline (if triggered)
5. LLM Interaction
6. Final Response
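For reference against steps 1-4 above, LibreChat's self-hosted RAG setup typically runs a separate RAG API service plus a pgvector database alongside the main api container. A minimal sketch, assuming the service names, images, and RAG_API_URL variable from the project's RAG compose examples (verify exact image tags and environment variables against the current docs):

```yaml
# Illustrative sketch of the self-hosted RAG pieces; names assumed from
# LibreChat's RAG compose examples, so double-check them against current docs.
services:
  api:
    environment:
      - RAG_API_URL=http://rag_api:8000   # steps 2 & 4: uploads and retrieval go through the RAG API

  rag_api:
    image: ghcr.io/danny-avila/librechat-rag-api-dev-lite:latest
    environment:
      - DB_HOST=vectordb                  # embeddings are written to / queried from pgvector
      - EMBEDDINGS_PROVIDER=ollama        # assumption: a locally hosted embeddings provider
    depends_on:
      - vectordb

  vectordb:
    image: ankane/pgvector:latest         # Postgres with the pgvector extension
    environment:
      - POSTGRES_DB=mydatabase            # placeholder credentials
      - POSTGRES_USER=myuser
      - POSTGRES_PASSWORD=mypassword
```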
-
What is your question?
Hi Community,
Thank you for the fantastic contribution. I am looking forward to your insights on what VM settings/configuration should be used to host LibreChat in a VM for a corporate setting with up to 500 users. The idea is also to host the RAG within the VM itself. Would a GPU also be useful? What configuration would you recommend?
Thank you!
More Details
Any pointers on this would be appreciated.