This document outlines the Flask backend (`app.py`) and HTML frontend (`ChatClient.html`) for a chatbot application. Here's a breakdown of the functionality and structure:
- Initialization: The Flask app is initialized, and CORS (Cross-Origin Resource Sharing) is enabled to allow requests from `http://127.0.0.1:5500`. Logging is set up to track the application's activity. (A consolidated sketch of the backend appears after this breakdown.)
- Model Loading: The application uses the `BloomForCausalLM` model from the Hugging Face Transformers library. The model is either downloaded from Hugging Face or loaded from a local directory (`data/bloom-1b7`). The model and tokenizer are loaded onto the available device (GPU if available, otherwise CPU).
- Chat Endpoint: The `/chat` endpoint accepts POST requests with JSON data containing a user message. The message is processed by the Bloom model, which generates a response. The response is returned as JSON.
- Health Check: A simple health check endpoint (`/`) returns a message indicating that the chatbot is running.
- Running the App: The Flask app runs on port 5000 by default, or on a port specified in the `PORT` environment variable.
- Structure: The HTML file defines a simple chat interface with a header, a message display area, and an input field with a send button.
- Styling: CSS is used to style the chat interface, including message bubbles, loading animations, and button states.
- JavaScript Functionality: The `sendMessage` function sends user input to the Flask backend and handles the response. User messages are displayed in the chat window, and a loading animation is shown while waiting for the bot's response. The bot's response is displayed in the chat window once received. The Enter key can also be used to send messages.
- Error Handling: Errors during the fetch request are caught and displayed as a bot message indicating that something went wrong.

In summary, the frontend sends user messages to the `/chat` endpoint of the Flask backend using a POST request. The backend processes the message using the Bloom model and returns a generated response, which the frontend displays in the chat window.
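Taken together, the backend pieces described above can be summarized in one minimal sketch. This is a hedged reconstruction rather than the verbatim `app.py`: the routes and error codes follow the descriptions in this document, while the generation parameters (`max_new_tokens`, `top_p`) are illustrative assumptions.

```python
import logging
import os

import torch
from flask import Flask, jsonify, request
from flask_cors import CORS
from transformers import BloomForCausalLM, BloomTokenizerFast

# Basic logging so requests and model events are traceable.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = Flask(__name__)
# Allow the frontend served at 127.0.0.1:5500 to call this API.
CORS(app, origins="http://127.0.0.1:5500", supports_credentials=True)

MODEL_PATH = "data/bloom-1b7"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

if os.path.isdir(MODEL_PATH):
    # Reuse the locally saved copy.
    tokenizer = BloomTokenizerFast.from_pretrained(MODEL_PATH)
    model = BloomForCausalLM.from_pretrained(MODEL_PATH)
else:
    # First run: download from the Hugging Face Hub, then cache locally.
    tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-1b7")
    model = BloomForCausalLM.from_pretrained("bigscience/bloom-1b7")
    tokenizer.save_pretrained(MODEL_PATH)
    model.save_pretrained(MODEL_PATH)

model.to(device)
model.eval()

@app.route("/", methods=["GET"])
def health():
    return "Chatbot is running!"

@app.route("/chat", methods=["POST"])
def chat():
    if not request.is_json:
        return jsonify({"error": "Content-Type must be application/json"}), 415
    message = (request.get_json().get("message") or "").strip()
    if not message:
        return jsonify({"error": "No message provided"}), 400
    try:
        inputs = tokenizer(message, return_tensors="pt").to(device)
        with torch.no_grad():
            output_ids = model.generate(**inputs, max_new_tokens=100,
                                        do_sample=True, top_p=0.9)
        reply = tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return jsonify({"response": reply})
    except Exception:
        logger.exception("Inference failed")
        return jsonify({"error": "Internal server error"}), 500

if __name__ == "__main__":
    # Port 5000 by default, overridable via the PORT environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))
```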
- Ensure Python and the required libraries (`flask`, `torch`, `transformers`, and `flask-cors`) are installed. Run the Flask app with `python app.py`.
- Open the `ChatClient.html` file in a web browser (e.g., by serving it with a local server like `http-server`, or by opening it directly via the `file://` protocol).
- Type a message in the input field and press Enter or click the Send button to interact with the chatbot.
The BloomForCausalLM models are part of the BLOOM (BigScience Large Open-science Open-access Multilingual) family of language models. They are designed for causal language modeling, predicting the next token in a sequence, which makes them well suited to text generation tasks.
BLOOM models come in various sizes. Here’s a comparison of some popular variants:
| Model Name | Parameters | Layers | Heads | Hidden Size | Context Window | Multilingual Support | Use Case |
|---|---|---|---|---|---|---|---|
| bloom-560m | 560 million | 24 | 16 | 1024 | 2048 tokens | Yes (46 languages) | Lightweight, fast inference, suitable for low-resource environments. |
| bloom-1b1 | 1.1 billion | 24 | 16 | 1536 | 2048 tokens | Yes (46 languages) | Balanced performance, good for general-purpose text generation. |
| bloom-1b7 | 1.7 billion | 24 | 16 | 2048 | 2048 tokens | Yes (46 languages) | Improved performance over 1b1, suitable for more complex tasks. |
| bloom-3b | 3 billion | 30 | 32 | 2560 | 2048 tokens | Yes (46 languages) | Higher capacity, better for nuanced text generation and larger contexts. |
| bloom-7b1 | 7.1 billion | 30 | 32 | 4096 | 2048 tokens | Yes (46 languages) | Strong performance for advanced tasks, requires more computational resources. |
| bloom-176b | 176 billion | 70 | 112 | 14336 | 2048 tokens | Yes (46 languages) | State-of-the-art, massive scale, requires significant computational power. |
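Switching between variants is usually just a matter of changing the checkpoint name, since the loading API is identical across sizes. A brief sketch (note that the 176B flagship is published on the Hugging Face Hub as plain `bigscience/bloom`, without a size suffix):

```python
from transformers import BloomForCausalLM, BloomTokenizerFast

# Smaller variants: bigscience/bloom-560m, -1b1, -1b7, -3b, -7b1.
variant = "bigscience/bloom-560m"

tokenizer = BloomTokenizerFast.from_pretrained(variant)
model = BloomForCausalLM.from_pretrained(variant)
```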
- bloom-560m: Ideal for lightweight applications or environments with limited computational resources. Suitable for simple text generation tasks or prototyping.
- bloom-1b7: A good balance between performance and resource requirements. Suitable for general-purpose text generation, chatbots, and more complex tasks.
- bloom-3b and bloom-7b1: Better for advanced tasks requiring higher accuracy and nuance. They require more computational power but offer significantly better performance.
- bloom-176b: State-of-the-art performance for research and large-scale applications. Requires specialized hardware (e.g., multiple GPUs or TPUs) and is not practical for most users.
- Hardware Requirements: Smaller models like bloom-560m can run on CPUs or low-end GPUs. Larger models like bloom-1b7 and above require GPUs for efficient inference. The bloom-176b model requires distributed computing infrastructure.
- Inference Speed: Smaller models are faster but may produce less coherent or nuanced text. Larger models are slower but generate higher-quality responses.
- Memory Usage: Larger models consume significantly more memory, which can be a bottleneck for deployment (a back-of-envelope estimate follows this list).
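As a rule of thumb, the weight footprint alone is the parameter count multiplied by the bytes per parameter: 4 bytes for fp32 and 2 bytes for fp16, before activations, the KV cache, and framework overhead. A quick back-of-envelope check:

```python
# Approximate weight-only memory for selected BLOOM variants.
params = {"bloom-560m": 560e6, "bloom-1b7": 1.7e9,
          "bloom-7b1": 7.1e9, "bloom-176b": 176e9}

for name, n in params.items():
    print(f"{name}: ~{n * 4 / 1e9:.1f} GB fp32, ~{n * 2 / 1e9:.1f} GB fp16")
```

By this estimate, bloom-1b7 needs roughly 6.8 GB in fp32 or 3.4 GB in fp16 for the weights alone.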
The choice of BLOOM model depends on your specific use case, available hardware, and performance requirements. For lightweight applications, bloom-560m is a good starting point, while bloom-1b7 offers a balance between performance and resource usage. For advanced tasks, larger models like bloom-7b1 or bloom-176b are recommended, though they require significant computational resources.
This project implements a chatbot using the Bloom-1.7B language model from Hugging Face's `transformers` library. The chatbot is served via a Flask web application, allowing users to interact with the model through a simple API endpoint. The application supports CORS (Cross-Origin Resource Sharing) for seamless integration with frontend applications.
- Bloom-1.7B Model: Utilizes the powerful Bloom-1.7B causal language model for generating human-like responses.
- Flask API: Provides a RESTful API endpoint (`/chat`) for sending user messages and receiving model-generated responses.
- CORS Support: Enables cross-origin requests from a specified frontend origin (e.g., `http://127.0.0.1:5500`).
- Health Check: Includes a health check endpoint (`/`) to verify that the chatbot is running.
- Error Handling: Robust error handling for invalid requests, model loading issues, and inference errors.
- Device Optimization: Automatically uses GPU if available, otherwise falls back to CPU.
Before running the application, ensure you have the following installed:

- Python 3.8 or higher
- `pip` (Python package manager)
- Clone the repository:

  ```bash
  git clone https://github.com/attributeyielding/Smart_Chat_Bot.git
  cd bloom-chatbot
  ```

- Create a virtual environment (optional but recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

  The `requirements.txt` file should include:

  ```text
  flask
  torch
  transformers
  flask-cors
  ```
- Download the Bloom-1.7B model:
  - The application will automatically download the model if it is not already present in the `data/bloom-1b7` directory.
  - Ensure you have sufficient disk space (approximately 5-10 GB) for the model.
- Start the Flask server:

  ```bash
  python app.py
  ```

- The application will run on `http://0.0.0.0:5000` by default. You can access the health check endpoint at `http://127.0.0.1:5000/`.
- Interact with the chatbot: Send a POST request to the `/chat` endpoint with a JSON payload containing the user's message:

  ```json
  { "message": "Hello, how are you?" }
  ```

  Example using `curl`:

  ```bash
  curl -X POST http://127.0.0.1:5000/chat \
    -H "Content-Type: application/json" \
    -d '{"message": "Hello, how are you?"}'
  ```

  The response will be in JSON format:

  ```json
  { "response": "I am doing well, thank you! How can I assist you today?" }
  ```
- CORS Origins: By default, the application allows requests from `http://127.0.0.1:5500`. To modify this, update the `origins` parameter in the `CORS` initialization:

  ```python
  CORS(app, origins="http://your-frontend-url.com", supports_credentials=True)
  ```

- Model Path: The model is saved and loaded from the `data/bloom-1b7` directory. You can change this by modifying the `MODEL_PATH` variable in the code.

- Device: The application automatically detects and uses a GPU if available. To force CPU usage, modify the `device` variable (an environment-variable variant of these settings is sketched after this list):

  ```python
  device = torch.device("cpu")
  ```
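If you prefer not to edit the source, a common alternative, sketched here as a suggestion rather than something the project implements, is to read these settings from environment variables:

```python
import os

import torch

# Hypothetical environment-driven overrides for the settings listed above.
MODEL_PATH = os.environ.get("MODEL_PATH", "data/bloom-1b7")
FRONTEND_ORIGIN = os.environ.get("FRONTEND_ORIGIN", "http://127.0.0.1:5500")
FORCE_CPU = os.environ.get("FORCE_CPU", "0") == "1"

device = torch.device(
    "cpu" if FORCE_CPU or not torch.cuda.is_available() else "cuda"
)
```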
- Endpoint: `GET /`
- Description: Verifies that the chatbot is running.
- Response: Plain text string: `"Chatbot is running!"`

- Endpoint: `POST /chat`
- Description: Accepts a user message and returns a model-generated response.
- Request Body:

  ```json
  { "message": "Your input message here" }
  ```

- Response:

  ```json
  { "response": "Model-generated response here" }
  ```

- Error Responses:
  - `400 Bad Request`: If no message is provided.
  - `415 Unsupported Media Type`: If the `Content-Type` header is not `application/json`.
  - `500 Internal Server Error`: If an error occurs during model inference.
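As a usage illustration, the same request can be made from Python with the `requests` package (an assumed extra dependency, not listed in `requirements.txt`):

```python
import requests

resp = requests.post(
    "http://127.0.0.1:5000/chat",
    json={"message": "Hello, how are you?"},  # sends Content-Type: application/json
    timeout=120,  # generation can be slow, especially on CPU
)
resp.raise_for_status()
print(resp.json()["response"])
```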
- Model Download Issues: Ensure you have a stable internet connection and sufficient disk space. If the download fails, manually download the model using:

  ```python
  from transformers import BloomForCausalLM, BloomTokenizerFast

  tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-1b7")
  model = BloomForCausalLM.from_pretrained("bigscience/bloom-1b7")
  tokenizer.save_pretrained("data/bloom-1b7")
  model.save_pretrained("data/bloom-1b7")
  ```
- GPU Not Detected: If you have a GPU but it is not being used, ensure that `torch` is installed with CUDA support (a quick detection check is sketched after this list):

  ```bash
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  ```
- CORS Errors: Ensure the frontend URL is correctly specified in the `CORS` configuration.
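To verify that PyTorch can actually detect the GPU, run the following generic check (not specific to this project):

```python
import torch

print(torch.__version__)                  # installed PyTorch build
print(torch.cuda.is_available())          # True if a CUDA GPU is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the detected GPU
```

If `is_available()` prints `False` despite a CUDA-capable GPU being present, the installed wheel is most likely CPU-only; reinstall with the CUDA index URL shown above.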
This project is licensed under the MIT License.
- Hugging Face for the `transformers` library and the Bloom model.
- Flask for the web framework.
- PyTorch for the deep learning framework.