Coffeebot is a chatbot barista developed for the NYU-AD RoboCafe research project. It leverages the Pipecat conversational-AI framework to take user-specific coffee orders. Coffeebot is tailored to the specifications of the Keurig K-Cafe Single Serve K-Cup Coffee, Latte, and Cappuccino Maker, so each order is crafted from the options available on that specific machine. For its implementation, Coffeebot uses MeetKai Functionary as the large language model (LLM), Tacotron2 (trained on the LJSpeech dataset and accessed through Coqui-ai) for text-to-speech (TTS), and Daily-co for speech-to-text (STT). The user interface is based on a modified version of Pipecat-ai’s web-client UI.
Below is an overview of the interaction flow Coffeebot uses to guide users through placing their coffee orders and to ensure their preferences are captured accurately.
Through the Functionary LLM, the chatbot engages in dynamic conversation with users, making function calls as needed to process their input.
- Coffee Pod Choice: The chatbot starts by asking the user to choose a coffee pod. The options available are light, medium, or dark roast coffee pods.
- Coffee Type Selection: The user is then asked to select the type of coffee they want. The options, as per the coffee machine, are regular coffee, cappuccino, or latte.
  - Regular Coffee: The user is prompted to choose the cup size next, with options of 2, 6, 8, 10, or 12 ounces.
  - Cappuccino/Latte: The user is asked to specify the type of milk (fresh, soy, almond, or skimmed milk) and the drink temperature (hot or cold). Note that the cup size for cappuccino and latte is fixed at 2 ounces, the coffee machine’s default setting.
- Order Confirmation: Once all preferences are specified, the chatbot confirms the order with the user to ensure accuracy before processing.
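These order fields map naturally onto a single function definition that the LLM can call once it has gathered enough information. The snippet below is a hypothetical sketch of such an OpenAI-style tool schema; the function and parameter names are illustrative and are not taken from the Coffeebot source.

```python
# Hypothetical sketch of an OpenAI-style function (tool) schema covering the
# order fields above. Names are illustrative, not those used in Coffeebot.
ORDER_COFFEE_TOOL = {
    "type": "function",
    "function": {
        "name": "order_coffee",
        "description": "Record the user's coffee order for the Keurig machine.",
        "parameters": {
            "type": "object",
            "properties": {
                "pod": {"type": "string", "enum": ["light", "medium", "dark"]},
                "drink": {"type": "string", "enum": ["coffee", "cappuccino", "latte"]},
                "cup_size_oz": {"type": "integer", "enum": [2, 6, 8, 10, 12]},
                "milk": {"type": "string", "enum": ["fresh", "soy", "almond", "skimmed"]},
                "temperature": {"type": "string", "enum": ["hot", "cold"]},
            },
            "required": ["pod", "drink"],
        },
    },
}
```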
Coffeebot is designed to adapt to the user's input style, processing orders either in part or in full. This flexibility allows users to provide all coffee details at once or progressively throughout the interaction.
- Complete Order Input: If a user specifies all details upfront (e.g., "I would like a dark roast pod in a cold cappuccino with almond milk"), Coffeebot will directly move to confirm the order without further queries.
- Partial Order Input: If a user starts with partial details (e.g., "I want a dark roast pod for a cappuccino"), Coffeebot will then ask for the remaining details, such as milk type and temperature.
This keeps the ordering process smooth and efficient, much like the conversational style of a human barista.
Coffeebot requires powerful hardware, particularly strong GPUs, to effectively host the computationally intensive LLM and TTS models.
Ensure you have access to a machine or server equipped with adequate GPU resources. Detailed steps on how to configure and run the LLM and TTS APIs are provided below to guide you through the necessary setup to get Coffeebot operational.
NYU affiliates may have the option to access university-provided servers that are equipped with the necessary hardware to support these models. If you are an NYU affiliate and need access to these resources, please reach out to me for assistance at sohaila.mohammed@nyu.edu. After consulting with relevant faculty, I can provide you with the details required to configure your `config.sh` file, which will allow you to connect to NYU's servers and run the Coffeebot project.
Before you begin, you'll need to set up your Daily account to obtain the necessary API key and URL:
Getting Daily API Key:
- Create a Daily account.
- Log in to your account and find the “Developers” section in the side menu of the landing page.
- There you will find the API key.
- Copy the key to save it for later use.
Getting Daily Room URL:
- Log in to your Daily account and find the “Rooms” section in the side menu of the landing page.
- Click on the “Rooms” section and find the “Create room” button.
- Click on “Create room”. You can add a room name (otherwise it will be a randomly generated string), but simply keep the default settings and confirm room creation.
- You will then be redirected back to the “Rooms” section and you should find the room you created under the “Your rooms” section.
- Click on the name of the room you created, then you will be able to see all of the room’s information, including its URL.
- Copy the URL to save it.
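If you prefer to script this step, Daily also exposes a REST endpoint for creating rooms. The sketch below is optional and assumes the standard `https://api.daily.co/v1/rooms` endpoint with your API key exported as `DAILY_API_KEY`; the dashboard steps above achieve the same thing.

```python
# Optional: create a Daily room via the REST API instead of the dashboard.
# Assumes DAILY_API_KEY is already set in the environment.
import os
import requests

response = requests.post(
    "https://api.daily.co/v1/rooms",
    headers={"Authorization": f"Bearer {os.environ['DAILY_API_KEY']}"},
    json={"name": "coffeebot-room"},  # omit "name" for a randomly generated one
    timeout=30,
)
response.raise_for_status()
print(response.json()["url"])  # use this value for DAILY_SAMPLE_ROOM_URL later
```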
Clone the Coffeebot repository (with its submodules) and install the Python dependencies:

```bash
git clone --recurse-submodules https://github.com/Sohaila-Abdulsattar-Mohammed/RoboCafe-Coffeebot-NYU-AD.git
cd RoboCafe-Coffeebot-NYU-AD
pip install -r requirements.txt
```
- Setup TTS API:
  - The file you will need to run is the `coqui_api.py` file provided in this repository; you can copy it to wherever you need to run it.
  - Run the TTS API using the command: `python3 coqui_api.py`
  - Note the URL at which your TTS API is running.
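  - Optionally, sanity-check the endpoint once it is up. The exact request format is defined in `coqui_api.py`, so treat the payload below as an assumption (a JSON body with a `text` field sent via POST) and adjust it to match the API:

    ```python
    # Minimal reachability check for the TTS API; the payload shape is an
    # assumption and may need to match what coqui_api.py actually expects.
    import requests

    TTS_URL = "http://127.0.0.1:5000/tts"  # replace with the URL you noted above

    try:
        r = requests.post(TTS_URL, json={"text": "Hello from Coffeebot"}, timeout=15)
        print("TTS API responded with status", r.status_code)
    except requests.ConnectionError:
        print("TTS API is not reachable -- is coqui_api.py running?")
    ```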
- Setup LLM API:
  - Clone the MeetKai Functionary repository where you will be running the LLM: `git clone https://github.com/MeetKai/functionary.git`
  - Navigate to the functionary directory: `cd functionary`
  - Install the required modules: `pip install -r requirements.txt`
  - Run the LLM API using the command: `python3 server_vllm.py --model meetkai/functionary-small-v2.5`
  - Note the URL at which your LLM API is running.
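  - Once the server is up, you can confirm it responds with a quick OpenAI-compatible request. This is a minimal sketch: the base URL below is a placeholder for the URL you noted above, and the API key is a dummy value since the local server is not expected to validate it.

    ```python
    # Functionary's vLLM server exposes an OpenAI-compatible chat-completions
    # API, so the standard openai client can point at it directly.
    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="functionary")

    response = client.chat.completions.create(
        model="meetkai/functionary-small-v2.5",
        messages=[{"role": "user", "content": "I'd like a dark roast cappuccino."}],
    )
    print(response.choices[0].message.content)
    ```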
- Server Access:
  - Create a `config.sh` file with the necessary details (which will be provided to you) to access NYU’s servers.
- Create and Configure .env File:
  - Make a copy of the `env.example` file and rename it to `.env`: `cp env.example .env`
  - Fill in the necessary details in the `.env` file:

    ```
    DAILY_SAMPLE_ROOM_URL=your_daily_room_url
    DAILY_API_KEY=your_daily_api_key
    TTS_URL=your_tts_api_url
    LLM_URL=your_llm_api_url
    ```

    - `DAILY_SAMPLE_ROOM_URL`: The room URL you obtained as per the instructions in the prerequisites section.
    - `DAILY_API_KEY`: The API key you obtained from Daily as per the instructions in the prerequisites section.
    - `TTS_URL`: The URL at which your TTS API is running; the format is `http://IP_ADDRESS:PORT/tts`.
    - `LLM_URL`: The URL at which your LLM API is running; the format is `http://IP_ADDRESS:PORT/v1`.
  - Note for NYU Affiliates: The TTS and LLM URLs will be provided to you.
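  - For reference, a sketch of how these values are typically read on the Python side (this assumes python-dotenv, which Pipecat examples commonly use; the actual loading code lives in `server.py`):

    ```python
    # Load the .env file and read the four settings described above.
    import os
    from dotenv import load_dotenv

    load_dotenv()
    room_url = os.getenv("DAILY_SAMPLE_ROOM_URL")
    api_key = os.getenv("DAILY_API_KEY")
    tts_url = os.getenv("TTS_URL")  # e.g. http://IP_ADDRESS:PORT/tts
    llm_url = os.getenv("LLM_URL")  # e.g. http://IP_ADDRESS:PORT/v1
    ```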
- Configure Web UI Environment:
  - Navigate to the `webui` directory: `cd webui`
  - Create a `.env.development.local` file: `touch .env.development.local`
  - Add the following configuration details to the `.env.development.local` file:

    ```
    VITE_APP_TITLE=Coffee Bot
    VITE_SERVER_URL=http://localhost:7860
    ```
- Ensure LLM and TTS APIs are Running:
  - Make sure both your LLM and TTS APIs are running and accessible.
- Start the Coffeebot Server:
  - Navigate to the main Coffeebot directory and start the server: `python3 server.py`
- Start the Web UI:
  - Open a new terminal, navigate to the `webui` directory, and execute the following commands:

    ```bash
    yarn
    yarn run build
    yarn dev
    ```
- Access the Web UI:
  - Open your browser and go to `http://localhost:5173/` to interact with the Coffeebot web interface.
- Start All Necessary Processes:
  - Execute the provided script to start all necessary processes: `bash start_coffeebot.sh`
- Access the Web UI:
  - Wait for the confirmation message indicating all processes are running, then open your browser and go to `http://localhost:5173/` to interact with the Coffeebot web interface.
- Debugging:
  - To debug errors, you can stop the processes with `Ctrl+C` and view the following log files:
    - `server.log` for server-related issues.
    - `functionary.log` for LLM-related issues.
    - `coqui.log` for TTS-related issues.
    - `webui.log` (located under the `webui` directory) for UI-related issues.