RAPID is a state-of-the-art tool designed to assist in the diagnosis of aphasia using multimodal inputs such as speech, gesture, and audio data. By leveraging advanced machine learning models, it enables real-time self-diagnosis and generates a comprehensive diagnostic report.
- Multimodal Analysis: Integrates speech, gesture, and audio data for precise aphasia type detection
- User-Friendly: Designed for easy use at home, reducing the need for frequent hospital visits
- Interpretability: Highlights model attention areas, offering transparency in predictions
- Comprehensive Reports: Provides detailed diagnostic insights, risk assessments, and personalized recommendations
To set up the RAPID environment, ensure you have Python 3.8 installed, and then follow these steps:
- Clone the repository:
git clone https://github.com/serizard/RAPID.git
cd RAPID
- Install dependencies:
pip install -r requirements.txt
RAPID utilizes the AphasiaBank Dataset labeled with Western Aphasia Battery (WAB) classifications:
- Control (Normal)
- Fluent Aphasia
- Non-Comprehensive Aphasia
- Non-Fluent Aphasia
Before downloading the dataset, you must first obtain access to TalkBank. Visit the TalkBank website and check the qualification requirements.
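For illustration only, one way to encode these four WAB-based classes as integer labels is sketched below; the label names and ordering are assumptions, not necessarily the convention used in this repository.

```python
# Hypothetical class-to-index mapping for the four WAB-based labels.
# The names and ordering are assumptions, not the repository's convention.
WAB_CLASSES = {
    "Control": 0,
    "Fluent": 1,
    "Non-Comprehensive": 2,
    "Non-Fluent": 3,
}

def encode_label(class_name: str) -> int:
    """Map a WAB class name to its integer training label."""
    return WAB_CLASSES[class_name]
```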
- Transcription: Processed using whisper-timestamped
- Audio Features: Extracted using opensmile
- Gesture Analysis: Conducted using MediaPipe
- Chunking: Data is split into chunks of 40 tokens for model input
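The snippet below is a minimal, illustrative sketch of this preprocessing pipeline rather than the repository's actual scripts: it transcribes a placeholder recording with whisper-timestamped, extracts openSMILE functionals, collects MediaPipe pose landmarks, and splits the transcript into 40-token chunks. The file paths, Whisper model size, and openSMILE feature set are assumptions.

```python
# Illustrative preprocessing sketch (placeholder paths; settings are assumptions).
import cv2
import mediapipe as mp
import opensmile
import whisper_timestamped as whisper

# 1) Transcription with word-level timestamps (whisper-timestamped).
audio = whisper.load_audio("sample.wav")            # placeholder path
model = whisper.load_model("base")                  # model size is an assumption
transcript = whisper.transcribe(model, audio)

# 2) Acoustic features with openSMILE (feature set is an assumption).
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
audio_features = smile.process_file("sample.wav")

# 3) Gesture analysis with MediaPipe Holistic (per-frame pose landmarks).
holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
cap = cv2.VideoCapture("sample.mp4")                # placeholder path
pose_frames = []
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    pose_frames.append(result.pose_landmarks)       # may be None if no person detected
cap.release()
holistic.close()

# 4) Chunk the transcript into 40-token segments for model input.
tokens = [w["text"] for seg in transcript["segments"] for w in seg["words"]]
chunks = [tokens[i:i + 40] for i in range(0, len(tokens), 40)]
```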
To train the RAPID model:
- Navigate to the project directory
- Run the training script (or use the provided .sh scripts):
python main.py
RAPID offers a simple demo for testing and diagnosis:
- Start Test: Record a video of yourself speaking a predefined text (e.g., the Cinderella story)
- View Results: Move to the demo directory and execute demo.py. Make sure that all required paths and keys are set in config.yaml.
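As a sketch of a pre-flight check before running the demo, the snippet below loads config.yaml and verifies a few entries before launching demo.py; the key names shown are placeholders, not the repository's actual configuration schema.

```python
# Hypothetical pre-flight check for the demo.
# The key names below are placeholders; check what config.yaml actually requires.
import subprocess
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

for key in ("checkpoint_path", "output_dir", "api_key"):
    if key not in config:
        raise KeyError(f"config.yaml is missing a required entry: {key}")

subprocess.run(["python", "demo.py"], check=True)
```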
To launch the RAPID user interface via Streamlit:
cd streamlit-app
streamlit run app.py
You also need to install the frontend dependencies and run the React components in dev mode:
cd components/react_components/frontend
npm install
npm start
- Start Test: Record a video of yourself speaking a predefined text (e.g., the Cinderella story)
- View Results: Visualize model attention scores and receive a diagnostic report. Note that you will only get actual results if the remote API server is running.
To configure the remote server for inference:
- Set up a FastAPI server:
cd rapid-api-server
python main.py
- Use the provided API endpoints for remote model inference
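The snippet below sketches what a client call to the inference server could look like; the port, route name, and payload fields are assumptions, so check the server code for the actual endpoints.

```python
# Hypothetical client for the RAPID inference server.
# The URL, route, and payload fields are assumptions made for illustration.
import requests

SERVER_URL = "http://localhost:8000"  # adjust to your deployment

with open("recording.mp4", "rb") as f:
    response = requests.post(
        f"{SERVER_URL}/predict",      # route name is an assumption
        files={"video": ("recording.mp4", f, "video/mp4")},
        timeout=300,
    )
response.raise_for_status()
print(response.json())  # e.g., predicted aphasia type and attention scores
```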
This project builds upon the foundational work from the EMNLP '23 paper:
Learning Co-Speech Gesture for Multimodal Aphasia Type Detection
Authors: Daeun Lee, Sejung Son, Hyolim Jeon, Seungbae Kim, Jinyoung Han
EMNLP 2023 Proceedings
We sourced the dataset from AphasiaBank with Institutional Review Board (IRB) approval and strictly follow the data sharing guidelines provided by TalkBank, including the Ground Rules for all TalkBank databases based on the American Psychological Association Code of Ethics (American Psychological Association, 2002). Additionally, our project adheres to the five core issues outlined by TalkBank in accordance with the General Data Protection Regulation (GDPR, 2018): addressing commercial purposes, handling scientific data, obtaining informed consent, ensuring deidentification, and maintaining a code of conduct. In light of these ethical concerns, we did not use any photographs or personal information of the participants from AphasiaBank.