This repository explores the challenges of building a generalized mental health classifier using anonymous, user-generated text from various online platforms. Mental health classification is inherently complex due to overlapping symptoms, ambiguous language, and the low signal-to-noise ratio of textual descriptions. By leveraging these unstructured data sources, our work sheds light on:
- The nuanced ways patients describe symptoms.
- Limitations of LLM-based approaches for mental health classification.
- Insights into the complexities of mental health diagnosis.

This project contributes to advancing the understanding of mental health dynamics while improving accessibility to mental health resources.
In the data folder, you will find cleaned-output, the data we used for all downstream tasks. You will also find a folder containing all of the train-val-test splits for our experiments, as well as a .ipynb notebook for creating unique data splits. The scripts used to create cleaned-output are in the preprocessing folder, which contains the scripts for both cleaning the data and creating the mini-csvs.
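If you prefer to script a custom split rather than use the notebook, here is a minimal sketch of the idea (the file paths, column name, and split ratios are assumptions for illustration, not the repository's actual layout):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical input file and label column; adjust to the actual files in data/.
df = pd.read_csv("data/cleaned-output.csv")

# 80/10/10 train/val/test split, stratified by the disorder label.
train_df, temp_df = train_test_split(df, test_size=0.2, stratify=df["label"], random_state=42)
val_df, test_df = train_test_split(temp_df, test_size=0.5, stratify=temp_df["label"], random_state=42)

train_df.to_csv("train.csv", index=False)
val_df.to_csv("val.csv", index=False)
test_df.to_csv("test.csv", index=False)
```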
In the stats folder, you will find a few scripts and a directory for calculating interesting statistics about the data. The pos directory contains the following scripts and files:
- tagger.py, which tags the mini-csvs using the spaCy tagger (see the sketch after this list).
- pos-stats.sh, which generates POS tags for the CSVs.
- pos_count.txt, which contains the POS counts generated by pos-stats.sh.
- type-token.sh, which generates type-token ratios for each mini-csv.
- anygram directory, which contains all scripts for n-gram processing of the individual disorder files.
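To give a sense of what these scripts compute, here is a minimal sketch of POS counting and a type-token ratio using spaCy and pandas (the input path, column name, and spaCy model are assumptions; the actual scripts may differ):

```python
from collections import Counter

import pandas as pd
import spacy

# Hypothetical mini-csv with a "text" column; adjust the path and column name.
nlp = spacy.load("en_core_web_sm")
df = pd.read_csv("data/cleaned-output/anxiety.csv")

pos_counts = Counter()
tokens = []
for doc in nlp.pipe(df["text"].astype(str)):
    pos_counts.update(tok.pos_ for tok in doc if not tok.is_space)
    tokens.extend(tok.text.lower() for tok in doc if tok.is_alpha)

# Type-token ratio: number of unique word types divided by total tokens.
ttr = len(set(tokens)) / len(tokens) if tokens else 0.0

print(pos_counts.most_common(10))
print(f"type-token ratio: {ttr:.3f}")
```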
We make replicating our results relatively straightforward. To replicate a result for a particular model, simply run one of the training scripts in the trainers directory; your model will be saved to an output directory with the same name as the corresponding .py file.
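For example (the script name below is hypothetical; substitute any file actually present in trainers):

python trainers/distilbert_downsampled.py

This would save the fine-tuned model to a distilbert_downsampled output directory.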
We've uploaded our two downsampled models to Hugging Face. To run inference on your own symptomatic description, run predict.py from the command line as follows:
python predict.py --model "rachelhamelburg/downsampled_disorder_only" --text "Your symptoms"

or

python predict.py --model "rachelhamelburg/downsampled_model" --text "Your symptoms"
Keep in mind that the models were trained on lengthy symptomatic descriptions; short descriptions are unlikely to yield good results.
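If you would rather call the models directly from Python instead of going through predict.py, a minimal sketch using the transformers pipeline might look like this (predict.py may apply additional pre- or post-processing, so treat this as an approximation):

```python
from transformers import pipeline

# Load one of the uploaded models from the Hugging Face Hub.
clf = pipeline("text-classification", model="rachelhamelburg/downsampled_disorder_only")

description = (
    "For the past several months I have had trouble sleeping, constant worry "
    "about everyday situations, and difficulty concentrating at work."
)

# Long inputs may need truncating to the model's maximum sequence length.
print(clf(description, truncation=True))  # e.g. [{'label': ..., 'score': ...}]
```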
We also streamline the process of experimenting with different datasets, models, and a few hyperparameters. Simply cd into the experimental folder and pass the appropriate arguments on the command line:
python train_model.py --train_file <train_file> --val_file <val_file> --test_file <test_file> --model_name <model_name> --output_dir <output_dir> --num_epochs <num> --batch_size <num> --learning_rate <rate>
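For example (the file names and hyperparameter values below are placeholders; point the file arguments at your own splits):

python train_model.py --train_file train.csv --val_file val.csv --test_file test.csv --model_name distilbert-base-uncased --output_dir experiment_1 --num_epochs 3 --batch_size 16 --learning_rate 2e-5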