Automated phenotyping

Project description

Repository for the manuscript entitled "Comparative neural word embeddings approaches for medical concept representation and patient trajectory prediction".

This project aims to compare NLP models (word2vec, fastText and GloVe), based on the quality of their representation of medical concepts.
We train the models on MIMIC-IV, from which we extract patient trajectories as sequences of (among others) ICD10 and ATC codes.
We train the models with model-specific NLP tasks that use patient trajectory sequences as input.
We evaluate the models by producing medical concept embeddings, clustering them, comparing them to existing biomedical terminologies, and using them for medical outcome and patient trajectory prediction tasks.
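
To illustrate how a patient trajectory can serve as input to these models, here is a minimal sketch, not the repository's implementation. It assumes a trajectory is an ordered list of code tokens; the token format (`ICD10_` / `ATC_` prefixes), the example codes, and the function names `skipgram_pairs` and `cooccurrence_counts` are hypothetical. It builds the (center, context) pairs that word2vec-style training typically consumes, and the co-occurrence counts that GloVe-style training typically starts from.

```python
from collections import Counter

# Hypothetical patient trajectories: each one is an ordered list of code tokens.
# The token format (prefix + code) is an assumption made for illustration only.
trajectories = [
    ["ICD10_I10", "ATC_C09AA02", "ICD10_E119", "ATC_A10BA02"],
    ["ICD10_E119", "ATC_A10BA02", "ICD10_I10"],
]


def skipgram_pairs(sequence, window=2):
    """Return (center, context) token pairs within a symmetric window."""
    pairs = []
    for i, center in enumerate(sequence):
        start = max(0, i - window)
        stop = min(len(sequence), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a token is not its own context
                pairs.append((center, sequence[j]))
    return pairs


def cooccurrence_counts(sequences, window=2):
    """Count how often each (center, context) pair occurs across sequences."""
    counts = Counter()
    for sequence in sequences:
        counts.update(skipgram_pairs(sequence, window))
    return counts


counts = cooccurrence_counts(trajectories, window=1)
```

With a small window, codes that tend to appear next to each other in trajectories end up with higher counts, which is the signal the embedding models learn from.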

Installation requirements

Clone the repository:

git clone git@github.com:ds4dh/medical_concept_representation.git  # or https://github.com/ds4dh/medical_concept_representation.git
cd medical_concept_representation

Install the required dependencies:
```
./create_env.sh
conda activate medical_representation
```
If you run into version issues, you can build an environment manually from the packages listed in environment.yml.
The project uses WandbLogger for experiment tracking. Ensure you have a Weights & Biases account set up for logging.

Usage

You need to download the data yourself! Instructions for downloading and pre-processing the data are here: https://github.com/ds4dh/medical_concept_representation/tree/main/data

Once the pre-processed data is ready, train the models with:

python run_all_models.py  # long step; best run inside a screen session, see https://linuxize.com/post/how-to-use-linux-screen/

Once the models are trained, test them with:

python run_all_models.py -t  # long step; best run inside a screen session, see https://linuxize.com/post/how-to-use-linux-screen/

Result figures 4, 5, and 7 will be available on your wandb log page.

For the other result figures, run:

python figures/figure_6.py
python figures/figure_8.py
python figures/figure_8_bis.py   # supplementary figures
python figures/figure_9.py
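
If you prefer to generate all of these figures in one go, the commands above can be wrapped in a small helper. This is only a sketch under assumptions: it is not part of the repository, the names `build_commands` and `run_all` are hypothetical, and it assumes the scripts can be run one after another from the repository root.

```python
import subprocess
import sys

# Figure scripts listed in this README, in the order given above.
FIGURE_SCRIPTS = [
    "figures/figure_6.py",
    "figures/figure_8.py",
    "figures/figure_8_bis.py",  # supplementary figures
    "figures/figure_9.py",
]


def build_commands(scripts, python=sys.executable):
    """Turn script paths into argument lists for the given interpreter."""
    return [[python, script] for script in scripts]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; check=True stops at the first failure."""
    for command in commands:
        runner(command, check=True)


if __name__ == "__main__":
    # Assumption: this file is executed from the repository root.
    run_all(build_commands(FIGURE_SCRIPTS))
```

Using `sys.executable` keeps the scripts in the same interpreter (and conda environment) as the helper itself.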
