Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution

Code accompanying the NeurIPS 2022 paper (PDF).

Our talk on chemCPA at the M2D2 reading club is available here. A previous version of this work was a spotlight paper at ICLR MLDD 2022. Code for this previous version can be found under the v1.0 git tag.

Codebase overview

chemCPA/: contains the code for the model, the data, and the training loop.
embeddings: There is one folder for each molecular embedding model we benchmarked. Each contains an environment.yml with dependencies. We generated the embeddings using the provided notebooks and saved them to disk so that they can be loaded during the main training loop.
experiments: Each folder contains a README.md with the experiment description, a .yaml file with the seml configuration, and a notebook to analyze the results.
notebooks: Example analysis notebooks.
preprocessing: Notebooks for processing the data. For each dataset there is one notebook that loads the raw data.
tests: A few very basic tests.
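The embeddings workflow above (compute once in a notebook, save to disk, reload during training) can be sketched as follows. This is a minimal illustration, not the code in embeddings/ or chemCPA/: the JSON file format and the helper names save_embeddings and load_embeddings are hypothetical, and we assume the embeddings are keyed by SMILES string.

```python
import json
from pathlib import Path
from typing import Dict, List


def save_embeddings(embeddings: Dict[str, List[float]], path: Path) -> None:
    """Persist a SMILES -> embedding-vector mapping once, after generating it.

    The JSON layout is a hypothetical stand-in for whatever format the
    embedding notebooks actually write.
    """
    path.write_text(json.dumps(embeddings))


def load_embeddings(path: Path, smiles: List[str]) -> List[List[float]]:
    """Reload the cached embeddings, ordered to match `smiles`.

    Raises KeyError if any requested molecule has no precomputed embedding,
    so an incomplete cache is detected at load time.
    """
    table = json.loads(path.read_text())
    missing = [s for s in smiles if s not in table]
    if missing:
        raise KeyError(f"no precomputed embedding for: {missing}")
    return [table[s] for s in smiles]
```

Returning vectors in the order of the requested SMILES list keeps them aligned with the drug order of the dataset; raising on a missing molecule surfaces an incomplete embedding file before training starts rather than partway through it.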

All experiments were run through seml. The entry function is ExperimentWrapper.__init__ in chemCPA/seml_sweep_icb.py. For convenience, we provide a script at chemCPA/manual_seml_sweep.py to run experiments manually for debugging purposes. The script expects a manual_run.yaml file containing the experiment configuration.

All notebooks also exist as Python scripts (converted through jupytext) to make them easier to review.

Getting started

Environment

The easiest way to get started is to use the Docker image we provide:

docker run -it -p 8888:8888 --platform=linux/amd64 registry.hf.space/b1ro-chemcpa:latest

This image contains the source code and all dependencies needed to run the experiments. By default, it runs a Jupyter server on port 8888.

Alternatively, you may clone this repository and set up your own environment by running:

conda env create -f environment.yml
pip install -e .

Datasets

The datasets are not included in the Docker image; they are downloaded automatically when you run the notebooks that require them. Alternatively, the datasets can be downloaded manually with the Python script raw_data/dataset.py. Usage:

python raw_data/dataset.py --list
python raw_data/dataset.py --dataset <dataset_name>

or you may use the following links:

Some of the notebooks use a drugbank_all.csv file, which can be downloaded from here (registration needed).
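If you need several datasets, the raw_data/dataset.py tool can be scripted. The sketch below is a hypothetical wrapper (dataset_command and fetch are illustrative names, not part of the repository) that uses only the two flags documented above, --list and --dataset, and assumes it is run from, or pointed at, the repository root.

```python
import subprocess
import sys
from typing import List, Optional

SCRIPT = "raw_data/dataset.py"  # path relative to the repository root


def dataset_command(dataset_name: Optional[str] = None) -> List[str]:
    """Build the command line for the dataset tool.

    Only the two documented flags are used: `--list` when no dataset is
    given, otherwise `--dataset <dataset_name>`.
    """
    cmd = [sys.executable, SCRIPT]
    if dataset_name is None:
        cmd.append("--list")
    else:
        cmd += ["--dataset", dataset_name]
    return cmd


def fetch(dataset_names: List[str], repo_root: str = ".") -> None:
    """Download several datasets one after another (hypothetical helper).

    check=True raises CalledProcessError, stopping the loop at the first
    failed download.
    """
    for name in dataset_names:
        subprocess.run(dataset_command(name), cwd=repo_root, check=True)
```

For example, fetch(["<dataset_name>"]) runs the same command as the manual invocation above, using the interpreter of the current environment.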

Data preparation

Before the models can be trained, the raw data needs to be processed. This can be done by running the notebooks in the preprocessing/ folder in sequential order. Alternatively, you may run:

python preprocessing/run_notebooks.py

A description of the preprocessing steps is given in the preprocessing/README.md file and in the headers of individual notebooks. Section 4 of the paper is also highly relevant.
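preprocessing/run_notebooks.py already automates the sequential run. To illustrate what such a run involves, the sketch below shows one way to order and execute notebooks; it is not a reproduction of that script. It assumes (hypothetically) that notebook filenames start with a numeric prefix giving their position, and it uses jupyter nbconvert --execute to run each notebook in place. The function names are illustrative only.

```python
import re
import subprocess
from pathlib import Path
from typing import List, Tuple, Union


def _order_key(path: Path) -> Tuple[float, str]:
    """Sort by a leading integer prefix (so '2_...' precedes '10_...'), then by name.

    Assumes notebooks encode their position as a numeric filename prefix;
    files without one are placed last.
    """
    match = re.match(r"(\d+)", path.name)
    rank = int(match.group(1)) if match else float("inf")
    return (rank, path.name)


def notebook_order(folder: Union[str, Path]) -> List[Path]:
    """Return the .ipynb files in `folder` in execution order."""
    return sorted(Path(folder).glob("*.ipynb"), key=_order_key)


def run_sequentially(folder: Union[str, Path]) -> None:
    """Execute each notebook in place with nbconvert; stop on the first error."""
    for nb in notebook_order(folder):
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(nb)],
            check=True,
        )
```

The numeric sort key matters: a plain lexicographic sort would place 10_x.ipynb before 2_x.ipynb.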

Training the models

To train a model, run:

python chemCPA/train_hydra.py

Citation

You can cite our work as:

@inproceedings{hetzel2022predicting,
  title={Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution},
  author={Hetzel, Leon and Böhm, Simon and Kilbertus, Niki and Günnemann, Stephan and Lotfollahi, Mohammad and Theis, Fabian J},
  booktitle={NeurIPS 2022},
  year={2022}
}

