Genetic Neural Network (GNN) is an artificial neural network for predicting genome-wide gene expression given gene knockouts and master regulator perturbations. At its core, the GNN maps existing gene regulatory information onto its architecture and uses cell nodes specifically designed to capture the dependencies and non-linear dynamics that exist in gene networks. These two key features make the GNN architecture capable of capturing complex relationships without the need for large training datasets.
- Operating Systems: Verified on Ubuntu 18.04 and macOS Sierra 10.12.6
- Programming Languages: Python (version >= 3.4) and Lua (version = 5.1)
- Libraries: GUROBI (version >= 6.5), Torch7, Keras, TensorFlow and pandas
Follow installation steps for details.
Training. To train a GNN, you only need the gene expression (GE) dataset. You can optionally supply a known or inferred TRN as a tsv file (see below for file formats). If no TRN file is provided, the GNN trainer first runs the GENIE3 method to create an inferred TRN and then uses it to train the GNN. By default, the trained model is saved under a directory named model_dir.
To train the GNN, write:
./train.py --dataset dataset.csv [--trn net.tsv] [--output-model-dir model_dir]
Prediction. A trained GNN can be used to predict a new gene expression profile:
./predict.py --input gnn_input.csv [--load-model-dir model_dir] --output gnn_pred.csv
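If you drive both steps from a script, the two commands above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown above; the helper names train_command, predict_command and run are hypothetical and not part of the GNN code base.

```python
import subprocess


def train_command(dataset, trn=None, model_dir=None):
    """Build the argument list for ./train.py from the flags shown above."""
    cmd = ["./train.py", "--dataset", dataset]
    if trn is not None:
        # Optional: without --trn, the trainer infers the network itself.
        cmd += ["--trn", trn]
    if model_dir is not None:
        cmd += ["--output-model-dir", model_dir]
    return cmd


def predict_command(gnn_input, output, model_dir=None):
    """Build the argument list for ./predict.py from the flags shown above."""
    cmd = ["./predict.py", "--input", gnn_input]
    if model_dir is not None:
        cmd += ["--load-model-dir", model_dir]
    cmd += ["--output", output]
    return cmd


def run(cmd):
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    subprocess.check_call(cmd)


# Print the commands instead of running them, so this sketch has no side effects.
print(" ".join(train_command("dataset.csv", trn="net.tsv", model_dir="model_dir")))
print(" ".join(predict_command("gnn_input.csv", "gnn_pred.csv", model_dir="model_dir")))
```

To execute a step, pass the returned list to run(), for example run(train_command("dataset.csv")), from the directory that contains train.py.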
File formats.
- dataset.csv (e.g.): Each row corresponds to the GE profile of an experiment. The first column contains the knockout genes (separated by & if there are multiple knockouts). Each other column represents the expression of a gene. The first row encodes column names.
- net.tsv (e.g.): Each row encodes a single regulatory relationship. The first column corresponds to the transcription factor (TF) gene and the second column to the gene regulated by that TF.
- gnn_input.csv (e.g.): It encodes knockout information (column 1) and the expression of master regulator (MR) genes (column 2 to last). Each row corresponds to an experiment. The list of MR genes can be found in model_dir/MR_genes.csv. The first row encodes column names.
- gnn_pred.csv (e.g.): Each row represents the predicted gene expressions corresponding to a row of gnn_input.csv above. The first row encodes the gene names.
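The layouts described above can be illustrated with a toy example. This is a minimal sketch, not the project's own tooling: the gene names (geneA, geneB, geneC), the expression values, the header label "knockout", and the MR gene list are all made up for illustration, and writing net.tsv without a header row is an assumption. In practice the MR gene list comes from model_dir/MR_genes.csv.

```python
import csv
import io


def to_delimited(rows, delimiter=","):
    """Serialize a list of rows to delimited text (CSV by default)."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def split_knockouts(cell):
    """Split a knockout cell such as 'geneA&geneB' into a list of gene names."""
    return [gene for gene in cell.split("&") if gene]


def make_gnn_input(dataset_rows, mr_genes):
    """Keep the knockout column (first) plus the MR gene columns, in that order."""
    header = dataset_rows[0]
    keep = [0] + [header.index(gene) for gene in mr_genes]
    return [[row[i] for i in keep] for row in dataset_rows]


# Toy dataset.csv: header row, then one experiment per row.
# Column 1 holds the knockout genes; the label "knockout" is hypothetical.
dataset_rows = [
    ["knockout", "geneA", "geneB", "geneC"],
    ["geneA", 0.0, 0.8, 0.5],
    ["geneB", 1.2, 0.0, 0.9],
    ["geneA&geneB", 0.0, 0.0, 0.1],  # double knockout, separated by &
]

# Toy net.tsv: TF gene, then the gene it regulates (no header row: an assumption).
net_rows = [["geneA", "geneC"], ["geneB", "geneC"]]

# Toy gnn_input.csv: knockout column plus a hypothetical MR gene list.
gnn_input_rows = make_gnn_input(dataset_rows, ["geneA", "geneB"])

print(to_delimited(dataset_rows))
print(to_delimited(net_rows, delimiter="\t"))
print(to_delimited(gnn_input_rows))
print(split_knockouts("geneA&geneB"))
```

The sketch builds the text in memory; to produce actual files, write each returned string to dataset.csv, net.tsv and gnn_input.csv respectively.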
Follow the instructions here to reproduce the performance benchmarks of our article (Figures 3 and 4).
For any questions contact Ameen Eetemadi (eetemadi@ucdavis.edu).
Eetemadi A and Tagkopoulos I. Genetic Neural Networks: An artificial neural network architecture for capturing gene expression relationships. Bioinformatics. 2018. [link]
See the LICENSE file for license rights and limitations (Apache2.0).
This work was supported by grants from the National Science Foundation (1516695, 1743101 and 1254205).