Plpred

By Maria Clara Martins

About the project:

A protein subcellular location prediction program (based on Machine Learning models). 🧬

Find out whether a protein is located in the membrane or the cytoplasm!

Available at: https://mcm-plpred.herokuapp.com/

Web application developed with Flask.

📁 Project Structure:

environment.yml: Environment configuration file.
requirements.txt: Libraries required by the project.
Makefile: Defines rules that centralize and run the main commands.
plpred/: Main package directory, containing the application functions.
data/: Data directory. Raw data is saved in data/raw, preprocessed data in data/processed, and trained models in data/models (models are serialized using pickle).
plpred/models: Predictive models based on Random Forest, Gradient Boosting, Neural Networks (MLP) and SVM.
tests/: Unit tests for the Plpred components.
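
Since trained models are serialized with pickle, saving and loading one is a short round trip. The sketch below is only an illustration: the helper names `save_model` and `load_model`, the file name, and the dictionary used as a stand-in for a trained model are assumptions, not part of Plpred.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize any picklable model object to disk."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # A plain dict stands in for a trained classifier (hypothetical content).
    stand_in = {"algorithm": "random_forest", "classes": ["cytoplasm", "membrane"]}
    with tempfile.TemporaryDirectory() as tmp:
        model_path = Path(tmp) / "model.pickle"  # hypothetical file name
        save_model(stand_in, model_path)
        print(load_model(model_path))
```

A real classifier object is saved and restored the same way, as long as the library versions used for saving and loading match.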

Running locally (Setup):

Clone the repository and run:
$ conda install make (Windows only, "make" comes by default in macOS and Linux)
$ make setup
$ make server
You can view the application at: http://localhost:8000/

👨‍💻 Command line interface (CLI):

`plpred-preprocess`:

usage: plpred-preprocess [-h] -m MEMBRANE_PROTEINS -c CYTOPLASM_PROTEINS -o OUTPUT

plpred-preprocess: data preprocessing tool

optional arguments:
  -h, --help            show this help message and exit
  -m MEMBRANE_PROTEINS, --membrane_proteins MEMBRANE_PROTEINS
                        path to the file containing membrane proteins (.fasta)
  -c CYTOPLASM_PROTEINS, --cytoplasm_proteins CYTOPLASM_PROTEINS
                        path to the file containing cytoplasm proteins (.fasta)
  -o OUTPUT, --output OUTPUT
                        path to the output file (.csv)
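
As a rough idea of what this step involves, the sketch below reads two FASTA inputs, labels each record by its source file, and writes a CSV. It is not the actual Plpred implementation: the function names, the column names (`id`, `sequence`, `membrane`) and the 1/0 label encoding are assumptions made for illustration, and the real tool may derive additional features.

```python
import csv
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_rows(membrane_handle, cytoplasm_handle):
    """Label membrane records 1 and cytoplasm records 0 (assumed encoding)."""
    rows = []
    for label, handle in ((1, membrane_handle), (0, cytoplasm_handle)):
        for header, sequence in read_fasta(handle):
            rows.append({"id": header, "sequence": sequence, "membrane": label})
    return rows


def write_csv(rows, handle):
    """Write the labeled rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=["id", "sequence", "membrane"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # In-memory toy inputs; real runs would open the .fasta files instead.
    membrane = io.StringIO(">p1\nMKTLL\nAVG\n")
    cytoplasm = io.StringIO(">p2\nMSDE\n")
    out = io.StringIO()
    write_csv(build_rows(membrane, cytoplasm), out)
    print(out.getvalue())
```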

`plpred-train`:

usage: plpred-train [-h] -p PROCESSED_DATASET -o OUTPUT [-r]
                    [-a {random_forest,neural_network,gradient_boosting,svm}]

plpred-train: model training tool

optional arguments:
  -h, --help            show this help message and exit
  -p PROCESSED_DATASET, --processed_dataset PROCESSED_DATASET
                        processed dataset generated by plpred-preprocess (.csv)
  -o OUTPUT, --output OUTPUT
                        path to the output trained model (.pickle)
  -r, --report          show classification report
  -a {random_forest,neural_network,gradient_boosting,svm}, --algorithm {random_forest,neural_network,gradient_boosting,svm}
                        machine learning algorithm
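
For readers who want to see how such an interface is typically declared, here is a minimal argparse sketch that reproduces the usage line above. The flags and choices come from the help text; the function name `build_parser` and the `random_forest` default are assumptions (the default is inferred from the models description, not confirmed by the help output).

```python
import argparse

ALGORITHMS = ["random_forest", "neural_network", "gradient_boosting", "svm"]


def build_parser():
    """Build a parser mirroring the plpred-train usage shown above."""
    parser = argparse.ArgumentParser(
        prog="plpred-train",
        description="plpred-train: model training tool",
    )
    parser.add_argument(
        "-p", "--processed_dataset", required=True,
        help="processed dataset generated by plpred-preprocess (.csv)",
    )
    parser.add_argument(
        "-o", "--output", required=True,
        help="path to the output trained model (.pickle)",
    )
    parser.add_argument(
        "-r", "--report", action="store_true",
        help="show classification report",
    )
    parser.add_argument(
        "-a", "--algorithm", choices=ALGORITHMS,
        default="random_forest",  # assumed default, not stated in the help text
        help="machine learning algorithm",
    )
    return parser


if __name__ == "__main__":
    # Hypothetical paths, used only to show the parsed result.
    args = build_parser().parse_args(
        ["-p", "data/processed/dataset.csv", "-o", "data/models/model.pickle", "-r"]
    )
    print(args)
```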

`plpred-predict`:

usage: plpred-predict [-h] -i INPUT -o OUTPUT -m MODEL

plpred-predict: subcellular location prediction tool

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        input file (.fasta)
  -o OUTPUT, --output OUTPUT
                        output file (.csv)
  -m MODEL, --model MODEL
                        trained model (.pickle)

`plpred-server`:

usage: plpred-server [-h] -H HOST -p PORT -m MODEL

plpred-server: subcellular location prediction server

optional arguments:
  -h, --help            show this help message and exit
  -H HOST, --host HOST  host address
  -p PORT, --port PORT  host port
  -m MODEL, --model MODEL
                        trained model to be deployed
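
Taken together, the four tools form a pipeline: preprocess, train, predict, serve. The sketch below assembles the corresponding command lines from the flags documented above. The function name `build_pipeline` and all file names are hypothetical; only the flags and the data/ subdirectories come from this README.

```python
import shlex


def build_pipeline(membrane_fasta, cytoplasm_fasta, query_fasta,
                   algorithm="random_forest", host="localhost", port=8000):
    """Return the four Plpred commands as argument lists, in run order."""
    dataset = "data/processed/dataset.csv"        # hypothetical file name
    model = "data/models/model.pickle"            # hypothetical file name
    predictions = "data/processed/predictions.csv"  # hypothetical file name
    return [
        ["plpred-preprocess", "-m", membrane_fasta, "-c", cytoplasm_fasta,
         "-o", dataset],
        ["plpred-train", "-p", dataset, "-o", model, "-r", "-a", algorithm],
        ["plpred-predict", "-i", query_fasta, "-o", predictions, "-m", model],
        ["plpred-server", "-H", host, "-p", str(port), "-m", model],
    ]


if __name__ == "__main__":
    commands = build_pipeline(
        "data/raw/membrane.fasta",   # hypothetical input paths
        "data/raw/cytoplasm.fasta",
        "query.fasta",
    )
    for command in commands:
        # Print shell-ready lines; subprocess.run(command, check=True)
        # would execute each step instead.
        print(shlex.join(command))
```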

Machine Learning - Models description:

RandomForestClassifier (default): A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting.

GradientBoostingClassifier: Gradient Boosting builds an additive model in a forward stage-wise fashion, which allows the optimization of arbitrary differentiable loss functions.

MLPClassifier: Multi-layer Perceptron classifier. This model optimizes the log-loss function using LBFGS or stochastic gradient descent.

SVC: C-Support Vector Classification. The implementation is based on libsvm. Fit time scales at least quadratically with the number of samples, so it may be impractical beyond tens of thousands of samples.
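
One plausible way to connect the algorithm names accepted by plpred-train to these scikit-learn estimators is a small lookup table. This is a sketch under assumptions: the names `ESTIMATORS` and `make_estimator` are hypothetical, the estimators use scikit-learn defaults, and the toy data is made up. Plpred's own code may be organized differently.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Maps the CLI algorithm names to scikit-learn estimator classes.
ESTIMATORS = {
    "random_forest": RandomForestClassifier,
    "gradient_boosting": GradientBoostingClassifier,
    "neural_network": MLPClassifier,
    "svm": SVC,
}


def make_estimator(algorithm="random_forest", **params):
    """Instantiate the estimator registered under the given name."""
    try:
        estimator_class = ESTIMATORS[algorithm]
    except KeyError:
        raise ValueError(f"unknown algorithm: {algorithm!r}") from None
    return estimator_class(**params)


if __name__ == "__main__":
    # Toy feature vectors and labels, invented for illustration only.
    X = [[0.0, 1.0], [1.0, 0.0], [0.1, 0.9], [0.9, 0.1]]
    y = [1, 0, 1, 0]
    model = make_estimator("svm").fit(X, y)
    print(model.predict([[0.2, 0.8]]))
```

Keeping the mapping in one place means a new algorithm only needs one extra dictionary entry plus a matching CLI choice.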
