Obtaining protein family members

A SEDA pipeline created in Compi that implements the "Obtaining protein family members" SEDA-based protocol. Created using the SEDA-Compi pipelines framework.

This protocol shows how to retrieve all members of a given protein family, such as mucins. The main feature of mucin proteins is an extended region of tandemly repeated sequences (PTS repeats) that contain proline (P) together with serine (S) and/or threonine (T). These repeats generally occupy between 30% and 90% of the protein length and cannot be detected in homology searches because of their poor sequence conservation (https://doi.org/10.1371/journal.pone.0003041). Mucins also have signal peptides and other associated domains.

Quick-start: running the pipeline with sample data

Download this ZIP and decompress it. The path where it is extracted will be referred to as the "working directory" (/path/to/working_dir).

Move to the working directory and edit the params/pfamscan.sedaParams file to set your e-mail address in the third line (eMail); otherwise, PfamScan will not run. Then run ./run.sh "$(pwd)" to execute the entire pipeline on the two sample input files.
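Because PfamScan will not run without the e-mail address, a small pre-flight check can catch a forgotten edit before the pipeline starts. The sketch below is not part of the pipeline: check_email_line is a hypothetical helper, and it only assumes that a valid entry on the third line contains an "@" character. It makes no other assumption about the format of the .sedaParams file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the pipeline): succeeds only if
# line 3 of the given params file contains an "@" character.
check_email_line() {
    local params_file="$1"
    local line
    # Read the third line only; it is empty if the file is missing or too short.
    line="$(sed -n '3p' "$params_file" 2>/dev/null)"
    case "$line" in
        *@*) return 0 ;;
        *)
            echo "Line 3 of $params_file does not contain an e-mail address: $line" >&2
            return 1
            ;;
    esac
}

# Possible usage from the working directory:
#   check_email_line params/pfamscan.sedaParams && ./run.sh "$(pwd)"
```

The check is deliberately loose: it does not validate the address, it only flags a third line that was left without one.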

The two input FASTA files for Homo sapiens (https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39) and Drosophila melanogaster (https://www.ncbi.nlm.nih.gov/assembly/GCF_000001215.4) were downloaded from the NCBI assembly RefSeq database by selecting the Download assembly / Protein FASTA (.faa) option.

To run specific tasks, pass an additional parameter to the run.sh script, for example ./run.sh "$(pwd)" "--single-task extract-headers" or ./run.sh "$(pwd)" "--until pfamscan".
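Note that in both commands the option and the task name travel together as one quoted argument. The following sketch wraps the two invocations in shell functions to make that quoting explicit. The function names and the RUN_SH variable are hypothetical conveniences, not part of the repository, and the only task names used are the ones mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around run.sh. RUN_SH may be overridden to point at
# another copy of the script; by default the one in the current directory is used.
RUN_SH="${RUN_SH:-./run.sh}"

# Run a single task, e.g. run_single_task extract-headers
run_single_task() {
    # The option and the task name are passed as ONE quoted argument.
    "$RUN_SH" "$(pwd)" "--single-task $1"
}

# Run the pipeline with the "--until" option, e.g. run_until pfamscan
run_until() {
    "$RUN_SH" "$(pwd)" "--until $1"
}
```

Both functions are meant to be called from the working directory, since they pass "$(pwd)" as the first argument just like the commands above.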

Applying the protocol to other case studies

Applying the protocol to other case studies is easy. You only need to:

Put the protein FASTA files in input/pattern-filtering/.
Edit params/pattern-filtering.sedaParams to set pattern filtering parameters appropriate to your case study. This file can be created and exported using the SEDA GUI, which is handy for advanced pattern filtering cases.
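The first step can be sketched as a small shell function. stage_inputs is a hypothetical helper, not part of the pipeline. It copies the given FASTA files into input/pattern-filtering/ under the working directory and lists the result. It assumes that any files already in that folder (such as the sample data) are left untouched, so remove them by hand if they should not be processed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the pipeline): copy protein FASTA
# files into the input folder of a working directory.
# Usage: stage_inputs /path/to/working_dir file1.faa [file2.faa ...]
stage_inputs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: stage_inputs WORKING_DIR FASTA_FILE..." >&2
        return 1
    fi
    local working_dir="$1"
    shift
    local target="$working_dir/input/pattern-filtering"
    # Create the folder if it is missing, then copy every remaining argument.
    mkdir -p "$target" || return 1
    cp -- "$@" "$target/" || return 1
    # List what the pipeline will see as input.
    ls -1 "$target"
}
```

After staging the files, the pattern filtering parameters still have to be adapted as described in the second step before running ./run.sh "$(pwd)".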

Contributors

Made with contrib.rocks.
