Consensus sequence motifs are short amino acid sequences, shared by proteins across multiple organisms, that are associated with a specific biological function, such as phosphorylation sites and metal-binding sites. Curated databases of sequence motifs are publicly available.
Web applications are currently available that identify sequence motifs from a database of motifs (ELM) or from either a database or a user-specified sequence (ScanProsite). Both of these resources include statistical tools to quantify the probability of a given motif occurring. However, the rate of false positives is still high because consensus motifs are short and can occur by chance.
SeqMo-ID works on the hypothesis that a high degree of conservation of consensus sites can be used to identify sequence motifs that are functional in vivo. It takes multiple protein sequences from different individuals or species and determines how conserved a given motif is across the sample. More highly conserved motifs are more likely to be biologically functional.
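The conservation idea can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the toy sequences, and the `[ST]P` pattern are all hypothetical, and it assumes the sequences are already aligned so that the same index refers to the same site in every sequence.

```python
import re

def motif_positions(sequence, pattern):
    """Return 0-based start positions of every motif match, including overlapping ones."""
    # A zero-width lookahead lets overlapping matches be reported.
    regex = re.compile(f"(?=({pattern}))")
    return [m.start() for m in regex.finditer(sequence)]

def conservation(reference, others, pattern):
    """Map each motif position in the reference to the fraction of
    other sequences that also match the motif at that position."""
    other_hits = [set(motif_positions(seq, pattern)) for seq in others]
    scores = {}
    for pos in motif_positions(reference, pattern):
        shared = sum(1 for hits in other_hits if pos in hits)
        scores[pos] = shared / len(others) if others else 0.0
    return scores

# Hypothetical toy data: one reference and three pre-aligned sequences.
reference = "MKSPAKRSPTK"
others = ["MKSPAKRAPTK", "MKSPAKRSPTK", "MKTPAKRSPTK"]
scores = conservation(reference, others, r"[ST]P")
print(scores)
```

In this toy sample the site at index 2 is matched in every sequence, while the site at index 7 is lost in one of them, so the first site would rank as the stronger candidate.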
- accession numbers
- protein names
- ftp to download .faa files
- `seqkit` to filter
- filtered protein `out.faa` file
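The filtering step can be pictured with a small pure-Python stand-in. This is a sketch only: the project uses `seqkit` for this step, and the helper names, the header layout, and the example records below are assumptions made for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_by_name(records, protein_name):
    """Keep records whose header mentions the protein name (case-insensitive)."""
    needle = protein_name.lower()
    return [(h, s) for h, s in records if needle in h.lower()]

def to_fasta(records):
    """Format (header, sequence) pairs as FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)

# Hypothetical input: identifiers and sequences are made up.
example = """>seq1 DNA gyrase subunit A [Strain X]
MSDLAREITPV
>seq2 hypothetical protein [Strain X]
MKKLLV
""".splitlines()

kept = filter_by_name(read_fasta(example), "gyrase")
print(to_fasta(kept))  # this text is what would be saved as out.faa
```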
Using the output from the algorithms that define consensus sites, SeqMo-ID generates a table for each protein of interest. Each table includes the GeneID and Strain ID from the gene annotation (taken directly from `out.faa`) and, for each gene, the location of each motif that is conserved with the reference sequence. The last column gives the number of times the motif occurs in the sequence but is not conserved with the reference sequence.
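One way such a table could be assembled is sketched below in Python. The column names, the tuple layout of the input, and the toy data are assumptions, not the project's actual output format, and the sequences are assumed to share the reference's coordinates.

```python
import re

def motif_positions(sequence, pattern):
    """Return 0-based start positions of every motif match, including overlapping ones."""
    regex = re.compile(f"(?=({pattern}))")
    return [m.start() for m in regex.finditer(sequence)]

def build_table(reference, genes, pattern):
    """Build a header and one row per gene.

    genes is a list of (gene_id, strain_id, sequence) tuples.
    Each row holds the two IDs, one True/False flag per reference motif
    site, and a final count of motif hits absent from the reference.
    """
    ref_positions = motif_positions(reference, pattern)
    header = ["GeneID", "StrainID"]
    header += [f"pos_{p + 1}" for p in ref_positions]  # 1-based site labels
    header.append("not_conserved")
    rows = []
    for gene_id, strain_id, sequence in genes:
        hits = set(motif_positions(sequence, pattern))
        conserved = [pos in hits for pos in ref_positions]
        extra = len(hits - set(ref_positions))
        rows.append([gene_id, strain_id, *conserved, extra])
    return header, rows

# Hypothetical toy data.
reference = "MKSPAKRSPTK"
genes = [
    ("gene1", "strainA", "MKSPAKRAPTK"),
    ("gene2", "strainB", "MTPPAKRSPTK"),
]
header, rows = build_table(reference, genes, r"[ST]P")
print(header)
for row in rows:
    print(row)
```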
Visualization tools provide rapid summaries of the data and a visual complement to the analytical search and categorization tools developed in SeqMo-ID. We make use of the R-based msaR tool.
- Integrate steps that are currently separate: getting data, defining consensus sites and analysis, and visualization
- Improve automation of analysis tables
- Allow the algorithm to handle "wildcard" positions that can take any amino acid or a specified list of amino acids
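As an illustration of the wildcard idea in the last item, the Python sketch below translates a PROSITE-style pattern into a regular expression. Whether SeqMo-ID would adopt this syntax is an assumption, the function name is hypothetical, and only a subset of the syntax is handled: `x` for any residue, `[..]` for an allowed list, `{..}` for an excluded list, and `(n)` or `(n,m)` repeat counts.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern such as 'C-x(2,4)-C' into a regex string."""
    parts = []
    for element in pattern.split("-"):
        # Split off an optional repeat count such as "(2)" or "(2,4)".
        match = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"unsupported element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            token = "."  # wildcard: any amino acid
        elif residue.startswith("[") and residue.endswith("]"):
            token = residue  # specified list of allowed amino acids
        elif residue.startswith("{") and residue.endswith("}"):
            token = "[^" + residue[1:-1] + "]"  # list of excluded amino acids
        elif len(residue) == 1 and residue.isalpha():
            token = residue  # a single fixed amino acid
        else:
            raise ValueError(f"unsupported element: {element!r}")
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)

regex = prosite_to_regex("N-{P}-[ST]-{P}")
hit = re.search(regex, "AANGSAA")  # hypothetical toy sequence
print(regex, hit.start() if hit else None)
```

Translating to a regular expression keeps the matching step unchanged, since the resulting string can be passed to the same search code used for fixed motifs.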
TBD
Listed alphabetically by last name
- Miranda Lynch, PhD, Hauptman-Woodward Medical Research Institute
- Kevin McPherson, Bellwethr
- Amy Pomeroy, UNC Chapel Hill Medical School
- Kimiko Suzuki, UNC Chapel Hill Medical School