Skip to content

Second project for Introduction to computational biology

Notifications You must be signed in to change notification settings

JuliaGol/motif_enrichment

Repository files navigation

motif_enrichment

The second project for Introduction to computational biology It contains two scripts. First script - E_coli_proteins_finder.py which finds E.Coli proteins in the genome. In this task, genome is included in a fasta file with genes' sequences (genes_e_coli.fa) and desirable proteins are in a fasta file with protein sequences (protein_fragments.fa). Protein fragments need to be matched against DNA sequences, so Blast is performed. The best hits are wrote to .csv file (ecolihits.csv). The file in which promoter's sequences is given (proms_e_coli.fa). This file includes promoters from theoretical groups A and B, which should have different regulatory mechanisms. This script identifies 10 sequence motifs present in the promoters associated with group A and 10 motifs associated with group B with length 15. To reach this goal script genrates file fasta files (promotersA.fa and promotersB.fa) as input to MEME-suite.

The output should be changed manually, because of the different MEME suite versions between one implemented in Biopython and one available online. For example: it was: PRIMARY SEQUENCES= promotersB.fa CONTROL SEQUENCES= --none-- it should be: DATAFILE= promotersB.fa I uploded correct files to this respository: motifsA.txt and motifsB.txt.

The second script - motif_enrichment.py parses MEME files and uses the binomial test to identify the motifs that are significantly enriched. Statistics are writen to statisticsresults.tsv file.

About

Second project for Introduction to computational biology

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages