Understanding output #20

Open
Ge0rges opened this issue Oct 31, 2024 · 1 comment
Ge0rges commented Oct 31, 2024

Hello,

I ran the tool on a FASTA file representing a genome which contains many different contigs. I used python Promotech-master/promotech.py -g -i results/ -o results/ to generate the final results.

In the results folder I looked at genome_predictions.csv. In the chrome column I get a list of every contig separated by |, and the sequence is quite long.

Does the sequence represent the entire promoter then, meaning the next nucleotide is the start codon?
How can I identify which contig the entry actually belongs to?

Thanks

Ge0rges commented Oct 31, 2024

I think I figured out that the contigs get concatenated so that a sliding window can run over them (though in the case of a MAG that doesn't really make sense). However, many of the reported forward-strand sequences don't exist in the FASTA file.

The highest-scoring forward-strand sequence I could find in the FASTA scored 91% and had a start codon about 50 nucleotides away, which was a little unexpected. Does this match the results you have seen?
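
For reference, here is a minimal sketch of the kind of lookup described above: it searches each contig separately for a predicted sequence, on both strands, and reports which contig it falls in. It is not part of Promotech. The helper names (`read_fasta`, `reverse_complement`, `locate`) are hypothetical, and it assumes exact matches and a plain A/C/G/T/N alphabet.

```python
def read_fasta(lines):
    """Parse an iterable of FASTA lines into {contig_name: sequence}."""
    contigs = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                contigs[name] = "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        contigs[name] = "".join(chunks)
    return contigs


# Assumption: only A, C, G, T and N occur in the sequences.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def locate(seq, contigs):
    """Return (contig, 0-based start, strand) for every exact match of seq."""
    seq = seq.upper()
    hits = []
    for name, contig in contigs.items():
        for strand, query in (("+", seq), ("-", reverse_complement(seq))):
            pos = contig.find(query)
            while pos != -1:
                hits.append((name, pos, strand))
                pos = contig.find(query, pos + 1)
    return hits


if __name__ == "__main__":
    # Tiny made-up example, not real data.
    demo = read_fasta([">c1", "ACGTACGGG", ">c2", "TTTCCC"])
    print(locate("TACG", demo))    # found within c1
    print(locate("GGGTTT", demo))  # only exists across the c1|c2 junction
```

With a real file, `read_fasta(open(path))` would supply the contigs. An empty result for a predicted sequence would be consistent with the window spanning the junction between two concatenated contigs, though that is an inference from the output and not confirmed behaviour of the tool.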
