Understanding output #20

Ge0rges · 2024-10-31T02:00:02Z

Hello,

I ran the tool on a FASTA file representing a genome that contains many different contigs. I used python Promotech-master/promotech.py -g -i results/ -o results/ to generate the final results.

In the results folder, I looked at genome_predictions.csv. In the chrome column, every entry lists all of the contigs separated by |, and the sequence is quite long.

Does the sequence represent the entire promoter then, meaning the next nucleotide is the start codon?
How can I identify which contig the entry actually belongs to?

Thanks


Ge0rges · 2024-10-31T06:41:55Z

I think I have figured out that the contigs get concatenated so that a sliding window can run over them (though in the case of a MAG that doesn't really make sense). However, many of the reported forward-strand sequences don't exist in the FASTA file.

The highest-scoring forward-strand sequence I could find in the FASTA scored 91% and had a start codon about 50 nucleotides away, which I guess was a little unexpected. Does this match the results you have seen?
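For anyone who wants to repeat the check, here is a minimal sketch of how a predicted sequence could be looked up contig by contig, on both strands. It is not part of Promotech: the helper names read_fasta, reverse_complement and locate are hypothetical, and it assumes exact matching against an uppercase A/C/G/T/N alphabet.

```python
def read_fasta(lines):
    """Parse FASTA lines into a {contig_name: sequence} dict."""
    contigs = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            contigs[name] = []
        elif name is not None:
            contigs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in contigs.items()}


# Assumption: only A, C, G, T and N occur in the sequences.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate(seq, contigs):
    """Return (contig, strand, 0-based start) for every exact match of seq.

    For "-" hits, the start is where the reverse complement of seq
    begins on the forward strand of the contig.
    """
    seq = seq.upper()
    hits = []
    for name, contig in contigs.items():
        for strand, query in (("+", seq), ("-", reverse_complement(seq))):
            start = contig.find(query)
            while start != -1:
                hits.append((name, strand, start))
                start = contig.find(query, start + 1)
    return hits
```

With a real file this would be called as locate(predicted_sequence, read_fasta(open("genome.fasta"))), where genome.fasta is a placeholder path. An empty result for a sequence that does appear in the concatenation of all contigs would be consistent with a window that spans the junction between two contigs.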
