Bug in uniprot-dat-to-fasta.py #1

rjacak · 2017-02-09T22:46:49Z

Thanks for providing the Python script to convert .dat files to FASTA. I did, however, find a bug when trying to convert an old UniProt database. uniprot-dat-to-fasta.py assumes that any line that yields only two characters when split must be a tag line. As a side effect, any sequence that is only two amino acids long (there are some) causes errors, and large numbers of other sequences get truncated.
A fix is to keep a copy of the original line as read from the file, without stripping any whitespace, and check whether that line starts with 5 spaces. According to the .dat file format, this is another thing that distinguishes sequence lines from tag lines. There is probably a faster solution, but that's what worked for me.
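
For anyone hitting the same problem, here is a minimal sketch of the idea, not the actual patch to uniprot-dat-to-fasta.py. The function name `iter_sequences` and the `in_sequence` flag are my own hypothetical additions; the sketch assumes each entry starts with an `ID` line, that sequence data follows an `SQ` line, and that entries end with `//`.

```python
def iter_sequences(handle):
    """Yield (entry_name, sequence) pairs from a UniProt .dat text stream.

    Hypothetical helper: sequence lines are recognised by their five
    leading spaces on the raw line, not by the length of a split token.
    """
    entry_name = None
    residues = []
    in_sequence = False  # extra guard (assumption): only trust blank-prefixed lines after SQ
    for raw_line in handle:
        # Strip only the line ending so leading whitespace stays intact.
        line = raw_line.rstrip("\r\n")
        if line.startswith("//"):
            # End of entry: emit whatever sequence was collected.
            if entry_name is not None:
                yield entry_name, "".join(residues)
            entry_name, residues, in_sequence = None, [], False
        elif in_sequence and line.startswith("     "):
            # Sequence line: residues come in space-separated blocks.
            residues.append(line.replace(" ", ""))
        elif line.startswith("ID"):
            entry_name = line.split()[1]
            residues = []
        elif line.startswith("SQ"):
            in_sequence = True
```

Because the check is made on the unstripped line, a two-residue sequence line such as `     MK` is no longer mistaken for a tag. Each yielded pair can then be written out as `>name` followed by the sequence.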
