Thanks for providing the Python script to convert .dat files to FASTA. I did, however, find a bug when trying to convert an old UniProt database. uniprot-dat-to-fasta.py assumes that any line split that results in only two characters must be a tag line. As a side effect, any sequence that is only two amino acids long (there are some) causes an error, and large numbers of other sequences are truncated.
A fix is to keep a copy of the original line as read from the file, without stripping any whitespace, and to check whether that line starts with 5 spaces. According to the .dat file format, this is another thing that distinguishes sequence lines from tag lines. There is probably a faster solution, but that's what worked for me.
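For illustration, here is a minimal sketch of that check. It is not the code from uniprot-dat-to-fasta.py: the function names `iter_fasta_records` and `to_fasta` are hypothetical, and it assumes each entry has an `ID` line, sequence lines indented by five spaces, and a closing `//` line, as in the UniProt flat-file layout. An old database release may differ in details.

```python
def iter_fasta_records(lines):
    """Yield (entry_name, sequence) pairs from the lines of a UniProt .dat file.

    Hypothetical helper, not part of the original script.
    """
    entry_name = None
    residues = []
    for raw in lines:
        # Test the raw line: stripping it first would hide the leading spaces
        # and make a two-residue sequence line look like a two-letter tag.
        if raw.startswith("     "):
            # Sequence line: remove the spaces between residue blocks.
            residues.append("".join(raw.split()))
        elif raw.startswith("ID   "):
            # Assumes the entry name is the first field after the ID tag.
            entry_name = raw.split()[1]
        elif raw.startswith("//"):
            # End of entry: emit it and reset the state for the next one.
            if entry_name is not None:
                yield entry_name, "".join(residues)
            entry_name, residues = None, []


def to_fasta(lines):
    """Format every entry as a FASTA record with the entry name as header."""
    return "".join(f">{name}\n{seq}\n" for name, seq in iter_fasta_records(lines))
```

Since an open file object iterates over its lines with the whitespace intact, something like `to_fasta(open(path))` would work on a .dat file, where `path` is whatever file is being converted.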