Bug in uniprot-dat-to-fasta.py #1

Open
rjacak opened this issue Feb 9, 2017 · 0 comments


rjacak commented Feb 9, 2017

Thanks for providing the Python script to convert .dat files to FASTA. I did, however, find a bug when trying to convert an old UniProt database. uniprot-dat-to-fasta.py assumes that any line whose split results in only two characters must be a tag line. As a side effect, any sequence that is only two amino acids long (there are some) causes an error, and large numbers of other sequences are truncated.
A fix is to keep a copy of the original line as read from the file, without stripping off any whitespace, and check whether that line starts with 5 spaces. According to the .dat file format, this is another feature that distinguishes sequence lines from tag lines. There is probably a faster solution, but that's what worked for me.
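For illustration, here is a minimal sketch of the five-space check described above. It is not the code from uniprot-dat-to-fasta.py. The function name `iter_dat_records` is hypothetical, and using the entry name from the `ID` line as the record label is an assumption made only to keep the example short.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_dat_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (entry_name, sequence) pairs from UniProt .dat lines.

    Hypothetical helper: classifies each line from the raw, unstripped
    text instead of from the result of splitting it.
    """
    entry_name: Optional[str] = None
    residues: List[str] = []
    for raw in lines:
        # Remove only the line ending; leading whitespace must survive.
        line = raw.rstrip("\r\n")
        if line.startswith("//"):
            # "//" terminates an entry.
            if entry_name is not None:
                yield entry_name, "".join(residues)
            entry_name, residues = None, []
        elif line.startswith("     "):
            # Five leading spaces mark a sequence line, so a two-residue
            # sequence such as "     MK" is not mistaken for a tag.
            residues.append("".join(line.split()))
        elif line.startswith("ID "):
            # Assumption: the second field of the ID line names the entry.
            fields = line.split()
            if len(fields) > 1:
                entry_name = fields[1]
        # All other tag lines (AC, DE, SQ, ...) are ignored in this sketch.


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as minimal FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Because the check runs on the unstripped line, the length of the sequence on a line no longer affects how the line is classified.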
