Low annotation ratio for PLM result #9

Open
wen1112 opened this issue Jul 24, 2024 · 4 comments

Comments

@wen1112

wen1112 commented Jul 24, 2024

Hi, thank you for your earlier support. I ran the tool on my viral proteins, but the "_predictions.csv" output annotated only 3,021 of my 109,221 viral proteins, and about 60% of those 3,021 predictions are "unknown". When I used hmmsearch against the PHROG database, I got 5,000+ hits. Is there some problem here?
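For scale, the figures above work out to a low coverage rate. A quick check in Python, taking the approximate 60% "unknown" share at face value (the exact count of "unknown" rows is not given in the report):

```python
total_proteins = 109_221   # all viral proteins submitted, as reported
annotated = 3_021          # proteins with a row in *_predictions.csv, as reported
unknown_share = 0.60       # approximate share labelled "unknown", as reported

# Fraction of all proteins that received any PLM prediction
coverage = annotated / total_proteins

# Predictions that carry an informative (non-"unknown") label
informative = round(annotated * (1 - unknown_share))

print(f"coverage: {coverage:.2%}")
print(f"informative: {informative} ({informative / total_proteins:.2%})")
```

So roughly 2.8% of the proteins received any prediction, and only about 1.1% received an informative functional label.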

@zachflam
Contributor

Hi, that is strange behavior for the classifier if hmmsearch with PHROGs returned hits. Are the PHROG hits also to unknown-function families? Can you share the FASTA file?

@wen1112
Author

wen1112 commented Jul 31, 2024

I made a mistake before: when I used hmmsearch against PHROGs, it actually returned about 50,000 (5w) hits, and some of those are also 'unknown function'. Here is part of my .faa file, which includes the different cases. In my test .faa file (4,500 proteins), 2,431 proteins got hits with hmmsearch against PHROGs, 500 got predictions from the PLM, and the rest could not be annotated.
test.zip
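To see exactly where the two methods disagree (for example, the 2,431 hmmsearch hits versus 500 PLM predictions above), one option is to compare protein IDs across the FASTA input, the hmmsearch output, and the predictions CSV. Below is a minimal Python sketch under stated assumptions: hmmsearch was run with `--tblout`, and the predictions CSV has one row per predicted protein with an ID column. The column name `protein_id` is hypothetical, and the toy inputs are made up, not real results.

```python
import csv
import io


def fasta_ids(fasta_text):
    """Protein IDs from FASTA text: the first whitespace-delimited token after '>'.

    For Prodigal-style headers such as '>MGV-GENOME-0282599_34 # 35804 # ...',
    only 'MGV-GENOME-0282599_34' is kept.
    """
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


def hmmsearch_hit_ids(tblout_text):
    """Target (protein) names from an `hmmsearch --tblout` table.

    Comment lines start with '#'; the target name is the first column.
    No E-value filtering is applied here.
    """
    hits = set()
    for line in tblout_text.splitlines():
        if line.strip() and not line.startswith("#"):
            hits.add(line.split()[0])
    return hits


def plm_hit_ids(csv_text, id_column="protein_id"):
    """Protein IDs listed in the PLM predictions CSV.

    ASSUMPTION: 'protein_id' is a hypothetical column name; replace it with
    whatever the real *_predictions.csv header uses.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column] for row in reader}


def compare(all_ids, hmm_hits, plm_hits):
    """Partition the input proteins by which method annotated them."""
    proteins = set(all_ids)
    return {
        "both": proteins & hmm_hits & plm_hits,
        "hmm_only": (proteins & hmm_hits) - plm_hits,
        "plm_only": (proteins & plm_hits) - hmm_hits,
        "neither": proteins - hmm_hits - plm_hits,
        # IDs reported by a tool but absent from the FASTA may indicate that
        # the IDs were rewritten somewhere (truncated, characters replaced, ...).
        "unmatched_hits": (hmm_hits | plm_hits) - proteins,
    }


# Toy inputs (made up; tblout columns abbreviated, 'phrog_X' is a placeholder).
fasta = ">prot_a hypothetical protein\nMK\n>prot_b # 1 # 90 # -1 # ID=1_2\nMA\n>prot_c\nMS\n"
tblout = "# target name  accession  query name ...\nprot_b  -  phrog_X  -  1e-20\n"
predictions = "protein_id,label\nprot_c,unknown function\n"

result = compare(fasta_ids(fasta), hmmsearch_hit_ids(tblout), plm_hit_ids(predictions))
for category, ids in result.items():
    print(category, sorted(ids))
```

Using the first header token as the ID matches how HMMER names sequences; if another tool kept the full header or altered characters such as `@`, the mismatch would show up in `unmatched_hits` rather than silently inflating `neither`.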

@zachflam
Contributor

I cannot download a zip file, and I am not sure what the issue would be. Can you share a screenshot of some of the proteins that are not being predicted?

@wen1112
Author

wen1112 commented Aug 2, 2024

Hi~, I saved it at this link: https://github.com/wen1112/data/blob/main/test.faa
And below are some of the proteins the PLM could not predict:
>YP_007678033.1 hypothetical protein [Bacillus phage PM1]
MEMSQSIKNLAEAMSKFQAELEQPEKSADNPFFKSKYVPLPSVIAAIKKFGAPHGLSYMQ
MPVTNERGTGIQTIVMHSSGEYIKHDPFFLPMDKQTAQGAGSSITYSRRYSLSAAFGIDS
DPDDDGNEASGNNNNQNRNRNNNQQRNNNQQQNKNQQRNNSTKASEQLLTALSGFIDRMV
KEKKLTIDTVLDTLEKKKDPQSGKLIVGPFGRDIKNMTMQQASAAIGILKTILG
>YP_009217538.1 tail fiber assembly protein [Stenotrophomonas phage IME13]
MNAEQQTIASLKIRVFDLSEQLVATQQQAKEFSDALTKIVQIVGVTPADGEDSITLSSIV
EAVEALVPSQEVEVVEE
>MGV-GENOME-0282599_34 # 35804 # 37561 # -1 # ID=102389_34;partial=00;start_type=TTG;rbs_motif=GGA/GAG/AGG;rbs_spacer=11-12bp;gc_cont=0.324
MSDNYKDTKKDASNIKADSQIARANIENVAFSYTNDLNKWYSEKHKEKLSSRLETNVEQE
DTDGITTNVEKNEGVQTNVSETSDRNEDITSFKRTKFDNKEVIKTKIAETSNPVEDTQEY
NANAAVRNANYNNFINENTNNGSSHKSKIKTKANSKQQNTQKTINQEQDGNTDNQAKIQT
RINRTQKGSELVSKTIKTKKAANRVSKFVVRKGKKLQTASEGNIGEAFTFEIKDTSMRTA
GKTVEVSTRGIRQKVRTSTAKLVGRIVKSVFNVLAKLLKSLAALLAEASPVIVIGAILLI
LLSLFYAIAGGIAGSVSGIFGEYAESNTITSYVDYMNQIDSDLNTKVDWKAAFTVIHCLD
MDIKFDDAEQYILEEFNKADLYSDSCKPKDFTEWLNKNYSVVNTFYRKKGQTNSATSITN
EDLDLMKELYDSDKFMKLIDEKRKSSVNTGTISGSTGDGSSNGKLDYPTSYRTISAGYPN
YSSGKYHGGIDFPCPTGTKVCAAADGKVIAAKELNYSYGHYIIIDHGNGLTTLYAHNSKL
LVGVGDSVTKGQAIAYSGSTGNSTGPHCHFEVRVNGVRVNPENYL
>SRS016188_a1_ct10_vs1@Podoviridae__sp._cteAV1_13 # 10527 # 11621 # -1 # ID=123678_13;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.371
MAKLEKLFDVSTSSNIVDKGSTVPAWAGSRAKSNWWSGGASVDYFNSINGDKIYIQYGQN
QEKWASTRFMIESVHVEDEEEQSNGNIKVKGYVQLELLDGKLTDFAGAGVRVHRTISING
ETIDDWNGRTNEEYSKSNLKKVSFNETIEPQEKSKSTQMKIKTVYPDGEYSNSTIVLGIA
LKNPNPPKYIPMAIRISKDWQALDQNDYKPQKNKKDDKPSNPSHSGSSDKPSKEVNFYYP
FKDWPISRGWQANGHAGIDYAVNAGTPVKSTVDGTVIKSWFSNLGGGNEVQIWDGSQYTH
IFMHMNDRQVSTGQTVKKGQLIGHVGSTGNSTGPHLHWQVNKGKGYLYNHPDSIDPMILV
NKYT


2 participants