How to integrate Infernal output with tRNAscan SE 2.0 #44

Open
Abieskawa opened this issue Jan 15, 2025 · 2 comments

@Abieskawa

Abieskawa commented Jan 15, 2025

Hi,
I am annotating a eukaryotic genome. I have done structural and functional annotation of the protein-coding genes with Braker3, GO, and KO (the functional annotation is written to separate txt files, so the GFF file was not touched). I saw that your team recommends Infernal for generic ncRNA prediction and, for specific types such as tRNA, dedicated tools such as tRNAscan-SE. My questions are:

  1. How can I combine these two outputs (the Infernal and tRNAscan-SE results)? I saw that tRNAscan-SE can write GFF output, which I think can be easily integrated into the Braker3 GFF result.
  2. Can I ask Infernal not to predict tRNAs and use tRNAscan-SE instead?
  3. Are there recommended tools that can integrate Infernal/tRNAscan-SE results with a protein-coding gene GTF/GFF (in my case, the Braker3 result)?

Sorry if this is a stupid question.
Thanks~

@nawrockie
Member

I agree that tRNAscan-SE is a better tool for tRNA annotation than Infernal.
This page has instructions on how to annotate genomes with Infernal and Rfam:
https://docs.rfam.org/en/latest/genome-annotation.html

If you want to ignore Infernal/Rfam predictions for tRNAs, the easiest way is probably to just remove the lines with 'RF00005' or 'RF01852' (these are the Rfam accessions for the tRNA and tRNA-Sec models) from the .tblout files output from cmscan with a command like:
cat my.tblout | grep -v RF00005 | grep -v RF01852 > new.tblout
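The grep pipeline above drops any line that contains one of the two accessions anywhere in its text. A slightly stricter variant, sketched here in Python, compares whole fields instead. It is only an illustration: the function name is made up, and it assumes the target accession sits in one of the first three whitespace-separated columns of a cmscan tblout file, which is worth checking against the header line of your own file.

```python
# Rfam accessions for the tRNA and tRNA-Sec models, as given in the reply above.
TRNA_ACCESSIONS = {"RF00005", "RF01852"}


def drop_families(lines, accessions=TRNA_ACCESSIONS):
    """Yield tblout lines, skipping hits whose accession is in `accessions`.

    Assumption: the target accession is one of the first three
    whitespace-separated fields of each hit line.
    """
    for line in lines:
        if line.startswith("#"):
            # Keep header and footer comment lines unchanged.
            yield line
            continue
        fields = line.split()
        if any(field in accessions for field in fields[:3]):
            continue  # a tRNA or tRNA-Sec hit: leave it to tRNAscan-SE
        yield line
```

It could be used as `out.writelines(drop_families(inp))` with `inp` and `out` being the opened my.tblout and new.tblout files.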

There's also a video tutorial that may be useful:
https://www.youtube.com/watch?v=NU63fazDZHs

You can convert infernal .tblout files to gff with this script:
https://github.com/nawrockie/jiffy-infernal-hmmer-scripts/blob/master/infernal-tblout2gff.pl

@Abieskawa
Author

Abieskawa commented Jan 18, 2025

Thanks for your kind reply. I have two more questions.
I converted the tblout file this morning and found that the GFF output might require modification.
I converted the file with the following command line:
infernal-tblout2gff.pl --cmscan --fmt2 --source infernal beltfish_ncRNA_wo_tRNA.tblout > beltfish_ncRNA_wo_tRNA.gff
Below is the output:
[Image: screenshot of the converted GFF output]
I suppose the output should have ID=blablabla in the last column, and also a GFF3 version label, like this tRNAscan-SE output:

[Image: screenshot of the tRNAscan-SE GFF output]

I also wonder whether the Parent tag is required for the other RNA entries in the Infernal GFF output. I am not familiar with the RNA field.
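For context, the GFF3 specification requires an ID only on features that have children (or that span several lines), and Parent only on the child features, so a single-line ncRNA feature is valid without either. If a downstream tool still expects them, the header and IDs can be added afterwards. The following is a minimal sketch, not part of infernal-tblout2gff.pl: the function name and the "infernal_N" ID scheme are invented for illustration, and it assumes tab-separated nine-column input.

```python
def add_ids(gff_lines, prefix="infernal"):
    """Return GFF3 lines with one version header and an ID on every feature."""
    out = ["##gff-version 3"]
    count = 0
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##gff-version"):
            continue  # skip blanks and any old header; one is emitted above
        if line.startswith("#"):
            out.append(line)  # keep other comment lines as they are
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        count += 1
        # Split the attribute column, treating "." as "no attributes".
        attrs = [a for a in cols[8].split(";") if a and a != "."]
        if not any(a.startswith("ID=") for a in attrs):
            attrs.insert(0, f"ID={prefix}_{count}")  # hypothetical ID scheme
        cols[8] = ";".join(attrs)
        out.append("\t".join(cols))
    return out
```

Numbering the IDs by feature order keeps them unique within one file; if several converted files are merged, a different prefix per file avoids collisions.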

Additionally, the Rfam family names do not match the GFF3 feature types. For example, U1, shown in the first picture, is actually a kind of snRNA. When I combine the Braker3, tRNAscan-SE, and converted Infernal results, AGAT warns me about this and removes the entries.
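One possible workaround is to rewrite the type column to a Sequence Ontology term and move the family name into the Name attribute. The sketch below assumes, as the comment above suggests, that the third column of the converted file holds the Rfam family name. The lookup table is a deliberately tiny, hypothetical example that would need extending for the families in a real data set, and whether AGAT accepts a given type depends on its configuration.

```python
# Illustrative lookup only: Rfam family name -> Sequence Ontology feature type.
FAMILY_TO_SO_TYPE = {
    "U1": "snRNA",       # the example from the comment above
    "U2": "snRNA",
    "5S_rRNA": "rRNA",
}


def retype(line, mapping=FAMILY_TO_SO_TYPE, default="ncRNA"):
    """Replace the family name in column 3 with an SO type, keeping it as Name."""
    cols = line.rstrip("\n").split("\t")
    family = cols[2]
    # Unknown families fall back to the generic SO term "ncRNA".
    cols[2] = mapping.get(family, default)
    attrs = [a for a in cols[8].split(";") if a and a != "."]
    if not any(a.startswith("Name=") for a in attrs):
        attrs.append(f"Name={family}")
    cols[8] = ";".join(attrs)
    return "\t".join(cols)
```

Falling back to ncRNA keeps every hit in the file, at the cost of a less specific type for families missing from the table.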

When I look at the GFF3 file of the human reference, the entries look like this:

NC_000001.11	cmsearch	gene	6752723	6752821	.	+	.	ID=gene-LOC124904847;Dbxref=GeneID:124904847;Name=LOC124904847;description=small nucleolar RNA U13;gbkey=Gene;gene=LOC124904847;gene_biotype=snoRNA
NC_000001.11	cmsearch	snoRNA	6752723	6752821	.	+	.	ID=rna-XR_007067439.1;Parent=gene-LOC124904847;Dbxref=GeneID:124904847,RFAM:RF01210,GenBank:XR_007067439.1;Name=XR_007067439.1;gbkey=ncRNA;gene=LOC124904847;inference=COORDINATES: profile:INFERNAL:1.1.1;product=small nucleolar RNA U13;transcript_id=XR_007067439.1
NC_000001.11	cmsearch	exon	6752723	6752821	.	+	.	ID=exon-XR_007067439.1-1;Parent=rna-XR_007067439.1;Dbxref=GeneID:124904847,RFAM:RF01210,GenBank:XR_007067439.1;gbkey=ncRNA;gene=LOC124904847;inference=COORDINATES: profile:INFERNAL:1.1.1;product=small nucleolar RNA U13;transcript_id=XR_007067439.1

It has an added hierarchy with gene and exon features, which does not seem to be part of the original output.
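A gene / RNA / exon hierarchy like the one in the NCBI lines above can be generated from a single-line feature after the fact. This is a rough sketch under several assumptions: the input line already carries an SO type such as snRNA or snoRNA in column 3, the ID scheme ("gene-ncRNA1" and so on) is invented here, and only the ID, Parent, and gene_biotype attributes are produced rather than the full NCBI attribute set.

```python
def with_hierarchy(line, n):
    """Expand one single-line ncRNA feature into gene, RNA, and exon lines."""
    cols = line.rstrip("\n").split("\t")
    seqid, source, rna_type, start, end, score, strand, _phase, attrs = cols
    gene_id = f"gene-ncRNA{n}"   # hypothetical ID scheme
    rna_id = f"rna-ncRNA{n}"
    exon_id = f"exon-ncRNA{n}-1"

    # Carry the original attributes over to the RNA line, minus any ID/Parent.
    kept = [a for a in attrs.split(";")
            if a and a != "." and not a.startswith(("ID=", "Parent="))]

    def row(ftype, attributes):
        # All three features share the coordinates of the original hit.
        return "\t".join(
            [seqid, source, ftype, start, end, score, strand, ".", attributes]
        )

    return [
        row("gene", f"ID={gene_id};gene_biotype={rna_type}"),
        row(rna_type, ";".join([f"ID={rna_id}", f"Parent={gene_id}"] + kept)),
        row("exon", f"ID={exon_id};Parent={rna_id}"),
    ]
```

Calling it once per feature with a running counter `n` keeps the generated IDs unique within the file.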
