NCLcomparator is a comprehensive tool for analyzing non-co-linear (NCL) transcripts (fusion, trans-splicing, and circular RNA). It combines NCL results from different detection tools and reports the characteristics of the NCL events.
The NCLcomparator program, documentation, and test set can be downloaded from our FTP site (ftp://treeslab1.genomics.sinica.edu.tw/NCLcomparator) or from GitHub (https://github.com/TreesLab/NCLcomparator).
NCLcomparator runs in a Linux-like environment (e.g. Bio-Linux; see http://environmentalomics.org/bio-linux/) and requires at least 30 GB of RAM.
$ tar zxvf NCLcomparator.tar.gz
$ cd NCLcomparator
$ chmod +x NCLcomparator.sh
$ chmod +x bin/*
(1) bedtools (http://bedtools.readthedocs.io/en/latest/)
(2) STAR (https://github.com/alexdobin/STAR)
(3) RSEM (https://github.com/deweylab/RSEM)
(4) R (https://www.r-project.org/)
Get latest bedtools source from releases and install it
$ wget https://github.com/arq5x/bedtools2/releases/download/v2.25.0/bedtools-2.25.0.tar.gz
$ tar -zxvf bedtools-2.25.0.tar.gz
$ cd bedtools2
$ make
$ cp ./bin/* /usr/local/bin
Get latest STAR source from releases and install it
$ wget https://github.com/alexdobin/STAR/archive/2.5.3a.tar.gz
$ tar -xzf 2.5.3a.tar.gz
$ cd STAR-2.5.3a
$ cp bin/Linux_x86_64_static/STAR /usr/local/bin
Get latest RSEM source code and install it
$ wget https://github.com/deweylab/RSEM/archive/v1.3.0.tar.gz
$ tar -xzf v1.3.0.tar.gz
$ cd RSEM-1.3.0
$ make
$ make ebseq
Get R in Ubuntu environment
$ sudo apt-get update
$ sudo apt-get install r-base
(1) Genome and its annotation, which can be downloaded from the GENCODE website (http://www.gencodegenes.org/) or the Ensembl FTP site (http://www.ensembl.org/info/data/ftp/index.html). Taking human as an example, download the genome and annotation from the Ensembl FTP site:
$ wget ftp://ftp.ensembl.org/pub/release-85/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
$ wget ftp://ftp.ensembl.org/pub/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh38.87.gtf.gz
$ gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
$ gunzip Homo_sapiens.GRCh38.87.gtf.gz
(2) (Optional) Synonymous constraint elements (SCE), which can be downloaded from http://compbio.mit.edu/SCE/
(3) Building STAR and RSEM index
The following steps generate the index files required before running NCLcomparator.
$ mkdir STAR_RSEM_index
$ cd STAR_RSEM_index
$ rsem-prepare-reference --gtf /path/to/Homo_sapiens.GRCh38.87.gtf --star -p 10 /path/to/Homo_sapiens.GRCh38.dna.primary_assembly.fa RSEM
A prebuilt STAR and RSEM index for the hg38 genome with the Ensembl release 87 annotation can be downloaded from our FTP site (ftp://treeslab1.genomics.sinica.edu.tw/NCLcomparator/STAR_RSEM_index.tar.gz).
Usage:
$ ./NCLcomparator.sh -gtf [annotation GTF file] -thread [number of thread] -read1 [fastq_1.gz] -read2 [fastq_2.gz] -index [STAR_RSEM index folder] -intra [circular result folder] -inter [fusion result folder] -sce [SCE bed file]
An example:
$ ./NCLcomparator.sh -gtf Homo_sapiens.GRCh38.87.gtf -intra /path/to/intra -inter /path/to/inter -sce SCE_hg38.bed -read1 HeLa_1.fastq.gz -read2 HeLa_2.fastq.gz -index /path/to/STAR_RSEM_index
The basic options for running a job are as follows:
- -intra /path/to/NCL-intra result folder
- -inter (optional) /path/to/NCL-inter result folder
- -sce (optional) /path/to/SCE.bed
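When NCLcomparator is launched from a script, the command line can be assembled from the options shown above. The following is a minimal Python sketch, not part of NCLcomparator: the function name `build_command` is hypothetical, and treating `-thread` as optional is an assumption based only on the example command, which omits it.

```python
import shlex
from typing import List, Optional


def build_command(
    gtf: str,
    read1: str,
    read2: str,
    index: str,
    intra: str,
    inter: Optional[str] = None,
    sce: Optional[str] = None,
    thread: Optional[int] = None,
) -> List[str]:
    """Assemble an NCLcomparator.sh argument list (hypothetical helper).

    Only flags that appear in the usage line are emitted; optional flags
    are left out when their value is None.
    """
    cmd = [
        "./NCLcomparator.sh",
        "-gtf", gtf,
        "-read1", read1,
        "-read2", read2,
        "-index", index,
        "-intra", intra,
    ]
    if thread is not None:  # assumption: -thread may be omitted
        cmd += ["-thread", str(thread)]
    if inter is not None:
        cmd += ["-inter", inter]
    if sce is not None:
        cmd += ["-sce", sce]
    return cmd


cmd = build_command(
    gtf="Homo_sapiens.GRCh38.87.gtf",
    read1="HeLa_1.fastq.gz",
    read2="HeLa_2.fastq.gz",
    index="/path/to/STAR_RSEM_index",
    intra="/path/to/intra",
    sce="SCE_hg38.bed",
)
# Print a shell-quoted command line; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string avoids quoting problems when paths contain spaces.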
The output files from running NCLcomparator on HeLa rRNA-depleted RNA-seq data are provided on our FTP site (ftp://treeslab1.genomics.sinica.edu.tw/NCLcomparator/HeLa_output.tar.gz).
The results of each NCL detection tool must be converted to a 5-col format, which contains the positions of the donor and acceptor sides and the number of supporting NCL-junction reads. Gather the intra-NCL 5-col files into one folder and the inter-NCL 5-col files into another. Example NCL detector results for the HeLa rRNA-depleted RNA-seq data are provided on our FTP site (ftp://treeslab1.genomics.sinica.edu.tw/NCLcomparator/HeLa_runNCLdetectors_results.tar.gz).
For example, a 5-col result file named CIRCexplorer2.5col (tab-delimited text) looks like this:
Chromosome | Coordinate | Chromosome | Coordinate | Total number of supporting NCL-junction reads |
---|---|---|---|---|
chr1 | 945176 | chr1 | 945517 | 1 |
chr1 | 955922 | chr1 | 956013 | 1 |
chr1 | 955922 | chr1 | 957273 | 5 |
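Before passing detector output to NCLcomparator, it can help to check that a file really follows the 5-col layout shown above. The following is a minimal Python sketch, not part of NCLcomparator: the function name `parse_five_col` is hypothetical, and it assumes tab-delimited text with no header line.

```python
from typing import List, Tuple

# (chromosome, coordinate, chromosome, coordinate, supporting reads)
Record = Tuple[str, int, str, int, int]


def parse_five_col(text: str) -> List[Record]:
    """Parse 5-col text into tuples, raising ValueError on malformed lines."""
    records: List[Record] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(
                f"line {line_no}: expected 5 columns, got {len(fields)}"
            )
        chrom_1, coord_1, chrom_2, coord_2, reads = fields
        try:
            records.append(
                (chrom_1, int(coord_1), chrom_2, int(coord_2), int(reads))
            )
        except ValueError as exc:
            raise ValueError(
                f"line {line_no}: coordinates and read count must be integers"
            ) from exc
    return records


# Two rows taken from the CIRCexplorer2.5col example above.
example = "chr1\t945176\tchr1\t945517\t1\nchr1\t955922\tchr1\t957273\t5\n"
for record in parse_five_col(example):
    print(record)
```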
After NCLcomparator runs, it generates two merged result files, one for the circular RNA tools and one for the fusion RNA tools (intraMerged.result and interMerged.result), each accompanied by a graphic report (intra.pdf and inter.pdf), and creates two new folders (comparison and STAR_RSEM_out). NCL junction positions reported by each detection tool that fall within 5 bp of flanking exon boundaries are adjusted to the exact exon boundaries. Two further files (intraMerged_characteristic.result and interMerged_characteristic.result) are also produced.
The STAR_RSEM_out folder holds the output files generated by STAR and RSEM; these files serve as input to the characterization step of NCLcomparator. The comparison folder contains two subfolders (intra and inter), in which the NCL events from each tool that lie within 5 bp of flanking exon boundaries are adjusted to the exact exon boundary positions.
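The boundary adjustment described above can be illustrated with a small Python sketch. It is not NCLcomparator's implementation: the function name `snap_to_boundary`, the tie-breaking rule, and the use of a single sorted list of boundaries (a real run would need one list per chromosome, taken from the GTF annotation) are all assumptions made for illustration.

```python
from bisect import bisect_left
from typing import Sequence


def snap_to_boundary(pos: int, boundaries: Sequence[int], max_dist: int = 5) -> int:
    """Return the nearest exon boundary if it lies within max_dist bp of pos.

    `boundaries` must be sorted in ascending order. If no boundary is close
    enough, pos is returned unchanged. On a tie, the smaller coordinate wins
    (an arbitrary choice for this sketch).
    """
    i = bisect_left(boundaries, pos)
    candidates = []
    if i < len(boundaries):
        candidates.append(boundaries[i])      # first boundary >= pos
    if i > 0:
        candidates.append(boundaries[i - 1])  # last boundary < pos
    if not candidates:
        return pos
    nearest = min(candidates, key=lambda b: (abs(b - pos), b))
    return nearest if abs(nearest - pos) <= max_dist else pos


# Hypothetical exon boundaries on one chromosome.
exon_boundaries = [100, 200, 300]
print(snap_to_boundary(103, exon_boundaries))  # 3 bp from 100, so it is adjusted
print(snap_to_boundary(150, exon_boundaries))  # too far from any boundary, unchanged
```

Adjusting positions in this way lets junctions that different tools report a few bases apart be compared as the same event.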
The column formats of the outputs are described as follows. The first table describes intraMerged.result and the second describes interMerged.result.
No. of column | Description |
---|---|
(1) | Chromosome name of the donor side (5' ss) |
(2) | Junction coordinate of the donor side |
(3) | Strand of the donor side |
(4) | Chromosome name of the acceptor side (3' ss) |
(5) | Junction coordinate of the acceptor side |
(6) | Strand of the acceptor side |
(7) | Gene name |
(8) | Circular tool No.1 (Yes: number of supporting intragenic NCL-junction reads; No: 0) |
(9) | Circular tool No.2 (Yes: number of supporting intragenic NCL-junction reads; No: 0) |
(10) | ... |
No. of column | Description |
---|---|
(1) | Chromosome name of the donor side (5' ss) |
(2) | Junction coordinate of the donor side |
(3) | Strand of the donor side |
(4) | Chromosome name of the acceptor side (3' ss) |
(5) | Junction coordinate of the acceptor side |
(6) | Strand of the acceptor side |
(7) | Gene name of the donor side |
(8) | Gene name of the acceptor side |
(9) | Fusion tool No.1 (Yes: number of supporting intergenic NCL-junction reads; No: 0) |
(10) | Fusion tool No.2 (Yes: number of supporting intergenic NCL-junction reads; No: 0) |
(11) | ... |
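The per-tool columns of the merged tables (a read count where a tool reports the junction, 0 where it does not) can be sketched in Python as follows. This is an illustration of the table layout only, not NCLcomparator's code: the function name `merge_tool_results`, the tool name `ToolB`, and its read count are hypothetical, and the strand and gene-name columns, which require the annotation, are left out.

```python
from typing import Dict, List, Tuple

Junction = Tuple[str, int, str, int]     # chrom, coord, chrom, coord
Record = Tuple[str, int, str, int, int]  # junction fields + supporting reads


def merge_tool_results(
    results: Dict[str, List[Record]]
) -> Tuple[List[str], Dict[Junction, List[int]]]:
    """Merge per-tool 5-col records into one row of counts per junction.

    Returns the tool names in column order and a mapping from junction to a
    list of read counts, with 0 for tools that did not report the junction.
    """
    tools = sorted(results)
    merged: Dict[Junction, List[int]] = {}
    for col, tool in enumerate(tools):
        for chrom_1, coord_1, chrom_2, coord_2, reads in results[tool]:
            key = (chrom_1, coord_1, chrom_2, coord_2)
            row = merged.setdefault(key, [0] * len(tools))
            row[col] += reads
    return tools, merged


# CIRCexplorer2 rows come from the 5-col example; the ToolB row is made up.
tools, merged = merge_tool_results({
    "CIRCexplorer2": [
        ("chr1", 945176, "chr1", 945517, 1),
        ("chr1", 955922, "chr1", 957273, 5),
    ],
    "ToolB": [("chr1", 955922, "chr1", 957273, 3)],
})
print("\t".join(["chrom", "coord", "chrom", "coord"] + tools))
for junction, counts in sorted(merged.items()):
    print("\t".join(str(value) for value in list(junction) + counts))
```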
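The other two outputs, intraMerged_characteristic.result and interMerged_characteristic.result, provide useful features of the NCL events. The first table below describes intraMerged_characteristic.result and the second describes interMerged_characteristic.result.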
No. of column | Description |
---|---|
(1) | Chromosome name of the donor side (5' ss) |
(2) | Junction coordinate of the donor site |
(3) | Strand of the donor site |
(4) | Chromosome name of the acceptor site (3' ss) |
(5) | Junction coordinate of the acceptor site |
(6) | Strand of the acceptor site |
(7) | Gene name |
(8) | Total number of exons in the gene |
(9) | TPM of the gene |
(10) | FPKM of the gene |
(11) | Number of reads spanning the co-linearly spliced junctions at NCL donor site |
(12) | Number of reads spanning the co-linearly spliced junctions at NCL acceptor site |
(13) | Usage of the co-linear junctions at NCL donor splice site (P_D) |
(14) | Usage of the co-linear junctions at NCL acceptor splice site (P_A) |
(15) | Median frequency of occurrence of all well-annotated splice sites (co-linear) in the host gene (P_median) |
(16) | Out of circle (Yes: 1; No: 0) |
(17) | Circular tool No.1: number of supporting NCL-junction reads |
(18) | Circular tool No.1: RPM based on total RNA-seq reads |
(19) | Circular tool No.1: RPM based on uniquely mapping reads |
(20) | Circular tool No.1: circular fraction (CF) |
(21) | Circular tool No.1: non-co-linear ratio (R_NCL) |
(22) | Circular tool No.2: number of supporting NCL-junction reads |
(23) | Circular tool No.2: RPM based on total RNA-seq reads |
(24) | Circular tool No.2: RPM based on uniquely mapping reads |
(25) | Circular tool No.2: circular fraction (CF) |
(26) | Circular tool No.2: non-co-linear ratio (R_NCL) |
... | ... |
(Optional) | The donor side within SCE (1) or outside SCE (0) |
(Optional) | The acceptor side within SCE (1) or outside SCE (0) |
No. of column | Description |
---|---|
(1) | Chromosome name of the donor side (5' ss) |
(2) | Junction coordinate of the donor side |
(3) | Strand of the donor side |
(4) | Chromosome name of the acceptor side (3' ss) |
(5) | Junction coordinate of the acceptor side |
(6) | Strand of the acceptor side |
(7) | Gene name of the donor side |
(8) | Gene name of the acceptor side |
(9) | TPM of the donor gene |
(10) | FPKM of the donor gene |
(11) | TPM of the acceptor gene |
(12) | FPKM of the acceptor gene |
(13) | Number of reads spanning the co-linearly spliced junctions at NCL donor site |
(14) | Number of reads spanning the co-linearly spliced junctions at NCL acceptor site |
(15) | Usage of co-linear junctions at NCL donor splice site in the donor gene (P_D) |
(16) | Usage of co-linear junctions at NCL acceptor splice site in the acceptor gene (P_A) |
(17) | Median frequency of occurrence of well-annotated splice sites (co-linear) in the donor gene (P_median_D) |
(18) | Median frequency of occurrence of well-annotated splice sites (co-linear) in the acceptor gene (P_median_A) |
(19) | Fusion tool No.1 : number of supporting intergenic NCL-junction reads |
(20) | Fusion tool No.1 : RPM based on raw reads |
(21) | Fusion tool No.1 : RPM based on uniquely mapped reads |
(22) | Fusion tool No.2 : number of supporting intergenic NCL-junction reads |
(23) | Fusion tool No.2 : RPM based on raw reads |
(24) | Fusion tool No.2 : RPM based on uniquely mapped reads |
... | ... |
(Optional) | The donor side within SCE (1) or outside SCE (0) |
(Optional) | The acceptor side within SCE (1) or outside SCE (0) |
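The RPM columns in the characteristic tables normalize junction read counts by library size. The exact formula NCLcomparator uses is not stated here, so the following Python sketch assumes the usual reads-per-million definition; the function name `reads_per_million` and the library sizes are made up for illustration.

```python
def reads_per_million(junction_reads: int, library_reads: int) -> float:
    """Reads per million: junction reads scaled to a library of 1e6 reads.

    `library_reads` is either the total number of RNA-seq reads or the number
    of uniquely mapped reads, depending on which RPM column is wanted.
    """
    if library_reads <= 0:
        raise ValueError("library_reads must be positive")
    return junction_reads * 1_000_000 / library_reads


# Hypothetical library sizes for a junction supported by 5 reads.
total_reads = 20_000_000
uniquely_mapped_reads = 10_000_000
print(reads_per_million(5, total_reads))
print(reads_per_million(5, uniquely_mapped_reads))
```

Because the uniquely mapped read count is never larger than the total read count, the second RPM value is always at least as large as the first.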
In the comparison folder, the intra and inter folders contain the results of each NCL detection tool with positions adjusted to the exon boundaries. The column format of these files is as follows:
No. of column | Description |
---|---|
(1) | Chromosome name of the donor side (5' ss) |
(2) | Exonic junction coordinate of the donor side |
(3) | Strand of the donor side |
(4) | Chromosome name of the acceptor side (3' ss) |
(5) | Exonic junction coordinate of the acceptor side |
(6) | Strand of the acceptor side |
(7) | Total number of supporting NCL-junction reads |
(8) | Gene name of the donor side |
(9) | Gene name of the acceptor side |