ONT Basecall and QC:
- Guppy v6 SUP was used to basecall the data with default parameters.
- NanoPlot v1.40.0 and MultiQC v1.13 were run using default parameters.
Long-Read Splicing Analysis:
- FLAIR v2.0 modules were used to process long-read sequencing data in two modes: LR-only and SR-supported.
- Run FLAIR align, correct, and collapse:
- Run FLAIR quantify:
Short-Read Splicing Junctions (SJs):
- intronProspector was used to generate SJs from short-read data using the following bash script:
Data QC and Filtering:
- SQANTI3 v5.1.2 was used to QC the long-read data models obtained by FLAIR.
- Known genes (annotated in GENCODE v43) were selected using SQANTI3 structural categories with the following script:
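As a minimal sketch of this kind of category-based filtering (not the original script), the Python below reads a SQANTI3-style classification table and keeps transcripts associated with annotated genes. The set of categories treated as "known" and the exclusion of `novelGene` entries are assumptions made for illustration; the example rows and gene names are made up.

```python
import csv
import io

# Assumption: "known" means a structural category tied to an annotated gene.
KNOWN_GENE_CATEGORIES = {
    "full-splice_match",
    "incomplete-splice_match",
    "novel_in_catalog",
    "novel_not_in_catalog",
}


def filter_known(rows):
    """Keep rows in an allowed category whose gene is not a SQANTI3 'novelGene'."""
    kept = []
    for row in rows:
        if row["structural_category"] not in KNOWN_GENE_CATEGORIES:
            continue
        if row["associated_gene"].startswith("novelGene"):
            continue
        kept.append(row)
    return kept


# In-memory stand-in for a tab-separated classification file (hypothetical rows).
example = (
    "isoform\tstructural_category\tassociated_gene\n"
    "iso1\tfull-splice_match\tGENE_A\n"
    "iso2\tintergenic\tnovelGene_1\n"
    "iso3\tnovel_in_catalog\tGENE_B\n"
    "iso4\tantisense\tnovelGene_GENE_C_AS\n"
)
rows = list(csv.DictReader(io.StringIO(example), delimiter="\t"))
known = filter_known(rows)
print([r["isoform"] for r in known])
```

To apply it to a real file, replace the `io.StringIO` stand-in with an opened classification file; the retained isoform IDs can then be used to subset the transcript models.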
Short-read sequencing data was processed using the GRAPE-NF (commit 78fd890) pipeline.
To process short-read sequencing data with GRAPE-NF, use the following command:
nextflow -bg run grape-nf -r 78fd890 --rsemSkipCi --index /nfs/users/rg/scarbonell/TFM/short_reads/metadata.tsv --genome /nfs/users/rg/projects/references/Genome/H.sapiens/GRCh38/GRCh38.p13.primary_assembly.genome.fa.gz --annotation /nfs/users/rg/projects/references/Annotation/H.sapiens/gencode43/gencode.v43.primary_assembly.annotation.gtf.gz --rg-platform ILLUMINA --rg-center-name CRG -resume -c /software/rg/grape/config/rg.singularity.dsl2.config > pipeline.log
FLAIR Modules for Splicing Events and Isoform Usage:
FLAIR modules were employed to analyze splicing events and isoform usage.
Fisher Test for Splicing Events:
To conduct Fisher tests on splicing events:
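For orientation, a Fisher's exact test on a single splicing event can be sketched as follows. This is an illustrative standard-library implementation, not the script used in the analysis; it assumes each event is summarised as a 2x2 table of inclusion and exclusion read counts in two conditions, and the counts shown are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts for one exon-skipping event: (inclusion, exclusion) reads.
condition_a = (30, 10)
condition_b = (12, 28)
p_value = fisher_exact_two_sided(*condition_a, *condition_b)
print(f"p = {p_value:.3g}")
```

When many events are tested, the resulting p-values would normally be corrected for multiple testing before interpretation.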
In addition to the tools described above, R scripts were used for further analysis and to generate data visualizations.
Please refer to the individual R script files for details on the analyses conducted and the visualizations produced.
Long-Read Data Visualization:
For long-read sequencing data, the FLAIR align module was run independently for each sample to obtain separate BAM files for data visualization.
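The per-sample alignment step can be sketched as a small helper that builds one `flair align` command per FASTQ file. The file names, genome path, and output directory below are hypothetical, and only the `-g`, `-r`, and `-o` options of `flair align` are assumed; the commands are printed rather than executed.

```python
import shlex
from pathlib import Path


def sample_name(fastq_path):
    """Derive a sample name by stripping common FASTQ suffixes from the file name."""
    name = Path(fastq_path).name
    for suffix in (".gz", ".fastq", ".fq"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def flair_align_commands(fastqs, genome_fasta, out_dir="bam_per_sample"):
    """Build one `flair align` command per sample so each gets its own BAM."""
    commands = []
    for fq in fastqs:
        prefix = f"{out_dir}/{sample_name(fq)}"
        commands.append(["flair", "align", "-g", genome_fasta, "-r", fq, "-o", prefix])
    return commands


# Hypothetical inputs for illustration only.
fastqs = ["reads/sampleA.fastq.gz", "reads/sampleB.fq"]
align_cmds = flair_align_commands(fastqs, "genome.fa")
for cmd in align_cmds:
    print(shlex.join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)` on a machine where FLAIR is installed.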
Short-Read Data Visualization:
- Separate BAM files from short-read sequencing data were obtained directly from the GRAPE-NF output.
BAM File Merge:
- Samtools was used to merge sample replicate BAM files from different compartments prior to data visualization.
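A minimal sketch of the merge step, assuming replicates are merged per compartment and that file names follow a hypothetical `<compartment>_rep<N>.bam` pattern. The compartment names are placeholders. The helper only builds the `samtools merge` and `samtools index` commands and prints them, so it runs without samtools installed.

```python
import shlex
from collections import defaultdict


def group_by_compartment(bam_paths):
    """Group BAM paths by the part of the file name before '_rep' (assumed naming)."""
    groups = defaultdict(list)
    for path in bam_paths:
        file_name = path.rsplit("/", 1)[-1]
        compartment = file_name.split("_rep", 1)[0]
        groups[compartment].append(path)
    return groups


def merge_commands(bam_paths):
    """Build a merge command and an index command for each compartment."""
    commands = []
    for compartment, paths in sorted(group_by_compartment(bam_paths).items()):
        merged = f"{compartment}.merged.bam"
        commands.append(["samtools", "merge", merged, *sorted(paths)])
        commands.append(["samtools", "index", merged])
    return commands


# Hypothetical replicate BAMs from two compartments.
bams = ["nucleus_rep1.bam", "nucleus_rep2.bam", "cytosol_rep1.bam", "cytosol_rep2.bam"]
commands = merge_commands(bams)
for cmd in commands:
    print(shlex.join(cmd))
```

Indexing the merged BAM is included because genome browsers such as IGV need an index to load the file.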
Splicing Events and Isoform Usage Visualization:
The UCSC Genome Browser and IGV were used to inspect splicing events and isoform usage. The exact plotting coordinates were extracted and used to run the sushi function with ggsashimi.
- Purpose: Build scaled and centered PCAs for LRonly and SRsupported data.
- Purpose: Perform differential isoform usage analysis.
- Purpose: Analyze splicing events (IR, ES, Alt3, and Alt5).
- Purpose: Prepare upset plots of isoform counts and GENCODE for LRonly and SRsupported data.
- Purpose: Generate regression plots for LRonly and SRsupported data.
- Purpose: Bash script to run the FLAIR align, correct, and collapse modules on independent long-read FASTQ files, processing all samples simultaneously to build a single set of transcript models.
- Purpose: Bash script to run the FLAIR quantify module, which produces an isoform quantification matrix from the transcript models file.
- Purpose: Generates a list of splice junctions (SJs) from BAM/SAM input.
- Purpose: Bash script to run SQANTI3 (example for short-read supported data).
- Purpose: Filter known genes using SQANTI categories.
- Purpose: Bash script to run FLAIR diffSplice - module 6 (example for short-read supported data).
- Purpose: Bash script to run the FLAIR differential isoform usage tool (example for known-gene-filtered data).
- Purpose: Compute Fisher test on splicing events.
- Purpose: Generates one BAM file per sample from long-read FASTQ files using the FLAIR align module (module 1).
- Purpose: Uses sushi function to generate ggsashimi plots.