The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Pipeline summary to README
- Update to nf-core/tools v2.9
Initial release of nf-core/viralintegration, created with the nf-core template. (@alyssa-ab) (@Emiller88)
This pipeline is a re-implementation of CTAT-VirusIntegrationFinder v1.5.0.
- Input Check
  - Input path to sample FASTAs in samplesheet.csv
  - Check that sample meets requirements (samplesheet_check)
- Read QC (FastQC)
- Align reads to human genome
  - Generate index and perform alignment (STAR)
- Quality trimming for unaligned reads
  - Quality and adaptor trimming (Trimmomatic)
  - Remove polyAs from reads (PolyAStripper)
- Identify chimeric reads
  - Combine human and virus FASTAs (cat_fasta)
  - Generate index and perform alignment to combined human + viral reference (STAR)
  - Sort and index alignments (SAMtools)
  - Determine potential insertion site candidates and optimize file (insertion_site_candidates, abridged_TSV)
- Virus Report outputs:
  - Viral read counts in a TSV table and PNG plot
  - Preliminary genome-wide abundance plot
  - BAM and BAI files for reads detected at potential viral insertion sites
  - Web-based interactive genome viewer for virus infection evidence (VirusDetect.igvjs.html)
- Verify chimeric reads
  - Create chimeric FASTA and GTF extracts (extract_chimeric_genomic_targets)
  - Generate index and perform alignment to verify chimeric reads (STAR)
  - Sort and index validated alignments (SAMtools)
  - Remove duplicate alignments (remove_duplicates)
  - Generate evidence counts for chimeric reads (chimeric_contig_evidence_analyzer)
- Summary Report outputs:
  - Refined genome-wide abundance plot (PNG)
  - Insertion site candidates in tab-delimited format with gene annotations (vif.refined.wRefGeneAnnots.tsv)
  - Web-based interactive genome viewer for virus insertion sites (vif.html)
- Present quality checking and visualization for raw reads, adaptor trimming, and STAR alignments (MultiQC)
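
For readers unfamiliar with the "Combine human and virus FASTAs (cat_fasta)" step, the idea can be sketched roughly as below. This is a minimal illustration in Python, not the pipeline's actual module: the function name `cat_fasta` is borrowed from the step label, while its signature, the duplicate-name check, and the file names in the usage note are assumptions made for the example.

```python
def cat_fasta(inputs, output):
    """Concatenate FASTA files into one combined reference.

    Hypothetical helper for illustration only. It raises ValueError if the
    same sequence name appears twice, since an aligner index built from the
    combined file needs unique sequence names.
    Returns the number of sequences written.
    """
    seen = set()
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # The sequence name is the first word after '>'.
                        parts = line[1:].split()
                        name = parts[0] if parts else ""
                        if name in seen:
                            raise ValueError(f"Duplicate sequence name: {name}")
                        seen.add(name)
                    # Guarantee a trailing newline so the next file's header
                    # never ends up glued to the previous sequence line.
                    out.write(line if line.endswith("\n") else line + "\n")
    return len(seen)
```

A call such as `cat_fasta(["human.fa", "virus_db.fa"], "combined.fa")` (file names are placeholders) would produce the combined reference that the second STAR indexing step aligns against.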