Skip to content

6. Examples

MikiSchikora edited this page Feb 13, 2024 · 30 revisions

Running perSVade modules (perSVade <module> <args>) (recommended)

The examples below may be run sequentially to get all the possible outputs of perSVade, starting from a set of reads called reads1.fastq.gz and reads2.fastq.gz, a reference genome called genome.fasta and an annotation file called annotations.gff. All the results will be stored into ./output You could also run these commands independently. Note that the READS SHOULD BE COMPRESSED in .gz format to avoid some unexpected errors. Future releases of perSVade will allow to work on uncompressed files as well.

Note that if you want to use the docker or singularity installation you need to do it as in the section Quick start. In addition, the section FAQs is useful to understand why some arguments are specified. You can check all the available options of each module with perSVade <module> --help.

  • Read trimming and quality control (note that you should check the output of fastqc on the trimmed reads before doing anything else):

./scripts/perSVade trim_reads_and_QC -f1 reads1.fastq.gz -f2 reads2.fastq.gz -o output/trimmed_reads

  • Read alignment (used in several modules):

./scripts/perSVade align_reads -f1 output/trimmed_reads/trimmed_reads1.fastq.gz -f2 output/trimmed_reads/trimmed_reads2.fastq.gz --ref genome.fasta -o output/aligned_reads

  • Inference of repeats (used in several modules):

./scripts/perSVade infer_repeats --ref genome.fasta -o output/repeat_inference

  • Define regions with SVs to perform relistic simulations in the parameter optimization step. Use one of the steps below (optional):

    • Define pairs of homologous regions in the reference genome:

    ./scripts/perSVade find_homologous_regions --ref genome.fasta -o output/find_hom_regions

    • Define regions with perSVade-inferred (known) SVs, providing a list of short read datasets called close_shortReads_table.tab:

    ./scripts/perSVade find_knownSVs_regions -o output/find_known_SVs --ref genome.fasta --mitochondrial_chromosome chrM --SVcalling_parameters default --repeats_file skip --close_shortReads_table ./close_shortReads_table.tab

  • Run parameter optimization (simulating in random regions) to find the most adequate filtering parameters for SV calling:

./scripts/perSVade optimize_parameters --ref genome.fasta -o output/parameter_optimization -sbam output/aligned_reads/aligned_reads.bam.sorted --mitochondrial_chromosome chrM --repeats_file output/repeat_inference/combined_repeats.tab --regions_SVsimulations random --simulation_ploidies haploid

  • Run SV calling with the optimized parameters:

./scripts/perSVade call_SVs --ref genome.fasta -o output/call_SVs -sbam output/aligned_reads/aligned_reads.bam.sorted --mitochondrial_chromosome chrM --SVcalling_parameters output/parameter_optimization/optimized_parameters.json --repeats_file output/repeat_inference/combined_repeats.tab

  • Run coverage-based CNV calling (using a window size of 500bp and assuming a haploid organism (-p 1)):

./scripts/perSVade call_CNVs --ref genome.fasta -o output/call_CNVs -sbam output/aligned_reads/aligned_reads.bam.sorted --mitochondrial_chromosome chrM -p 1 --cnv_calling_algs HMMcopy,AneuFinder --window_size_CNVcalling 500

  • Integrate the SV and CNV calls into a single .vcf file:

./scripts/perSVade integrate_SV_CNV_calls -o output/integrated_SV_CNV_calls --ref genome.fasta --mitochondrial_chromosome chrM -p 1 -sbam output/aligned_reads/aligned_reads.bam.sorted --outdir_callSVs output/call_SVs --outdir_callCNVs output/call_CNVs --repeats_file skip

  • Annotate the functional impact of the variants generated by integrate_SV_CNV_calls (assuming a standard gDNA genetic code and a yeast mitochondrial code):

./scripts/perSVade annotate_SVs -o output/annotate_SVs --ref genome.fasta --mitochondrial_chromosome chrM -gff annotations.gff -mcode 3 -gcode 1 --SV_CNV_vcf output/integrated_SV_CNV_calls/SV_and_CNV_variant_calling.vcf

  • Call SNPs and IN/DELs using three algorithms:

./scripts/perSVade call_small_variants -o output/small_vars --ref genome.fasta -p 1 -sbam output/aligned_reads/aligned_reads.bam.sorted --repeats_file output/repeat_inference/combined_repeats.tab --callers bcftools,freebayes,HaplotypeCaller --min_AF 0.9 --min_coverage 20

  • Annotate the functional impact of the variants generated by call_small_variants (assuming a standard gDNA genetic code and a yeast mitochondrial code):

./scripts/perSVade annotate_small_vars -o output/annotate_small_vars --ref genome.fasta --mitochondrial_chromosome chrM -gff annotations.gff -mcode 3 -gcode 1 --merged_vcf output/small_vars/merged_vcfs_allVars_ploidy1.vcf

  • Calculate coverage per gene:

./scripts/perSVade get_cov_genes -o output/get_cov_genes --ref genome.fasta -gff annotations.gff -sbam output/aligned_reads/aligned_reads.bam.sorted



General optional arguments

There are some optional arguments that can be provided to several modules (run perSVade <module> -h for more information):

  • --verbose: Runs in a more verbose mode, useful for debugging. Available in all modules.
  • --replace: Removes the output directory of each module, which ensures that all the steps are done again when re-running. Available in all modules.
  • fraction_available_mem and fractionRAM_to_dedicate: Modulate the RAM allocated to perSVade. Available in all modules.
  • --threads: Specify the number of threads. Available in all modules.
  • --min_chromosome_len: The minimum length to consider chromosomes from the provided fasta for calculating the window length (used in may steps of perSVade to parallelize across fractions of the genome). Available in modules using a reference genome.


Running one-liner perSVade (perSVade.py <args>) (deprecated from version 1.02.3 on)

You can also use the script scripts/perSVade.py to execute multiple modules of perSVade with a single command. However, this script is less user friendly and error prone, and it will be no longer maintained after version 1.02.3. This file contains examples of how to run the one-liner version.