Run
Example data is also included to quickly test whether the pipeline is functioning correctly. The data contains one case and one control sample, and the case has 4 cryptic variants per simulation category (simulations will be detailed on the wiki in the near future). Run this to set it up:
mintie -t
Note that if you set up MINTIE manually, you'll need to add the install directory to your path, and set the $MINTIE_HOME variable:
export MINTIE_HOME=<install dir>
export PATH=$MINTIE_HOME:$PATH
Now you can test MINTIE via:
mintie -w -p test_params.txt cases/*.fastq.gz controls/*.fastq.gz
MINTIE should run through all stages to completion, which should take <5 minutes for these data. The outputs will be under allvars-case.
Make the following directories in the directory under which you will run MINTIE:
mkdir -p cases
mkdir -p controls
Either copy or symlink your desired fastq files into the respective directories for your case and control samples (note that you can rename files in your symlinks, in case you need to change file names to have a consistent mask).
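As a minimal sketch of the symlink-and-rename approach described above, the function below links every fastq file from a source directory into a target directory and rewrites the read suffix on the way. The function name link_fastqs and the input naming scheme (files ending in _1.fastq.gz and _2.fastq.gz) are assumptions for illustration only; adjust the sed expression to your own file names.

```shell
# link_fastqs SRC_DIR DEST_DIR   (hypothetical helper, not part of MINTIE)
# Symlinks every *.fastq.gz file in SRC_DIR into DEST_DIR. Files that end in
# _1.fastq.gz or _2.fastq.gz (an assumed naming scheme) are renamed to
# _R1.fastq.gz or _R2.fastq.gz so that they match the default mask.
link_fastqs() (
    src=$1
    dest=$2
    mkdir -p "$dest"
    for f in "$src"/*.fastq.gz; do
        [ -e "$f" ] || continue    # the glob matched nothing
        name=$(basename "$f")
        # e.g. tumourA_1.fastq.gz -> tumourA_R1.fastq.gz
        new=$(printf '%s\n' "$name" | sed 's/_\([12]\)\.fastq\.gz$/_R\1.fastq.gz/')
        # link to an absolute path so the symlink resolves from any directory
        ln -sf "$(cd "$src" && pwd)/$name" "$dest/$new"
    done
)

# Example usage (the source paths are placeholders):
#   link_fastqs /path/to/tumour_fastqs cases
#   link_fastqs /path/to/normal_fastqs controls
```

Files that already follow the _R1/_R2 convention are linked under their original names.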
Cases are your cancer samples in which you want to identify variants, and the controls are the samples you are testing against. Ideally, controls will be benign tissue of the same type as the tumour primary. However, where normal tissue of the same type is difficult to acquire (as in blood-based cancers), remission samples or tumours from other samples (ideally the same cancer type) can be used. More controls means more power, so aim for at least 5 controls, but ideally more (10-15 is a good number).
NOTE: We don't recommend running MINTIE with a large number of controls (>20), as this can lead to significant overhead for the pipeline manager, as well as significant memory requirements and compute time when collating the data.
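The guidance on control numbers can be checked before a run with a small sketch like the one below. The helper names count_samples and check_control_count are hypothetical, and the count assumes one *_R1.fastq.gz file per sample, which matches the default mask but may not match your data.

```shell
# count_samples DIR   (hypothetical helper)
# Counts samples in DIR, assuming exactly one *_R1.fastq.gz file per sample.
count_samples() (
    n=0
    for f in "$1"/*_R1.fastq.gz; do
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
)

# check_control_count DIR   (hypothetical helper)
# Compares the number of controls against the recommendations in the text:
# at least 5, and no more than 20.
check_control_count() (
    n=$(count_samples "$1")
    if [ "$n" -lt 5 ]; then
        echo "warning: only $n controls (at least 5 recommended)"
    elif [ "$n" -gt 20 ]; then
        echo "warning: $n controls (more than 20 not recommended)"
    else
        echo "ok: $n controls"
    fi
)

# Example usage:
#   check_control_count controls
```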
First set up your params.txt file. You can copy this from the install directory (if you installed via conda, you can find the install directory by running mintie -h), or from the test_params.txt file from above.
Make sure your fastq files match the pattern found in your params.txt file:
-p fastqCaseFormat=cases/%_R*.fastq.gz
-p fastqControlFormat=controls/%_R*.fastq.gz
The above is the default and indicates the directories of cases and controls, and that the fastq files are in <sample>_R*.fastq.gz format, where the * refers to the paired read number (i.e. R1 and R2).
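To check that a directory follows the default <sample>_R*.fastq.gz mask before launching the pipeline, something like the sketch below may help. It approximates the mask with a shell glob rather than reproducing bpipe's own pattern matching, and the helper names list_samples and check_pairs are hypothetical. Sample names containing whitespace are not handled.

```shell
# list_samples DIR   (hypothetical helper)
# Prints the <sample> part of every file in DIR that looks like
# <sample>_R*.fastq.gz, one unique name per line.
list_samples() (
    for f in "$1"/*_R*.fastq.gz; do
        [ -e "$f" ] || continue    # the glob matched nothing
        basename "$f" | sed 's/_R[^_]*\.fastq\.gz$//'
    done | sort -u
)

# check_pairs DIR   (hypothetical helper)
# Reports any sample that lacks an R1 or R2 file; returns non-zero if any
# file is missing.
check_pairs() (
    dir=$1
    status=0
    for s in $(list_samples "$dir"); do
        for r in R1 R2; do
            if [ ! -e "$dir/${s}_$r.fastq.gz" ]; then
                echo "missing: $dir/${s}_$r.fastq.gz"
                status=1
            fi
        done
    done
    return $status
)

# Example usage:
#   check_pairs cases && check_pairs controls
```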
MINTIE uses bpipe. Please see documentation for bpipe under http://docs.bpipe.org/ for more information.
Now you can run as follows:
mintie -w -p params.txt cases/*.fastq.gz controls/*.fastq.gz
Or, if you want to use custom bpipe options:
export MINTIEDIR=<install dir>
bpipe run @/params.txt [ <other bpipe options >] $MINTIEDIR/MINTIE.groovy cases/*.fastq.gz controls/*.fastq.gz
Cases are run on a 1 vs. all specified controls basis. Several cases can be specified and they will be run in parallel. Be careful when running >5 simultaneous cases as bpipe might start throwing errors about too many open files.
Please consult the params.txt file and adjust as needed before running.
NOTE: The following parameters are especially important. Please ensure that Kmer and read length parameters are set accordingly (check the FAQ for more information):
- -p Ks=79,49: comma-separated kmer lengths for de novo assembly. This option affects SOAPdenovotrans and rnaSPAdes (but not Trinity). Please ensure that your read length is longer than ALL kmer lengths specified.
- -p min_read_length=50: minimum read length for trimming. NOTE: Please ensure this is greater than your minimum Kmer length.
- -p min_contig_len=100: minimum length required for an assembled contig to be kept.
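The two constraints above (read length longer than every kmer, and min_read_length greater than the smallest kmer) can be checked with a sketch like the following. It assumes the params file holds one "-p name=value" entry per line; get_param and check_kmers are hypothetical helper names, and the read length must be supplied by you.

```shell
# get_param FILE NAME   (hypothetical helper)
# Prints the value of the first "-p NAME=value" line in FILE, dropping any
# text after the value. Assumes one parameter per line.
get_param() (
    sed -n "s/^[[:space:]]*-p[[:space:]]*$2=//p" "$1" | sed 's/[[:space:]].*$//' | head -n 1
)

# check_kmers FILE READ_LEN   (hypothetical helper)
# Checks that READ_LEN is longer than every value in Ks, and that
# min_read_length is greater than the smallest value in Ks.
check_kmers() (
    params=$1
    read_len=$2
    ks=$(get_param "$params" Ks | tr ',' ' ')
    min_read=$(get_param "$params" min_read_length)
    status=0
    min_k=
    for k in $ks; do
        if [ "$k" -ge "$read_len" ]; then
            echo "error: kmer $k is not shorter than read length $read_len"
            status=1
        fi
        if [ -z "$min_k" ] || [ "$k" -lt "$min_k" ]; then
            min_k=$k
        fi
    done
    if [ -n "$min_k" ] && [ -n "$min_read" ] && [ "$min_read" -le "$min_k" ]; then
        echo "error: min_read_length $min_read is not greater than smallest kmer $min_k"
        status=1
    fi
    return $status
)

# Example usage, for 100bp reads:
#   check_kmers params.txt 100
```

With the default values shown above (Ks=79,49 and min_read_length=50), reads of 100bp pass both checks, while reads of 75bp fail the first because 79 is not shorter than 75.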
Other parameters are as follows:
- -p threads=8: controls the number of threads used by any multi-core-compatible part of the pipeline.
- -p assembly_mem=128: controls the max memory allocated to Trinity and rnaSPAdes (this option does not affect SOAPdenovotrans assembly, which is the default).
- -p assembler=soap: may be set to 'soap' (default), 'spades' or 'trinity'. Soap is fastest but you may get better results with the other assemblers. See using other assemblers for more information.
- -p scores=33: Phred scores format for Trimmomatic (you can most likely leave this option as is).
- -p minQScore=20: minimum quality score for Trimmomatic's LEADING parameter.
- -p min_gap=7: smallest INDEL/ITD size considered. NOTE: lowering this will enable you to detect smaller variants, at the expense of more false positives.
- -p min_clip=20: smallest extended/novel exon/rearrangement considered. NOTE: lowering this will enable you to detect smaller variants, at the expense of more false positives.
- -p min_match=30,0.3: when aligning de novo assembled contigs to the genome, keep them only if they match by 30bp and at least 30% of the contig is aligned (the default).
- -p min_logfc=5: minimum logFC cutoff for a novel contig associated equivalence class.
- -p min_cpm=0.1: minimum log CPM cutoff for equivalence class counts for the case sample.
- -p fdr=0.05: significance threshold for DE testing of equivalence classes.
- -p sort_ram=4G: memory limit for sorting BAM files. NOTE: this is a per-thread setting, so please ensure the system has at least sort_ram * threads memory.
- -p gene_filter=: optional. This is a file of gene names (one gene per line) of genes to keep in the final results file.
- -p var_filter=: optional. Comma-separated variants to keep in the final results file (leaving this blank keeps all variant types). Please see the VCF output info page under SVTYPE for variant abbreviations. An example setting would be -p var_filter=DEL,INS,FUS,UN if you are only interested in TSVs, for instance.
- -p splice_motif_mismatch=0: number of mismatches allowed when checking canonical splice motifs. 0 = no mismatches allowed, 1 = allows a single mismatch (total) across both the donor and acceptor sites, 2 = allows two mismatches (one in each of the donor and acceptor), 3 = motifs are not checked but are returned for splice variants, 4 = motifs are not checked or returned. NOTE that motif checking does not distinguish between donor and acceptor when checking single splice sites (either motif is acceptable).
- -p assemblyFasta=: set to a fasta file if using your own custom assembly. NOTE that this option overrides using a pipeline-integrated assembler.
- -p run_de_step=true: whether to run the differential expression step. See Running MINTIE without controls.
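As a worked example of the sort_ram note above, the sketch below multiplies the per-thread sort memory by the thread count. The helper name sort_mem_gb is hypothetical, and it assumes sort_ram is given in whole gigabytes with a trailing G, as in the default.

```shell
# sort_mem_gb SORT_RAM THREADS   (hypothetical helper)
# Prints the total memory in GB needed for BAM sorting, i.e. sort_ram * threads.
# Assumes SORT_RAM is a whole number of gigabytes followed by "G", e.g. 4G.
sort_mem_gb() (
    gb=${1%G}    # strip the trailing G
    echo $(( gb * $2 ))
)

# With the defaults sort_ram=4G and threads=8:
sort_mem_gb 4G 8    # prints 32
```

So with the default settings, the system should have at least 32G of memory available for sorting.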
You may find it convenient to copy the default parameters file, modify it, and then use bpipe run @<your param file> upon execution.
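One way to script the copy-and-modify step is sketched below. It assumes the params file holds one "-p name=value" entry per line and that values do not contain the characters | or &; set_param is a hypothetical helper, not part of MINTIE or bpipe.

```shell
# set_param FILE NAME VALUE   (hypothetical helper)
# Rewrites the "-p NAME=..." line in FILE so that it reads "-p NAME=VALUE".
# Lines for other parameters are left untouched.
set_param() (
    file=$1
    name=$2
    value=$3
    tmp=$(mktemp)
    sed "s|^\([[:space:]]*-p[[:space:]]*$name=\).*|\1$value|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"    # write back in place, keeping file permissions
    rm -f "$tmp"
)

# Example usage (my_params.txt is a placeholder name):
#   cp "$MINTIEDIR/params.txt" my_params.txt
#   set_param my_params.txt threads 16
#   set_param my_params.txt assembler spades
#   bpipe run @my_params.txt $MINTIEDIR/MINTIE.groovy cases/*.fastq.gz controls/*.fastq.gz
```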
Using bpipe's -u parameter, it is possible to run MINTIE up to a certain stage only. This can be useful if you want to run the assembly step only, for example, and run the rest of the pipeline later. The stages are described in the MINTIE.groovy file (each step looks like run_salmon = {...}, for example).
Running up to the assembly step only can be done as follows:
bpipe run -u create_salmon_index @$MINTIEDIR/params.txt $MINTIEDIR/MINTIE.groovy
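To see which stage names could be passed to -u, a rough sketch is to pull every line of the form name = { out of the pipeline file. This relies only on the stage layout described above; list_stages is a hypothetical helper, and it may also print other closures that happen to be written the same way.

```shell
# list_stages FILE   (hypothetical helper)
# Prints the name on every line in FILE of the form "name = {", which is how
# pipeline stages are described as looking in MINTIE.groovy.
list_stages() {
    sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)[[:space:]]*=[[:space:]]*{.*$/\1/p' "$1"
}

# Example usage:
#   list_stages "$MINTIEDIR/MINTIE.groovy"
```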
MINTIE can be run without controls, for troubleshooting purposes, or if no controls are available (note that this will increase the false positive rate significantly). To do this, set the run_de_step flag to 'false' in the params.txt file:
-p run_de_step=false
Then run MINTIE as follows (note no controls are specified):
bpipe run @$MINTIEDIR/params.txt $MINTIEDIR/MINTIE.groovy $cases
The output should be mostly the same; however, some fields related to DE will be missing from the results file.