Program for processing SNV or indel calls from myTYPE with myeloma-specific annotations and filters.
Development is ongoing.
Takes as input a file containing indel or SNV calls in .csv or .tsv.gz format.
ID_VARIANT = Unique identifier of each variant
CHR = chromosome (int)
START = variant start position (int)
STOP = variant stop position (int)
GENE = Gene name (str)
TARGET_VAF = Variant allele frequency of variant (num)
EFFECT = predicted effect of variant, e.g. 'non_synonymous_codon'
BIDIR = 1 if reads supporting the variant are found on both strands, 0 otherwise
FILTER = 'PASS' if variant has passed all previously applied variant caller filters.
COSMIC = Cosmic annotation
*_MAF = Frequency of the mutation in ExAC populations and the 1000 Genomes database.
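As a minimal sketch of how such an input file could be read and checked, the snippet below uses only the Python standard library. It assumes that every column listed above is expected to be present and that at least one column ends in `_MAF`. The names `open_calls`, `parse_calls`, and `REQUIRED_COLUMNS` are hypothetical and not part of the program.

```python
import csv
import gzip
import io

# Columns named in the list above (assumed to be mandatory for this sketch).
REQUIRED_COLUMNS = [
    "ID_VARIANT", "CHR", "START", "STOP", "GENE", "TARGET_VAF",
    "EFFECT", "BIDIR", "FILTER", "COSMIC",
]


def open_calls(path):
    """Open a .csv or .tsv.gz file as text and return (handle, delimiter)."""
    if path.endswith(".tsv.gz"):
        return gzip.open(path, "rt", newline=""), "\t"
    if path.endswith(".csv"):
        return open(path, "r", newline=""), ","
    raise ValueError(f"Unsupported input format: {path}")


def parse_calls(handle, delimiter, skiplines=0):
    """Skip leading lines, parse the rows, and check the expected columns."""
    for _ in range(skiplines):
        next(handle, None)
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    # The *_MAF columns are matched by suffix because their exact names
    # are not given in the column list.
    if not any(col.endswith("_MAF") for col in header):
        missing.append("*_MAF")
    if missing:
        raise ValueError("Missing columns: " + ", ".join(missing))
    return list(reader)
```

Each row is returned as a dict of strings, so numeric columns such as TARGET_VAF still need to be converted before any thresholds are applied.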
Data from the following datasets are incorporated as annotations for each variant:
- 16 internal normals sequenced by myTYPE
- MMRF CoMMpass IA9 (889 WES, matched normal, not manually curated)
- Bolli Leukemia 2017 (418 targeted sequencing, manually curated)
- Lohr Cell 2014 (203 WES/WGS)
- myTYPE: Database of previously manually annotated variants
- MFLAG_PANEL: Gene not in panel
- MFLAG_IGH: In IGH locus
- MFLAG_MAF: MAF > 3 % in ExAC/1000 Genomes
- MFLAG_MAFCOS: MAF > 0.1 % and not in COSMIC. For SNVs: Only EXACT/POS in COSMIC counts as match.
- MFLAG_NONPASS: FILTER is not 'PASS', and the variant is not in COSMIC and not previously known in MM. For SNVs: Only EXACT/POS in COSMIC counts as a match, and only missense mutations (i.e. 'non_synonymous_codon') can be removed by this filter.
- MFLAG_NORM: Variant found in one or more good normal controls run by myTYPE
- MFLAG_VAF: Remove variants with target VAF < 1 %
- MFLAG_BIDIR: Remove variants with BIDIR = 0 (only reads on one strand)
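Three of the flags above (MFLAG_MAF, MFLAG_VAF, MFLAG_BIDIR) depend only on values in the variant's own row, so they can be illustrated without any reference files. The sketch below is not the program's implementation. It assumes that VAF and MAF values are stored as fractions (so 1 % is 0.01 and 3 % is 0.03) and that every column ending in `_MAF` counts towards MFLAG_MAF. The function and constant names are hypothetical.

```python
VAF_MIN = 0.01  # "target VAF < 1 %", assuming VAF is stored as a fraction
MAF_MAX = 0.03  # "MAF > 3 %", assuming MAF is stored as a fraction


def _to_float(value):
    """Convert a cell to float, returning None for empty or non-numeric cells."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def simple_flags(row):
    """Return the row-level MFLAG_* names raised for one variant (a dict)."""
    flags = []
    mafs = [_to_float(v) for k, v in row.items() if k.endswith("_MAF")]
    mafs = [m for m in mafs if m is not None]
    if mafs and max(mafs) > MAF_MAX:
        flags.append("MFLAG_MAF")
    if float(row["TARGET_VAF"]) < VAF_MIN:
        flags.append("MFLAG_VAF")
    if int(row["BIDIR"]) == 0:
        flags.append("MFLAG_BIDIR")
    return flags
```

The remaining flags need the gene panel, the IGH BED file, the normals, or the COSMIC and myeloma reference annotations, and are therefore left out of this sketch.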
Variants with no MFLAGs are written to a file of 'good calls'. All other variants are written to a file of 'bad calls'. A run summary report including flag statistics is written to the same folder.
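The split into good and bad calls, together with the per-flag counts that a summary report might contain, can be sketched as follows. This is an illustration under the assumption that each variant already comes with a list of the MFLAG names it raised. The function name `split_calls` is hypothetical, and the actual report format is not reproduced here.

```python
from collections import Counter


def split_calls(flagged):
    """Split (variant, flags) pairs into good and bad calls and count flags.

    A variant with an empty flag list is a good call; any flag makes it bad.
    """
    good, bad, counts = [], [], Counter()
    for variant, flags in flagged:
        if flags:
            bad.append(variant)
            counts.update(flags)
        else:
            good.append(variant)
    return good, bad, counts
```

A variant that raises several flags is counted once per flag, so the flag counts can add up to more than the number of bad calls.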
--mode [snv|indel]: Set input variant type: snv or indel [required]
--outdir: Path to output directory. [required]
--infile: Path to input file with merged SNV calls in tsv.gz or csv format [required]
--skiplines: Number of lines to skip at the top of input file when importing. [default: 0]
--genes: Excel file with column 'GENES'. Used to filter out variants in other genes. [required]
--genes_bed: BED file containing panel regions, to filter out outside calls. [required]
--igh: BED file with IGH locus to filter out overlapping calls. [required]
--mmrf: Path to MMRF reference file, tab separated text [required]
--bolli: Path to Bolli reference file, tab separated text [required]
--lohr: Path to Lohr reference file - tab separated hg19 format. [required]
--normals: Path to good normal calls in tsv.gz format [required]
--mytype: Path to manually annotated myTYPE data in csv format
--version: Show the version and exit.
--help: Show this message and exit.
This package was created using Cookiecutter and the leukgen/cookiecutter-toil project template.