Merging CNVs with `truvari bench` vs `truvari collapse` #247

Yonatan-Ariel-Wolberg · 2024-12-17T14:16:11Z

Hi Adam, I've been trying to find out whether it is possible to merge CNVs with Truvari when I do not have a truth set. In the Truvari paper, I saw that you first used `truvari bench` to merge CNVs with more liberal parameters, and then used `truvari collapse` to refine the merging and make it more specific.

I am a PhD student building a Nextflow workflow that integrates three exome-sequencing-based CNV callers (CANOES, CLAMMS and XHMM) using an SV merging tool, and then annotates the output with features used to train a random forest classifier.

Each of these three CNV callers has a different output format with different fields, but I have written scripts to convert their outputs to VCF so that I can use them as inputs for merging tools such as Truvari.
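The conversion scripts themselves aren't shown here. As a rough illustration of what such a converter might emit, the sketch below turns one normalized CNV call into a VCF 4.2 data line with a symbolic `<DEL>`/`<DUP>` allele. It assumes each caller's output has already been parsed into 1-based, inclusive coordinates; `CNVCall` and `cnv_to_vcf_line` are hypothetical names (not part of CANOES, CLAMMS, XHMM or Truvari), and REF is written as `N` because no reference FASTA is consulted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CNVCall:
    """Hypothetical normalized CNV record filled in by a caller-specific parser."""
    chrom: str
    start: int                # 1-based first affected base (assumed)
    end: int                  # 1-based last affected base, inclusive (assumed)
    svtype: str               # "DEL" or "DUP"
    cn: Optional[int] = None  # copy number, if the caller reports one


def cnv_to_vcf_line(call: CNVCall, call_id: str) -> str:
    """Format one CNV as a tab-separated VCF 4.2 data line with a symbolic allele."""
    if call.svtype not in ("DEL", "DUP"):
        raise ValueError(f"unsupported SV type: {call.svtype}")
    if call.end < call.start:
        raise ValueError("end must be >= start")
    length = call.end - call.start + 1
    # VCF 4.2 convention: SVLEN is negative for deletions, positive for duplications.
    svlen = -length if call.svtype == "DEL" else length
    # Symbolic SVs are anchored on the base before the event (clamped at the
    # chromosome start); with no reference FASTA, REF is written as "N".
    pos = max(call.start - 1, 1)
    info = f"SVTYPE={call.svtype};END={call.end};SVLEN={svlen}"
    cn = "." if call.cn is None else str(call.cn)
    # Read-depth callers provide no genotype, so GT is left missing.
    fields = [call.chrom, str(pos), call_id, "N", f"<{call.svtype}>",
              ".", "PASS", info, "GT:CN", f"./.:{cn}"]
    return "\t".join(fields)


print(cnv_to_vcf_line(CNVCall("chr1", 1325, 2400, "DEL", cn=1), "xhmm_1"))
```

A real converter would also need a VCF header declaring the `SVTYPE`, `END`, `SVLEN` and `CN` fields and the `<DEL>`/`<DUP>` ALT alleles, and the files would typically need sorting, bgzip compression and indexing before `bcftools merge` accepts them.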

Unfortunately, these CNV callers do not provide genotype or strand information, and because they all use a read-depth approach to call CNVs from exome sequencing data, the boundaries are fuzzy.

I would like to know whether I should first merge the CNVs within each sample using `truvari bench`, and then merge across samples using `truvari collapse`, while retaining the association between specific samples and particular CNVs.

Kind regards

Yoni Wolberg

ACEnglish · 2024-12-17T15:17:38Z

Maintainer

Hello,

In the paper, we first used `truvari bench` to establish how similar SVs were in order to 'calibrate' the thresholds for merging via `truvari collapse`. Benchmarking and merging are conceptually the same operation, just with different outputs/reporting.
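One way to picture that calibration step without a truth set: collect pairs of calls you believe describe the same event (for example, one sample's calls from two different callers, which is an assumption here) and look at how similar they are. The sketch below computes size similarity (smaller length over larger) and reciprocal overlap for each pair and reports the 10th percentile of each. These metrics are similar in spirit to truvari's `--pctsize` and `--pctovl` options, but the functions are hypothetical illustrations, not truvari's implementation.

```python
from statistics import quantiles
from typing import Dict, List, Tuple

# Half-open [start, end) on the same chromosome; intervals are assumed non-empty.
Interval = Tuple[int, int]


def size_similarity(a: Interval, b: Interval) -> float:
    """Smaller length divided by larger length (1.0 means identical size)."""
    la, lb = a[1] - a[0], b[1] - b[0]
    return min(la, lb) / max(la, lb)


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap as a fraction of the longer call, so both calls are covered at least this much."""
    ovl = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return ovl / max(a[1] - a[0], b[1] - b[0])


def summarize(pairs: List[Tuple[Interval, Interval]]) -> Dict[str, float]:
    """10th percentile of each metric across pairs assumed to be the same event."""
    sizes = [size_similarity(a, b) for a, b in pairs]
    ovls = [reciprocal_overlap(a, b) for a, b in pairs]
    # method="inclusive" keeps the cut points within the observed range.
    return {"pctsize_p10": quantiles(sizes, n=10, method="inclusive")[0],
            "pctovl_p10": quantiles(ovls, n=10, method="inclusive")[0]}


# Hypothetical pairs, e.g. one sample's CLAMMS calls matched to its XHMM calls.
pairs = [((1000, 2000), (1050, 2100)),
         ((5000, 5600), (5000, 5500)),
         ((9000, 12000), (9500, 12000)),
         ((20000, 20400), (20100, 20400))]
print(summarize(pairs))
```

The 10th percentile is an arbitrary choice: a threshold at that value would keep roughly 90% of these pairs for each metric. Looking at the full distribution is usually more informative than any single cut point.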

For your project, your final output will be a single merged VCF, so you'll be focused on `truvari collapse` to produce that file. As of truvari v4.2 there is a `truvari collapse --intra` parameter, which is helpful for intra-sample merging. It takes the multiple SAMPLE columns and consolidates them into a single column. For example:

```
# bcftools merge of two SVs from two VCFs, same sample
chr1  1324  <DEL> ...  GT:CN  0/1:0.5  ./.:.
chr1  1420  <DEL> ...  GT:CN  ./.:.    1/1:0

# truvari collapse
chr1  1324  <DEL> ...  GT:CN  0/1:0.5  1/1:0

# truvari collapse --intra
chr1  1324  <DEL> ...  GT:CN  0/1:0.5
```

However, since you don't have typical GT fields, you may not find that functionality very helpful and may want to build your own script to consolidate the SAMPLE columns for intra-sample merging (e.g. special handling of the copy-number (CN) field, since `--intra` just picks information from the first non-missing SAMPLE column).
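A minimal sketch of such a script, assuming each SAMPLE column of a merged record has already been parsed into a dict of FORMAT values (`None` for an absent column). `first_present` approximates the first-non-missing behaviour described above for `--intra`; `consolidate_cn` is one hypothetical alternative that keeps the low median of all reported CN values, so a single outlying caller doesn't dominate. Both function names and the median rule are illustrative choices, not truvari features.

```python
from statistics import median_low
from typing import Dict, List, Optional

# Parsed FORMAT values for one SAMPLE column, e.g. {"GT": "./.", "CN": "1"};
# None stands for a column that is absent for this record.
Sample = Optional[Dict[str, str]]
MISSING = {".", "./.", ".|."}


def is_missing(sample: Sample) -> bool:
    """A column counts as missing if it is absent or every value is a missing marker."""
    return sample is None or all(v in MISSING for v in sample.values())


def first_present(samples: List[Sample]) -> Sample:
    """Approximates the behaviour described for --intra: copy the first non-missing column."""
    for s in samples:
        if not is_missing(s):
            return dict(s)
    return None


def consolidate_cn(samples: List[Sample]) -> Sample:
    """Hypothetical alternative: keep the first non-missing column, but set CN to the
    low median of every reported CN so that one outlying caller does not dominate."""
    merged = first_present(samples)
    if merged is None:
        return None
    cns = [float(s["CN"]) for s in samples
           if not is_missing(s) and s.get("CN", ".") != "."]
    if cns:
        merged["CN"] = format(median_low(cns), "g")
    return merged


columns = [None, {"GT": "./.", "CN": "3"}, {"GT": "./.", "CN": "1"}, {"GT": "./.", "CN": "1"}]
print(first_present(columns))   # copies CN=3 from the first present column
print(consolidate_cn(columns))  # CN becomes the low median of [3, 1, 1]
```

Other CN policies (e.g. preferring a particular caller, or the value furthest from the diploid state) would slot into the same place in `consolidate_cn`.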

Once you've created the final set of CNVs per sample, you could repeat the `bcftools merge` / `truvari collapse` procedure to perform inter-sample merging.

An extra benefit of a two-step intra/inter merge is that you have more control over the matching thresholds at each step. One could imagine using looser thresholds for intra-sample merging than for inter-sample merging, on the assumption that overlapping calls from the same sample are more likely to come from the same event, whereas the same amount of overlap between samples could come from different events.
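To make the two-threshold idea concrete, here is a hypothetical matching function applied with a looser intra-sample set and a stricter inter-sample set. The fields loosely mirror truvari's `--refdist`, `--pctovl` and `--pctsize` parameters, but the exact definitions and the numbers are assumptions for illustration, not recommended values or truvari's code.

```python
from dataclasses import dataclass
from typing import Tuple

Call = Tuple[str, int, int]  # (chrom, start, end), half-open, assumed non-empty


@dataclass(frozen=True)
class Thresholds:
    refdist: int    # max distance between starts and between ends
    pctovl: float   # minimum reciprocal overlap
    pctsize: float  # minimum size similarity (smaller / larger)


# Illustrative values only: looser within a sample, stricter across samples.
INTRA = Thresholds(refdist=1000, pctovl=0.3, pctsize=0.3)
INTER = Thresholds(refdist=500, pctovl=0.7, pctsize=0.7)


def same_event(a: Call, b: Call, t: Thresholds) -> bool:
    """Return True if the two calls pass every threshold in t."""
    if a[0] != b[0]:
        return False
    if abs(a[1] - b[1]) > t.refdist or abs(a[2] - b[2]) > t.refdist:
        return False
    la, lb = a[2] - a[1], b[2] - b[1]
    ovl = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    return ovl / max(la, lb) >= t.pctovl and min(la, lb) / max(la, lb) >= t.pctsize


x = ("chr1", 10_000, 12_000)
y = ("chr1", 10_400, 11_200)
print(same_event(x, y, INTRA), same_event(x, y, INTER))
```

With these values the two calls would be merged within a sample but kept separate across samples, since their end positions differ by 800 bp, beyond the 500 bp inter-sample limit. In a real workflow the same choice would be expressed by passing different values for those options to each `truvari collapse` run.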

Have a great day,
~/Adam English
