🏋️♂️ Somatic SNV Call Benchmarking Methods and Results
To demonstrate the efficacy of, and build confidence in, our caller choices and consensus calling method, we benchmarked our workflow against a published benchmark dataset.
The dataset was generated in the publication Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing.
To summarize their work:
Their starting data consist of 21 replicates distributed among 5 sequencing centers and aligned with 3 different aligners (BWA, Bowtie2, NovoAlign), giving 63 aligned BAMs in total. All of the aligned BAMs were called with 6 callers (Mutect2, SomaticSniper, VarDict, MuSE, Strelka2, TNscope). All of the resulting calls were merged to create a consensus VCF. This consensus file was fed into a deep learning classifier (SomaticSeq, NeuSomatic) trained against truth VCFs, which were produced by randomly spiking mutations into the reads. The model classifies the calls into 4 confidence levels; for each level we report the average number of PASS and REJECT calls assigned by the classifier per VCF (a minimal tallying sketch follows the list below):
High Confidence: 60.5 PASS, 0.15 REJECT; high VAF.
Medium Confidence: 27.4 PASS, 0.80 REJECT; low VAF, not captured in many replicates.
Low Confidence: 8.2 PASS, 1.7 REJECT; very low VAF that cannot be distinguished from noise.
Unclassified: 1.8 PASS, 3.8 REJECT; likely false positives.
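The tallying step can be sketched in a few lines. The sketch below assumes classifier-annotated VCFs grouped into one directory per confidence tier, with PASS or REJECT written in the FILTER column; the directory layout and tier names are placeholders, not the benchmark's actual structure.

```python
# Sketch: average PASS/REJECT counts per confidence tier across classifier VCFs.
# Assumes one directory per tier (e.g. vcfs/HighConf/*.vcf.gz) and PASS/REJECT
# values in the FILTER column -- both are assumptions about layout, not the
# published structure.
import glob
import gzip
from collections import Counter

def filter_counts(path):
    """Count FILTER values in a (possibly gzipped) VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            counts[line.split("\t")[6]] += 1
    return counts

for tier in ["HighConf", "MedConf", "LowConf", "Unclassified"]:
    paths = glob.glob(f"vcfs/{tier}/*.vcf*")
    if not paths:
        continue
    totals = Counter()
    for p in paths:
        totals += filter_counts(p)
    n = len(paths)
    print(f"{tier}: avg PASS={totals['PASS']/n:.1f}, avg REJECT={totals['REJECT']/n:.1f}")
```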
A helpful YouTube video explaining their methodology is also available.
Comparison to "Gold Standard" Dataset
BAM files relevant to our workflow (BWA-aligned) were called using our standard somatic workflow, followed by our consensus caller. The synthetic BAM files needed their read groups corrected: originally, the SM identifier matched the normal sample used as the starting point for the spike-in experiment, so we updated the SM field to match the synthetic BAM filename (e.g. syn_<normal_sample_name>). The gold standard VCFs provided by the authors had all samples aggregated. To make 1:1 comparisons, we did the following:
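For reference, the read-group (SM) correction described above can be done with pysam. This is only a minimal sketch: it assumes pysam is available, and the input filename and sample name are hypothetical placeholders rather than the actual benchmark files.

```python
# Sketch: rewrite the SM tag in a synthetic BAM's read groups so that it matches
# the syn_<normal_sample_name> naming used for the file. The path and sample
# name below are hypothetical placeholders.
import pysam

in_bam = "syn_normalA.bam"   # hypothetical synthetic BAM
new_sm = "syn_normalA"       # SM should match the synthetic filename

with pysam.AlignmentFile(in_bam, "rb") as src:
    header = src.header.to_dict()
    for rg in header.get("RG", []):
        rg["SM"] = new_sm    # originally pointed at the normal sample
    with pysam.AlignmentFile("fixed_" + in_bam, "wb", header=header) as dst:
        for read in src:
            dst.write(read)
```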
To add further granularity and to gauge each individual caller's quality and contribution to the consensus calling method, we also compared results at a caller level, per sample:
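To illustrate what this per-sample, per-caller comparison looks like, here is a minimal sketch that matches variants on (chrom, pos, ref, alt) against the truth set. The caller list mirrors the one above, but the file layout and the simple key-based matching are assumptions for illustration, not our exact tooling.

```python
# Sketch: per-sample, per-caller precision/recall against the truth VCF using
# simple (chrom, pos, ref, alt) set intersection. The naming scheme
# (calls/<sample>.<caller>.vcf.gz, truth/<sample>.truth.vcf.gz) is hypothetical.
import gzip

CALLERS = ["mutect2", "somaticsniper", "vardict", "muse", "strelka2", "tnscope"]

def variant_keys(path, pass_only=True):
    """Load variants as (chrom, pos, ref, alt) keys, optionally PASS-only."""
    opener = gzip.open if path.endswith(".gz") else open
    keys = set()
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            chrom, pos, _, ref, alt, _, flt = line.split("\t")[:7]
            if pass_only and flt not in ("PASS", "."):
                continue
            keys.add((chrom, pos, ref, alt))
    return keys

def per_caller_metrics(sample):
    truth = variant_keys(f"truth/{sample}.truth.vcf.gz", pass_only=False)
    for caller in CALLERS:
        calls = variant_keys(f"calls/{sample}.{caller}.vcf.gz")
        tp = len(calls & truth)
        precision = tp / len(calls) if calls else 0.0
        recall = tp / len(truth) if truth else 0.0
        print(f"{sample}\t{caller}\tprecision={precision:.3f}\trecall={recall:.3f}")

per_caller_metrics("sampleA")  # hypothetical sample name
```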