Why we use MIN_DP for DP and then AD of the ref allele at Hom-Ref block derived genotypes. #7185

Open
vruano opened this issue Apr 7, 2021 · 5 comments


@vruano
Contributor

vruano commented Apr 7, 2021

Question

Why do we use MIN_DP rather than DP as the synthetic ref-allele depth for genotypes derived from hom-ref blocks?
Would it make more sense to keep and use the average or median depth across the block instead?

## GenotypeGVCFsEngine.java:176 (about)
...
if (result.isPolymorphicInSamples()) {
    // For polymorphic sites we need to make sure e.g. the SB tag is sent to the annotation engine and then removed later.
    final VariantContext reannotated = annotationEngine.annotateContext(result, features, ref, null, a -> true);
    return new VariantContextBuilder(reannotated).genotypes(
            cleanupGenotypeAnnotations(reannotated, false)).make();  // <== note this call
} else if (includeNonVariants) {
...

## Same file ln 436, method cleanupGenotypeAnnotations:
...
// move the MIN_DP to DP
if ( oldGT.hasExtendedAttribute(GATKVCFConstants.MIN_DP_FORMAT_KEY) ) {
    depth = parseInt(oldGT.getAnyAttribute(GATKVCFConstants.MIN_DP_FORMAT_KEY));
    builder.DP(depth);
    attrs.remove(GATKVCFConstants.MIN_DP_FORMAT_KEY);
}
...
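
To make the concern concrete, here is a minimal sketch (not GATK code) of how far the minimum and the median depth can diverge inside a single hom-ref block. The class name, method names, and per-position depths are all hypothetical, chosen only to illustrate a block with a short coverage dip.

```java
import java.util.Arrays;

public class RefBlockDepth {
    /** Minimum depth over the block, i.e. the kind of value stored as MIN_DP. */
    static int minDepth(int[] depths) {
        return Arrays.stream(depths).min().getAsInt();
    }

    /** Median depth over the block (lower middle value for even-length input). */
    static int medianDepth(int[] depths) {
        int[] sorted = depths.clone();
        Arrays.sort(sorted);
        return sorted[(sorted.length - 1) / 2];
    }

    public static void main(String[] args) {
        // Hypothetical per-position depths for one hom-ref block with a single low-coverage site.
        int[] depths = {31, 30, 29, 4, 28, 32, 30};
        System.out.println("min depth    = " + minDepth(depths));    // 4
        System.out.println("median depth = " + medianDepth(depths)); // 30
    }
}
```

With these made-up numbers, moving MIN_DP into DP would report a depth of 4 for every position in the block, even though most positions have around 30 reads.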

Tool(s) or class(es) involved

GenotypeGVCFs

vruano changed the title from "Why we use MIN_DP" to "Why we use MIN_DP for DP and then AD of the ref allele at Hom-Ref block derived genotypes." Apr 7, 2021
@ldgauthier
Contributor

A user brought this up recently, and I think it's a good point. I wouldn't want to change the default behavior, but I would be open to adding a new argument to GenotypeGVCFs to use median instead of min. What do you think @droazen ? Or maybe this is a question for @vdauwera and @eitanbanks as GATK product owners.

@droazen
Contributor

droazen commented Apr 21, 2021

@ldgauthier defer to you on this, but agree that it seems confusing/misleading, especially in the case of large ref blocks with highly variable depth

@vdauwera
Contributor

Agree with adding an option to use median -- we can change the default behavior in GATK 5 ;)

@ldgauthier
Contributor

@vruano can you take on adding the arg and alternate behavior? Should be pretty quick.

ldgauthier assigned vruano and unassigned ldgauthier May 11, 2021
@vruano
Contributor Author

vruano commented May 24, 2021

Ok.

vruano added a commit to vruano/gatk that referenced this issue Feb 6, 2023
Adds an argument to GenotypeGVCFs that gives the option to preserve the original DP
(the median across the block) instead of substituting it with the minimum (the MIN_DP format field),
if present, for genotypes that come from hom-ref blocks:

 --keep-median-dp-in-homref-blocks
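
The choice this argument describes might be sketched roughly as follows. This is an illustration under assumptions, not the code from the referenced commit: `HomRefDepthChoice`, `resolveDepth`, `syntheticAd`, and the `keepMedianDp` parameter are hypothetical names, and the synthetic AD layout (all reads on the ref allele) is inferred from the issue title rather than checked against the GATK source.

```java
public class HomRefDepthChoice {
    /**
     * Picks the depth to report for a genotype that came from a hom-ref block.
     * blockDp is the DP already on the genotype; minDp is the MIN_DP value, or null if absent.
     */
    static int resolveDepth(int blockDp, Integer minDp, boolean keepMedianDp) {
        if (keepMedianDp || minDp == null) {
            return blockDp;   // keep the original DP
        }
        return minDp;         // default behavior discussed in this thread: MIN_DP replaces DP
    }

    /** Assumed synthetic AD: the whole resolved depth goes to the ref allele, zero to each alt. */
    static int[] syntheticAd(int depth, int alleleCount) {
        int[] ad = new int[alleleCount];
        ad[0] = depth;
        return ad;
    }
}
```

Under this sketch, `resolveDepth(30, 4, false)` yields 4 and `resolveDepth(30, 4, true)` yields 30, so the flag only changes the result when MIN_DP is present and differs from DP.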