gnomAD v3.1.1 contains some minor corrections and changes to the v3.1 data release. The major annotations, including allele count, allele number, and allele frequency, as well as variant filtering status, remain unchanged for the entire callset and for all subsets of the callset (except the change noted below about switching to Null instead of 0).
We corrected:
- VEP annotations. There was an issue in gnomAD v3.1 in which some variants were missing VEP annotations. This release fixes the missing annotations in the Hail Tables and VCFs.
- The score chosen for in silico predictors. Some variants have multiple scores for one or more of the in-silico predictors, based on additional information provided by each in-silico predictor, such as trinucleotide context and transcript, that is not listed on the gnomAD browser. In the v3.1 release, the score annotated on the release Hail Table was not standardized. We decided in the v3.1.1 release to keep only the highest score for each in-silico predictor and add an additional `has_duplicate` annotation to the in-silico predictor structs in the release Hail Table for each score to indicate whether a variant had multiple scores. This is also flagged on the variant pages of the browser.
- dbSNP rsIDs. The dbSNP resource we used to annotate rsIDs was not split for multi-allelic variants, so if a variant was multi-allelic in the dbSNP b154 VCF, this variant received no rsID annotation. This has been fixed in the release Hail Table and VCFs. When adding this fix we noticed that some variants can have multiple rsIDs associated with them, so we now store rsID as a set instead of a string in the release Hail Table.
- VCF headers. The VEP version, dbSNP version, and age distribution information in the VCF header were changed to match the VCF specification.
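
The score and rsID corrections above can be illustrated with a minimal plain-Python sketch. This is not the gnomAD pipeline code: the function names `collapse_scores` and `index_rsids`, the record layout, and all values are hypothetical, and the sketch assumes `has_duplicate` simply means that more than one score was reported. Real multi-allelic splitting also involves allele normalization, which is omitted here.

```python
from collections import defaultdict


def collapse_scores(scores):
    """Keep the highest score and flag whether more than one was reported."""
    present = [s for s in scores if s is not None]
    if not present:
        return {"score": None, "has_duplicate": False}
    return {"score": max(present), "has_duplicate": len(present) > 1}


def index_rsids(records):
    """Map each (chrom, pos, ref, alt) to the set of rsIDs that mention it.

    Each record is (chrom, pos, ref, alts, rsid), where alts may list
    several alternate alleles (a multi-allelic site).
    """
    rsids = defaultdict(set)
    for chrom, pos, ref, alts, rsid in records:
        for alt in alts:  # split the multi-allelic record per alternate allele
            rsids[(chrom, pos, ref, alt)].add(rsid)
    return dict(rsids)


# Placeholder inputs, not real gnomAD or dbSNP values.
print(collapse_scores([0.12, 0.87, 0.45]))
records = [
    ("chr1", 100, "A", ["C", "G"], "rs_example_1"),
    ("chr1", 100, "A", ["G"], "rs_example_2"),
]
print(index_rsids(records))
```

Storing rsIDs in a set rather than a string lets a single variant carry every identifier that refers to it without inventing a delimiter convention.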
We changed:
- The format of missing subset frequency info in the release Hail Table. Previously, if a variant was not observed in a subset, the frequency struct in the `freq` annotation was set to `{AC: 0, AF: 0.0, AN: Null, homozygote_count: 0}`. For this release we modified this to store all entries in the struct as missing.
- VCF release files to use underscore field separators instead of hyphens for the INFO fields. Hyphenated INFO keys are not allowed in version 4.3 of the VCF specification (item 8 of section 1.6.1); current gnomAD VCFs are on version 4.2.
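
For downstream code, the two changes above mean that an unobserved subset now shows up as an all-missing struct, and that INFO keys written with hyphens need to be looked up with underscores. The following is a rough plain-Python sketch of both ideas, using `None` to stand in for a missing value. The helper names and the example key `AC-afr-XX` are illustrative assumptions, not taken from the release files.

```python
# Field names follow the frequency struct described above.
FREQ_FIELDS = ("AC", "AF", "AN", "homozygote_count")


def missing_freq_struct():
    """Frequency struct for a variant not observed in a subset (v3.1.1 style)."""
    return {field: None for field in FREQ_FIELDS}


def is_unobserved(freq_struct):
    """True when every entry of the frequency struct is missing."""
    return all(freq_struct.get(field) is None for field in FREQ_FIELDS)


def underscore_info_key(key):
    """Rewrite a hyphen-separated INFO key with underscore separators."""
    return key.replace("-", "_")


# Illustrative values only.
old_style = {"AC": 0, "AF": 0.0, "AN": None, "homozygote_count": 0}
print(is_unobserved(old_style))               # zeros are not missing
print(is_unobserved(missing_freq_struct()))   # all entries missing
print(underscore_info_key("AC-afr-XX"))       # hypothetical key
```

Code that previously tested `AC == 0` to detect an unobserved subset would need a missingness check like `is_unobserved` instead.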
We added:
- A global annotation to the release Hail Table, `freq_sample_count`, that includes the sample counts for each element of the allele frequency `freq` array.
- The AS_SB_TABLE field to the release VCFs, which was already present in the release Hail Table.
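
Because `freq_sample_count` has one entry per element of the `freq` array, the two can be read side by side by index. The sketch below shows that pairing with plain Python lists. The helper name `attach_sample_counts`, the `n_samples` key, and all numbers are made up for illustration, and the sketch assumes both arrays are in the same order.

```python
def attach_sample_counts(freq, freq_sample_count):
    """Pair each frequency struct with the sample count at the same index."""
    if len(freq) != len(freq_sample_count):
        raise ValueError("freq and freq_sample_count must have the same length")
    return [
        dict(entry, n_samples=n)  # copy the struct and add the count
        for entry, n in zip(freq, freq_sample_count)
    ]


# Made-up values: one observed group and one all-missing (unobserved) group.
freq = [
    {"AC": 3, "AF": 0.0015, "AN": 2000, "homozygote_count": 0},
    {"AC": None, "AF": None, "AN": None, "homozygote_count": None},
]
freq_sample_count = [1000, 250]

for row in attach_sample_counts(freq, freq_sample_count):
    print(row)
```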