Today, we are thrilled to announce the release of genome-wide structural variants (SVs) for 63,046 unrelated samples with genome sequencing…
News
The news page highlights new features, versions, or other major announcements. See our changelog for all changes to gnomAD, including minor ones.
Rare coding CNVs from exome sequenced individuals in gnomAD v4
As part of gnomAD v4, we are excited to include our first gnomAD release of rare (<1% overall site frequency) autosomal coding copy number variants (CNVs) from exome sequencing (ES) of 464,297 individuals. These data are available to explore in the gnomAD browser (https://gnomad.broadinstitute.org/), and the complete annotated rare CNV callset can be downloaded directly from the downloads page.
Variant Co-occurrence Counts by Gene in gnomAD
Today we are pleased to announce the incorporation of cumulative counts of gnomAD individuals carrying pairs of rare co-occurring variants within genes in the gnomAD v2 browser, across various allele frequencies and functional consequences. These counts can be used to evaluate how frequently rare variant co-occurrence is observed in a large reference population. We envision that this data will aid the medical genetics community in interpreting the clinical significance of rare co-occurring variants found in patients, in the context of autosomal recessive disease. This feature builds on our variant co-occurrence (inferred phasing) work (see “Variant Co-Occurrence (Phasing) Information in gnomAD”).
Variant Co-Occurrence (Phasing) Information in gnomAD
Today, we are pleased to announce the incorporation of variant co-occurrence (inferred phasing) information in the gnomAD v2 browser. Phase refers to the genetic relationship between a pair of variants; that is, whether the variants are on the same copy of the gene (cis) or on different copies of the gene (trans). We are releasing inferred phasing data for all pairs of variants within a gene where both variants have a global allele frequency in gnomAD exomes <5% and are either coding, flanking intronic (from position -1 to -3 in acceptor sites, and +1 to +8 in donor sites) or in the 5’/3’ UTRs. This encompasses 20,921,100 pairs of variants across 19,685 genes. We envision that this data will be of tremendous help to the medical genetics community in identifying and interpreting co-occurring variants in the context of recessive conditions.
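The inclusion criteria above can be read as a simple predicate over pairs of variants. The following is a minimal Python sketch under stated assumptions, not the gnomAD pipeline itself: the `Variant` record, its field names, and the `region` and `splice_site` labels are hypothetical and chosen only for illustration. Only the thresholds (global exome allele frequency below 5%, acceptor positions -1 to -3, donor positions +1 to +8) come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

MAX_GLOBAL_AF = 0.05  # both variants need a global exome allele frequency < 5%


@dataclass
class Variant:
    # Hypothetical record layout, for illustration only.
    gene: str
    global_af: float
    region: str  # "coding", "intronic", "5_prime_utr", "3_prime_utr", or anything else
    splice_site: Optional[str] = None    # "acceptor" or "donor" (intronic variants)
    intron_offset: Optional[int] = None  # e.g. -2 near an acceptor, +5 near a donor


def in_eligible_region(v: Variant) -> bool:
    """Coding, UTR, or flanking intronic within the stated splice-site windows."""
    if v.region in ("coding", "5_prime_utr", "3_prime_utr"):
        return True
    if v.region == "intronic" and v.intron_offset is not None:
        if v.splice_site == "acceptor":
            return -3 <= v.intron_offset <= -1
        if v.splice_site == "donor":
            return 1 <= v.intron_offset <= 8
    return False


def is_eligible(v: Variant) -> bool:
    return v.global_af < MAX_GLOBAL_AF and in_eligible_region(v)


def pair_is_eligible(a: Variant, b: Variant) -> bool:
    """A pair qualifies when both variants are eligible and lie in the same gene."""
    return a.gene == b.gene and is_eligible(a) and is_eligible(b)
```

For example, a coding variant at 1% frequency paired with a donor-site variant at position +8 in the same gene would pass this check, while the same pair with the intronic variant at +9 would not.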
gnomAD v3.1 Mitochondrial DNA Variants
Overview
Mitochondrial DNA (mtDNA) variants for gnomAD are now available for the first time! We have called mtDNA variants for 56,434 whole genome samples in the v3.1 release. This initial release includes population frequencies for 10,850 unique mtDNA variants defined at more than half of all mtDNA bases. The vast majority of variant calls (98%) are homoplasmic or near homoplasmic, whereas 2% are heteroplasmic. Variation in mitochondrial genomes contributes to many human diseases and has had unique value in the study of human evolutionary genetics. We hope that the addition of mtDNA to gnomAD will enable researchers to better understand the role of mtDNA variation in both health and disease states.
Previous gnomAD callsets have not included mtDNA variants because their properties do not fit the assumptions that we use with our nuclear variant calling pipeline. These properties include:
gnomAD v3.1 New Content, Methods, Annotations, and Data Availability
We’re proud to announce the gnomAD v3.1 release of 759,302,267 short nuclear variants (644,267,978 passing variant quality filters) observed in 76,156 genome samples.
In this release, we have included more than 3,000 new samples specifically chosen to increase the ancestral diversity of the resource. As a result, this is the first release for which we have a designated population label for samples of Middle Eastern ancestry, and we are thrilled to be able to include these in the following population breakdown for the v3.1 release:
Population | Description | Genomes |
---|---|---|
afr | African/African American | 20,744 |
ami | Amish | 456 |
amr | Latino/Admixed American | 7,647 |
asj | Ashkenazi Jewish | 1,736 |
eas | East Asian | 2,604 |
fin | Finnish | 5,316 |
nfe | Non-Finnish European | 34,029 |
mid | Middle Eastern | 158 |
sas | South Asian | 2,419 |
oth | Other (population not assigned) | 1,047 |
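As a quick sanity check on the breakdown above, the per-population counts can be summed and converted to fractions of the release. This is a small illustrative Python sketch; the dictionary simply transcribes the table, and the variable names are our own.

```python
# Genome counts per population label, transcribed from the table above.
genomes_by_population = {
    "afr": 20_744,
    "ami": 456,
    "amr": 7_647,
    "asj": 1_736,
    "eas": 2_604,
    "fin": 5_316,
    "nfe": 34_029,
    "mid": 158,
    "sas": 2_419,
    "oth": 1_047,
}

# The counts add up to the 76,156 genomes reported for v3.1.
total = sum(genomes_by_population.values())

# Share of the release contributed by each population label.
fractions = {pop: n / total for pop, n in genomes_by_population.items()}

# Print the populations from largest to smallest.
for pop, n in sorted(genomes_by_population.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pop}\t{n:>6,}\t{fractions[pop]:6.2%}")
```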
gnomAD v3.1
Today, the gnomAD Production Team is proud to announce the release of gnomAD v3.1, an update to our previous genome release. The v3.1 data set adds 4,454 genomes, bringing the total to 76,156 whole genomes mapped to the GRCh38 reference sequence. (Our most recent exome release is available in gnomAD v2.1.)
Despite the minor numbering of this release, we bring you an update filled with firsts.
For the first time, we:
- Provide individual genotypes in addition to variant calls for a subset of gnomAD. This highly diverse subset includes new data from >60 distinct populations from Africa, Europe, the Middle East, South and Central Asia, East Asia, Oceania, and the Americas
- Provide and display data from samples of Middle Eastern ancestry
- Display read data visualizations for non-coding variants—an effort that required the generation of visualizations for over 2.5 billion genotypes observed in this release
- Display manual curations for predicted loss-of-function variants on the gnomAD browser
- Generate the dataset by incrementally adding new samples onto an already-existing callset, eliminating the time and cost typically required to re-call existing samples
- Make all gnomAD data—for this release as well as previous releases—freely available for download or export on three cloud providers: Amazon Web Services, Microsoft Azure, and Google Cloud
We are also putting the final touches on our first-ever mitochondrial variant release on v3.1, which will be coming very soon.
Loss-of-Function Curations in gnomAD
Today we are pleased to announce the incorporation of manual loss-of-function (LoF) curations into the gnomAD v2.1.1 browser. As of this release, we have curated all homozygous pLoFs and a small set of recessive genes (e.g., GAA, GLA, IDUA, SMPD1, GBA, FIG4, MCOLN1, AP4B1, AP4M1, AP4S1, and AP4E1). These curations were performed for multiple projects, including the recently published work Karczewski et al. 2020 Nature, as well as other gene-specific projects. We are so excited to start sharing this data with you that we are including it in the gnomAD v3.1 release announcement, although for now it is a feature of gnomAD v2.1.1 only. More datasets will be added to the browser as they are completed.
gnomAD v3.0
Originally published on the MacArthur Lab blog.
We are thrilled to announce the release of gnomAD v3, a catalog containing 602M SNVs and 105M indels based on the whole-genome sequencing of 71,702 samples mapped to the GRCh38 build of the human reference genome. By increasing the number of whole genomes almost 5-fold from gnomAD v2.1, this release represents a massive leap in analysis power for anyone interested in non-coding regions of the genome or in coding regions poorly captured by exome sequencing.
In addition, gnomAD v3 adds new diversity – for instance, by almost doubling the number of African American samples we had in gnomAD v2 (exomes and genomes combined), and also including our first set of allele frequencies for the Amish population.
Structural variants in gnomAD
Originally published on the MacArthur Lab blog.
The first gnomAD structural variant (SV) callset is now available via the gnomAD website and integrated directly into the gnomAD Browser.
This initial gnomAD SV callset includes nearly half a million distinct SVs across seven SV mutational classes and 13 subclasses of complex SVs detected in 14,891 genomes spanning four major global populations. In the publicly released callset and gnomAD browser, you can find site, frequency, and annotation data for ~445k SVs from 10,738 unrelated genomes with appropriate consent to allow the release of this information.
In this post we summarize how we created this new callset and some important practical considerations when using it. You can get more details, including callset generation and analyses, in the full gnomAD-SV preprint available on bioRxiv.
gnomAD v2.1
Originally published on the MacArthur Lab blog.
We are delighted to announce the release of gnomAD v2.1! This new release of gnomAD is based on the same underlying callset as gnomAD v2.0.2, but has the following improvements and new features:
- An awesome new browser
- Per-gene loss-of-function constraint
- Improved sample and variant filtering processes
- Allele frequencies in sub-continental populations in Europe and East Asia
- Allele frequencies computed for the following subsets of the data:
  - Controls-only (no cases from common disease case/control studies)
  - Samples not assessed for a neurological phenotype
  - Samples that were not part of a cancer cohort
  - Samples that are not part of the Trans-Omics for Precision Medicine (TOPMed)-BRAVO dataset
- New annotations for each variant
  - Filtering allele frequency using Poisson 95% and 99% CI, per population
  - Age histogram of heterozygous and homozygous carriers
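To give a feel for the filtering allele frequency annotation listed above, here is a minimal Python sketch. It rests on an assumption about the definition, which this post does not spell out: we take the filtering allele frequency to be the highest true population allele frequency at which the observed allele count (AC) would still exceed the upper bound of a one-sided Poisson confidence interval, given the allele number (AN). The function names are hypothetical, and this is not the code used to produce the release.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed in log space term by term."""
    if k < 0:
        return 0.0
    if lam <= 0.0:
        return 1.0
    log_lam = math.log(lam)
    total = sum(
        math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k + 1)
    )
    return min(total, 1.0)


def filtering_allele_frequency(ac: int, an: int, ci: float = 0.95) -> float:
    """Assumed definition: largest AF with P(X <= ac - 1 | lam = an * AF) >= ci."""
    if ac <= 0 or an <= 0:
        return 0.0
    # P(X <= ac - 1) decreases as lam grows: it is 1 at lam = 0 and below 0.5
    # at lam = ac, so the root for ci = 0.95 or 0.99 lies in [0, ac].
    lo, hi = 0.0, float(ac)
    for _ in range(200):  # plain bisection on the expected count
        mid = (lo + hi) / 2.0
        if poisson_cdf(ac - 1, mid) >= ci:
            lo = mid
        else:
            hi = mid
    return lo / an


if __name__ == "__main__":
    # Hypothetical population: 10 alternate alleles among 100,000 chromosomes.
    for ci in (0.95, 0.99):
        print(ci, filtering_allele_frequency(ac=10, an=100_000, ci=ci))
```

Under this reading, the value is always below the raw allele frequency AC/AN, and the 99% version is more conservative (lower) than the 95% version.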
gnomAD v2.1 comprises a total of 16 million SNVs and 1.2 million indels from 125,748 exomes, and 229 million SNVs and 33 million indels from 15,708 genomes. In addition to the 7 populations already present in gnomAD v2.0.2, this release now breaks down the non-Finnish European and East Asian populations further into sub-populations. The population breakdown is detailed below.
The genome Aggregation Database (gnomAD)
Originally published on the MacArthur Lab blog.
Today, we are pleased to announce the formal release of the genome Aggregation Database (gnomAD). This release comprises two callsets: exome sequence data from 123,136 individuals and whole genome sequence data from 15,496 individuals. Importantly, in addition to an increased number of individuals from each of the populations in ExAC, we now also provide allele frequencies across over 5,000 Ashkenazi Jewish (ASJ) individuals.