Posts by gnomAD Production Team

News

The news page highlights new features, versions, or other major announcements. See our changelog for all changes to gnomAD, including minor ones.

gnomAD v4.1

Katherine Chao, Michael Wilson, Julia Goodrich, gnomAD Production Team

We have released gnomAD v4.1, an update to our latest major release. This update fixes the allele number issue in gnomAD v4.0 previously…

gnomAD v4.0

November 01, 2023 in Announcements / Release

Katherine Chao, gnomAD Production Team

Today, we are delighted to announce the release of gnomAD v4, which includes data from 807,162 total individuals. This release is nearly 5x…

Genetic Ancestry

November 01, 2023 in Announcements

Katherine Chao, gnomAD Production Team

A critical component of the medical and functional interpretation of genetic variants is the accurate estimation of their frequency. A…

Using the gnomAD genetic ancestry principal components analysis loadings and random forest classifier on your dataset

October 15, 2021

Julia Goodrich, gnomAD Production Team

By popular request, we are now releasing the genetic ancestry principal components analysis (PCA) variant loadings and the accompanying random forest (RF) model used for genetic ancestry group inference in gnomAD v2 and v3. This post discusses how those files were generated and how they can be used on another dataset. These resources are not appropriate for every dataset, so we also discuss the caveats of using the loadings and the RF model.
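As a rough illustration of what "using the loadings on another dataset" means, the sketch below projects one sample's genotypes onto precomputed PCA loadings. It assumes the common Hardy-Weinberg-style normalization of genotypes by reference allele frequency; the post excerpt does not confirm the exact formula, and the function name, argument layout, and toy values are hypothetical. The resulting scores are the kind of input an ancestry classifier such as the RF model would consume.

```python
import math


def project_onto_loadings(genotypes, allele_freqs, loadings):
    """Project one sample onto precomputed principal components.

    Assumed inputs (hypothetical layout, one entry per variant):
      genotypes    -- alternate-allele counts (0, 1, 2), or None if missing
      allele_freqs -- alternate allele frequency from the reference dataset
      loadings     -- list of per-variant loadings, one value per PC
    """
    # Keep only variants with a called genotype and a polymorphic frequency.
    usable = [
        (g, p, load)
        for g, p, load in zip(genotypes, allele_freqs, loadings)
        if g is not None and 0.0 < p < 1.0
    ]
    if not usable:
        raise ValueError("no usable variants shared with the loadings")

    n_variants = len(usable)
    n_pcs = len(usable[0][2])
    scores = [0.0] * n_pcs
    for g, p, load in usable:
        # Center by the expected allele count and scale by its expected
        # standard deviation (assumed normalization, see lead-in).
        normalized = (g - 2.0 * p) / math.sqrt(n_variants * 2.0 * p * (1.0 - p))
        for k in range(n_pcs):
            scores[k] += normalized * load[k]
    return scores


# Toy example: two variants, two PCs; the third variant is missing and skipped.
toy_scores = project_onto_loadings(
    genotypes=[0, 2, None],
    allele_freqs=[0.5, 0.5, 0.1],
    loadings=[[1.0, 0.0], [0.0, 1.0], [0.3, 0.3]],
)
print(toy_scores)
```

A projection like this is only meaningful when the new dataset shares enough of the variants used to compute the loadings, which is one reason these resources do not suit every dataset.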

gnomAD v3.1

October 29, 2020 in Announcements / Releases

gnomAD Production Team

Today, the gnomAD Production Team is proud to announce the release of gnomAD v3.1, an update to our previous genome release. The v3.1 dataset adds 4,454 genomes, bringing the total to 76,156 whole genomes mapped to the GRCh38 reference sequence. (Our most recent exome release is available in gnomAD v2.1.)

Despite the minor numbering of this release, we bring you an update filled with firsts.

For the first time, we:

Provide individual genotypes in addition to variant calls for a subset of gnomAD. This highly diverse subset includes new data from >60 distinct populations from Africa, Europe, the Middle East, South and Central Asia, East Asia, Oceania, and the Americas
Provide and display data from samples of Middle Eastern ancestry
Display read data visualizations for non-coding variants—an effort that required the generation of visualizations for over 2.5 billion genotypes observed in this release
Display manual curations for predicted loss-of-function variants on the gnomAD browser
Generate the dataset by incrementally adding new samples to an existing callset, eliminating the time and cost typically required to re-call existing samples
Make all gnomAD data—for this release as well as previous releases—freely available for download or export on three cloud providers: Amazon Web Services, Microsoft Azure, and Google Cloud

We are also putting the final touches on our first-ever mitochondrial variant release for v3.1, which is coming very soon.

Requester-Pays Notice to Users

July 09, 2020 in Announcements

gnomAD Production Team

Last month the gnomAD project was billed thousands of dollars in cloud egress charges—above and beyond our normal expected costs—for users who were accessing Hail-formatted public gnomAD data. The vast majority of this excess cost was due to users spinning up machines in international regions and reading data from our US-region storage bucket.

As a result, we have decided to move gnomAD Hail tables and matrix tables to a requester-pays bucket, while keeping the VCFs and smaller public files free to download as usual. We decided to do this for the following reasons:

From our beginnings as a project, we have been committed to making gnomAD data as free and accessible to the world as humanly possible. We pay for each VCF download of our data, and we have resisted proposals to add gating mechanisms (such as click-through agreements) to our data. We want to reaffirm our commitment to our users by continuing to make VCFs free to download to our growing user base.
However, to maintain gnomAD, we must keep costs as low as possible and fund aspects of gnomAD that benefit the widest user base. Providing free access to the Hail-formatted versions of the data is very costly and benefits only a small proportion of our user base—those running cloud pipelines on the data. Therefore, we have decided to require users to supply Google Cloud billing information when they access Hail versions of gnomAD.
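In practice, reading from a requester-pays bucket means naming a Google Cloud project to bill for the transfer. The sketch below builds a `gsutil` command that does this with the top-level `-u` (user project) option. The bucket path, project ID, and destination are placeholders rather than real gnomAD locations, and the helper function is hypothetical.

```python
import shlex


def requester_pays_copy_command(billing_project, source_uri, destination):
    """Build a gsutil command that bills `billing_project` for the copy.

    All argument values are supplied by the caller; nothing here is a
    real gnomAD path.
    """
    if not billing_project:
        raise ValueError("a billing project is required for requester-pays buckets")
    if not source_uri.startswith("gs://"):
        raise ValueError("source_uri must be a gs:// URI")
    # -u names the project to bill, -m parallelizes the transfer, and
    # cp -r recurses, which is needed because Hail tables are directories.
    return ["gsutil", "-u", billing_project, "-m", "cp", "-r", source_uri, destination]


# Placeholder values for illustration only.
command = requester_pays_copy_command(
    billing_project="my-billing-project",
    source_uri="gs://example-requester-pays-bucket/path/to/table.ht",
    destination="gs://my-own-bucket/gnomad/",
)
print(shlex.join(command))
```

Running the pipeline in the same region as the source bucket, or copying the data once to a bucket in your own region, avoids the repeated cross-region egress charges described above.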