Today, the gnomAD Production Team is proud to announce the release of gnomAD v3.1, an update to our previous genome release. The v3.1 data set adds 4,454 genomes, bringing the total to 76,156 whole genomes mapped to the GRCh38 reference sequence. (Our most recent exome release is available in gnomAD v2.1.)
Despite the minor numbering of this release, we bring you an update filled with firsts.
For the first time, we:
- Provide individual genotypes in addition to variant calls for a subset of gnomAD. This highly diverse subset includes new data from >60 distinct populations from Africa, Europe, the Middle East, South and Central Asia, East Asia, Oceania, and the Americas
- Provide and display data from samples of Middle Eastern ancestry
- Display read data visualizations for non-coding variants—an effort that required the generation of visualizations for over 2.5 billion genotypes observed in this release
- Display manual curations for predicted loss-of-function variants on the gnomAD browser
- Generated the dataset by incrementally adding new samples onto an already-existing callset, eliminating the time and cost typically required to re-call existing samples
- Make all gnomAD data—for this release as well as previous releases—freely available for download or export on three cloud providers: Amazon Web Services, Microsoft Azure, and Google Cloud
And we’re currently polishing up the final touches on our first-ever mitochondrial variant release on v3.1, which will be coming very soon.
These new features, along with further details regarding the production of the v3.1 release, are described in the following blog posts:
- gnomAD v3.1 New Content, Methods, Annotations, and Data Availability
- Loss-of-Function Curations in gnomAD
- Open access to gnomAD data on multiple cloud providers
We hope these new features will prove useful to our users, and we welcome feedback on the latest release at our email address: firstname.lastname@example.org.
We wish to acknowledge the extraordinary and extensive network of supporters and contributors who have made this release possible, beginning with the gnomAD Consortium of 143 principal investigators, whose willingness to share data is the cornerstone of all our efforts.
Additionally, we thank the Broad Genomics Platform for their continued, material support in sequencing, storing, and managing tens of thousands of genome samples included in this release; Trevyn Langsford, Nareh Sahakian, and Charlotte Tolonen in the Broad Data Sciences Platform for their help re-processing a critical cohort of externally-sequenced samples to the functional equivalence standard; Alicia Martin for help obtaining data from the Human Genome Diversity Project and parsing the relevant population labels; Chris Vittal for developing and executing the novel incremental joint calling procedure we used to create the v3.1 callset in record time (and on a record budget); Kat Tarasova for truly heroic, behind-the-scenes work ironing out sample permissions and metadata; Ben Weisburd for his resourceful and creative pipeline design enabling read visualizations for >2.5 billion individual genotypes; Laurent Francioli for his expert opinions on analysis and detailed code review; Julia Goodrich for executing the lion’s share of the quality control and production of the nuclear variant callset; Michael Wilson and William Phu for their essential work helping to produce the nuclear variant callset; Katherine Chao for important code contributions to the nuclear variant production pipeline; Moriel Singer-Berk, Eleanor Seaby, Eleina England, Dan Rhodes, Rachel Son, and Emily Evangelista for their contributions to the pLoF curation; Kishore Jaganathan and Kyle Farh at Illumina for generating and making SpliceAI scores available to our users; the Hail team for timely and essential help troubleshooting runtime issues and pipeline bugs; Nick Watts and Matt Solomonson for their brilliant and beautiful design of the gnomAD browser and for their meticulous and ongoing labors to maintain it; Grace Tiao for orchestrating and overseeing the many moving parts of the release; Anne O’Donnell-Luria, Konrad Karczewksi, Ben Neale, Mike Talkowski, Daniel MacArthur, Mark Daly, and Heidi Rehm for moral support, advice, and suggestions regarding callset generation, quality control, browser design, and scientific directions; and Eric Banks, Anthony Philippakis, Danielle Ciofani, and Cotton Seed for their assistance in making our data freely available on multiple cloud providers to the benefit of our many users around the world.