About gnomAD

The Genome Aggregation Database (gnomAD) is a coalition of investigators seeking to aggregate and harmonize exome and genome sequencing data from a variety of large-scale sequencing projects, and to make summary data available to the wider scientific community. In its first release, which contained exclusively exome data, it was known as the Exome Aggregation Consortium (ExAC).

The data set provided on this website spans 123,136 exomes and 15,496 genomes from unrelated individuals sequenced as part of various disease-specific and population genetic studies. This blog post describes the latest release. We have removed individuals known to be affected by severe pediatric disease, as well as their first-degree relatives, so this data set should serve as a useful reference set of allele frequencies for severe disease studies. Note, however, that some individuals with severe disease may still be included in the data set, albeit likely at a frequency equivalent to or lower than that seen in the general population.

All of the raw data from these projects have been reprocessed through the same pipeline and jointly variant-called to increase consistency across projects. The processing pipelines were written in the Workflow Description Language (WDL) and executed using the Cromwell execution engine; both are open-source projects for defining and executing genomic workflows at massive scale on multiple platforms. The gnomAD data set contains individuals sequenced using multiple exome capture methods and sequencing chemistries, so coverage varies between individuals and across sites. This variation in coverage is incorporated into the frequency calculation for each variant.
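
One common way to account for uneven coverage, sketched here as an assumption rather than as a description of the gnomAD pipeline itself, is to divide each variant's allele count by the number of alleles actually called at that site instead of by the total cohort size. The `SiteCounts` class, its field names, and the `allele_frequency` function below are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteCounts:
    """Hypothetical per-site summary; the field names are illustrative only."""
    allele_count: int      # copies of the alternate allele observed at the site
    called_genotypes: int  # individuals with a usable genotype call at the site


def allele_frequency(site: SiteCounts, ploidy: int = 2) -> Optional[float]:
    """Return allele count divided by the number of called alleles.

    The denominator counts only individuals who were genotyped at this
    site, so poorly covered sites are not diluted by uncalled samples.
    Returns None when no genotypes were called.
    """
    allele_number = site.called_genotypes * ploidy
    if allele_number == 0:
        return None
    return site.allele_count / allele_number


# The same allele count gives different frequencies at different coverage.
well_covered = SiteCounts(allele_count=10, called_genotypes=100_000)
poorly_covered = SiteCounts(allele_count=10, called_genotypes=20_000)
print(allele_frequency(well_covered))
print(allele_frequency(poorly_covered))
```

Under this assumption, a variant seen ten times among 20,000 called individuals is reported as five times more frequent than one seen ten times among 100,000, even though both sites belong to the same data set.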

gnomAD was QCed and analysed using the Hail open-source framework for scalable genetic analysis.

A list of gnomAD Principal Investigators and groups that have contributed data and analysis to the current release is available below.

The generation of this call set was funded primarily by the Broad Institute, and the data here are released publicly for the benefit of the wider biomedical community. There are no publication restrictions or embargoes on these data. Please cite the ExAC paper for any use of these data.

The data are available under the ODC Open Database License (ODbL) (summary available here): you are free to share and modify the gnomAD data so long as you attribute any public use of the database, or works produced from the database; keep the resulting data sets open; and offer your shared or adapted version of the data set under the same ODbL terms.

The aggregation and release of summary data from the exomes and genomes collected by the Genome Aggregation Database has been approved by the Partners IRB (protocol 2013P001339, "Large-scale aggregation of human genomic data").

For bug reports, please file an issue on GitHub.

Principal Investigators

  • Daniel MacArthur
  • Aarno Palotie
  • Andres Metspalu
  • Anne Remes
  • Adolfo Correa
  • Andre Franke
  • Ann Pulver
  • Ben Glaser
  • Ben Neale
  • Bong-Jo Kim
  • Carlos Pato
  • Carlos A Aguilar Salinas
  • Christina Hultman
  • Christine M. Albert
  • Christopher Haiman
  • Clicerio Gonzalez
  • Colin Palmer
  • Craig Hanis
  • Dan Roden
  • Dan Turner
  • Dana Dabelea
  • Daniel Chasman
  • Danish Saleheen
  • David Altshuler
  • David Goldstein
  • Dawood Darbar
  • Dermot McGovern
  • Diego Ardissino
  • Donald Bowden
  • Emelia J. Benjamin
  • Erkki Vartiainen
  • Erwin Bottinger
  • Gad Getz
  • George Kirov
  • Gil Atzmon
  • Harlan M. Krumholz
  • Harry Sokol
  • Heribert Schunkert
  • Hilkka Soininen
  • Hugh Watkins
  • Jaakko Kaprio
  • Jaana Suvisaari
  • James Meigs
  • James Ware
  • James Wilson
  • Jaspal Kooner
  • Jaume Marrugat
  • Jeanette Erdmann
  • Jeremiah Scharf
  • John Barnard
  • John Chambers
  • John D. Rioux
  • Jose Florez
  • Josée Dupuis
  • Judy Cho
  • Juliana Chan
  • Kyong Soo Park
  • Leif Groop
  • Lorena Orozco
  • Lori Bonnycastle
  • Maija Wessman
  • Mark Daly
  • Mark McCarthy
  • Markku Laakso
  • Martti Färkkilä
  • Matthew Bown
  • Matthew Harms
  • Matti Holi
  • Michael Boehnke
  • Michael O'Donovan
  • Michael Owen
  • Mikko Hiltunen
  • Mikko Kallela
  • Mina Chung
  • Ming Tsuang
  • Moore Shoemaker
  • Nazneen Rahman
  • Nilesh Samani
  • Olle Melander
  • Pamela Sklar
  • Patrick T. Ellinor
  • Patrick Sullivan
  • Peter Nilsson
  • Ramnik Xavier
  • Ravindranath Duggirala
  • Rinse Weersma
  • Roberto Elosua
  • Ronald Ma
  • Ruth Loos
  • Ruth McPherson
  • Samuli Ripatti
  • Sekar Kathiresan
  • Seppo Koskinen
  • Soo Heon Kwak
  • Stephen Glatt
  • Steve McCarroll
  • Steven A. Lubitz
  • Subra Kugathasan
  • Tai Shyong
  • Tariq Ahmad
  • Teresa Tusie Luna
  • Terho Lehtimäki
  • Tim Spector
  • Tõnu Esko
  • Tuomi Tiinamaija
  • Veikko Salomaa
  • Yik Ying Teo
  • Young Jin Kim

Contributing projects

  • 1000 Genomes
  • 1958 Birth Cohort
  • ALSGEN
  • Alzheimer's Disease Sequencing Project (ADSP)
  • Atrial Fibrillation Genetics Consortium (AFGen)
  • Estonian Genome Center, University of Tartu (EGCUT)
  • Bulgarian Trios
  • Finland-United States Investigation of NIDDM Genetics (FUSION)
  • Finnish Twin Cohort Study
  • FINN-ADGEN
  • FINRISK
  • Framingham Heart Study
  • Génome Québec - Genizon Biobank
  • Genomic Psychiatry Cohort
  • GoT2D
  • Genotype-Tissue Expression Project (GTEx)
  • Health2000
  • Inflammatory Bowel Disease:
    • Helsinki University Hospital Finland
    • NIDDK IBD Genetics Consortium
    • Quebec IBD Genetics Consortium
  • Jackson Heart Study
  • Kuopio Alzheimer Study
  • LifeLines Cohort
  • MESTA
  • METabolic Syndrome In Men (METSIM)
  • Finnish Migraine Study
  • Myocardial Infarction Genetics Consortium (MIGen):
    • Leicester Exome Seq
    • North German MI Study
    • Ottawa Genomics Heart Study
    • Pakistan Risk of Myocardial Infarction Study (PROMIS)
    • Precocious Coronary Artery Disease Study (PROCARDIS)
    • Registre Gironi del COR (REGICOR)
    • South German MI Study
    • Variation in Recovery: Role of Gender on Outcomes of Young AMI Patients (VIRGO)
  • National Institute of Mental Health (NIMH) Controls
  • NHLBI-GO Exome Sequencing Project (ESP)
  • NHLBI TOPMed
  • Schizophrenia Trios from Taiwan
  • Sequencing Initiative Suomi (SiSu)
  • SIGMA-T2D
  • Swedish Schizophrenia & Bipolar Studies
  • T2D-GENES:
    • GoDARTS
  • T2D-SEARCH
  • The Cancer Genome Atlas (TCGA)

Production team

  • Eric Banks
  • Charlotte Tolonen
  • Christopher Llanwarne
  • Dave Shiga
  • Fengmei Zhao
  • Jeff Gentry
  • Jose Soto
  • Kathleen Tibbetts
  • Khalid Shakir
  • Kristian Cibulskis
  • Laura Gauthier
  • Miguel Covarrubias
  • Monkol Lek
  • Ryan Poplin
  • Ruchi Munshi
  • Sam Novod
  • Thibault Jeandet
  • Valentin Ruano-Rubio
  • Yossi Farjoun

Analysis team

  • Konrad Karczewski
  • Laurent Francioli
  • Kristen Laricchia
  • Monkol Lek
  • Anne O'Donnell Luria
  • Ben Neale
  • Beryl Cummings
  • Cotton Seed
  • Daniel Birnbaum
  • Eric Minikel
  • James Ware
  • Kaitlin Samocha
  • Laramie Duncan
  • Mark Daly
  • Tim Poterba

Website team

  • Ben Weisburd
  • Konrad Karczewski
  • Matthew Solomonson
  • Daniel Birnbaum

Ethics team

  • Jessica Alföldi
  • Andrea Saltzman
  • Molly Schleicher
  • Namrata Gupta
  • Stacey Donnelly

Broad Genomics Platform

  • Stacey Gabriel
  • Kristen Connolly
  • Steven Ferriera

Funding

NIGMS R01 GM104371
(PI: MacArthur)

NIDDK U54 DK105566
(PIs: MacArthur and Neale)

The vast majority of the data storage, computing resources, and human effort used to generate this call set were donated by the Broad Institute.