Posts by Daniel MacArthur

The gnomAD Papers

Originally published on the MacArthur Lab blog.

It is an absolute pleasure to announce the official release of the gnomAD manuscript package. In a set of seven papers, published in Nature, Nature Medicine, and Nature Communications, we describe a wide variety of different approaches to exploring and understanding the patterns of genetic variation revealed by exome and genome sequences from 141,456 humans.

Publication announcements always feel a little strange in this new era of open science. In our case, the underlying gnomAD data set has been publicly fully available for browsing and downloading since October 2016, and we’ve had the preprints available online since early 2019. However, it’s undeniable that there is something deeply gratifying about seeing these pieces of science revealed in their final, concrete form.

For me this package has a particular significance – it represents the culmination of seven and a half years of work with a phenomenal team at the Broad Institute, and marks my transition to a new role in Australia, and the handover of the gnomAD project to new leadership. So I wanted to spend some time in this post reflecting on the history of the project that became gnomAD, the people who’ve made it possible, and where things will go from here.

gnomAD v3.0

Originally published on the MacArthur Lab blog.

We are thrilled to announce the release of gnomAD v3, a catalog containing 602M SNVs and 105M indels based on the whole-genome sequencing of 71,702 samples mapped to the GRCh38 build of the human reference genome. By increasing the number of whole genomes almost 5-fold from gnomAD v2.1, this release represents a massive leap in analysis power for anyone interested in non-coding regions of the genome or in coding regions poorly captured by exome sequencing.

In addition, gnomAD v3 adds new diversity – for instance, by almost doubling the number of African-American samples we had in gnomAD v2 (exomes and genomes combined), and also including our first set of allele frequencies for the Amish population.

Structural variants in gnomAD

Originally published on the MacArthur Lab blog.

The first gnomAD structural variant (SV) callset is now available via the gnomAD website and integrated directly into the gnomAD Browser.

This initial gnomAD SV callset includes nearly a half-million distinct SVs across seven SV mutational classes and 13 subclasses of complex SVs detected in 14,891 genomes spanning four major global populations. In the publicly released callset and gnomAD browser, you can find site, frequency, and annotation data for ~445k SVs from 10,738 unrelated genomes with appropriate consent to allow the release of this information.
In this post we summarize how we created this new call set, and some important practical considerations when using it. You can get more details, including callset generation and analyses, in the full gnomad-SV preprint available on bioRxiv.