Posts by Julia Goodrich

News

The news page highlights new features, versions, or other major announcements. See our changelog for all changes to gnomAD, including minor ones.


Using the gnomAD ancestry principal components analysis loadings and random forest classifier on your dataset

By popular request, we are now releasing the ancestry principal components analysis (PCA) variant loadings and the accompanying random forest (RF) model used for global ancestry inference in gnomAD v2 and v3. This post describes how those files were generated and how they can be applied to another dataset. Because these resources are not appropriate for every dataset, we also discuss the caveats of using the loadings and the RF model.
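As a rough illustration of what applying PCA variant loadings to another dataset involves, the sketch below projects samples onto precomputed loadings in plain Python. It assumes the allele-frequency-based genotype normalization commonly used for genotype PCA; the function name, the argument layout, and the choice to drop monomorphic variants are hypothetical and are not taken from the gnomAD release files or their documentation.

```python
import math


def project_onto_loadings(n_alt, af, loadings):
    """Project samples onto precomputed PCA variant loadings (illustrative only).

    n_alt:    list of samples, each a list of alternate allele counts (0, 1 or 2)
    af:       per-variant allele frequency from the dataset that produced the loadings
    loadings: per-variant list of loadings, one value per principal component
    All three inputs are assumed to be in the same variant order.
    """
    # Assumption: variants with AF of exactly 0 or 1 carry no signal and are skipped.
    keep = [j for j, p in enumerate(af) if 0.0 < p < 1.0]
    n_var = len(keep)
    n_pcs = len(loadings[0]) if loadings else 0

    scores = []
    for sample in n_alt:
        row = [0.0] * n_pcs
        for j in keep:
            p = af[j]
            # Center by the expected allele count and scale by its binomial variance.
            g = (sample[j] - 2.0 * p) / math.sqrt(n_var * 2.0 * p * (1.0 - p))
            for k in range(n_pcs):
                row[k] += g * loadings[j][k]
        scores.append(row)
    return scores


# Toy input: two samples, three variants, one principal component.
toy_scores = project_onto_loadings(
    n_alt=[[0, 1, 2], [2, 1, 0]],
    af=[0.5, 0.5, 0.5],
    loadings=[[1.0], [0.0], [0.0]],
)
print(toy_scores)
```

The resulting per-sample scores are what a classifier such as an RF model would take as input. Whether the projection is meaningful depends on how well the new dataset matches the variants and allele frequencies behind the loadings.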

Variant Co-Occurrence (Phasing) Information in gnomAD

Today, we are pleased to announce the incorporation of variant co-occurrence (inferred phasing) information in the gnomAD v2 browser. Phase refers to the genetic relationship between a pair of variants; that is, whether the variants are on the same copy of the gene (cis) or on different copies of the gene (trans). We are releasing inferred phasing data for all pairs of variants within a gene where both variants have a global allele frequency in gnomAD exomes <5% and are either coding, flanking intronic (from position -1 to -3 in acceptor sites, and +1 to +8 in donor sites) or in the 5’/3’ UTRs. This encompasses 20,921,100 pairs of variants across 19,685 genes. We envision that this data will be of tremendous help to the medical genetics community in identifying and interpreting co-occurring variants in the context of recessive conditions.

gnomAD v3.1 New Content, Methods, Annotations, and Data Availability

We’re proud to announce the gnomAD v3.1 release of 759,302,267 short nuclear variants (644,267,978 passing variant quality filters) observed in 76,156 genome samples.

In this release, we have included more than 3,000 new samples specifically chosen to increase the ancestral diversity of the resource. As a result, this is the first release for which we have a designated population label for samples of Middle Eastern ancestry, and we are thrilled to be able to include these in the following population breakdown for the v3.1 release:

Population  Description                      Genomes
afr         African/African American          20,744
ami         Amish                                456
amr         Latino/Admixed American            7,647
asj         Ashkenazi Jewish                   1,736
eas         East Asian                         2,604
fin         Finnish                            5,316
nfe         Non-Finnish European              34,029
mid         Middle Eastern                       158
sas         South Asian                        2,419
oth         Other (population not assigned)    1,047
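
As a quick consistency check on the breakdown above, the following sketch sums the per-population genome counts and computes each population's share of the release. The counts are copied from the table; the variable names are illustrative only.

```python
# Per-population genome counts for gnomAD v3.1, copied from the table above.
genomes = {
    "afr": 20744,
    "ami": 456,
    "amr": 7647,
    "asj": 1736,
    "eas": 2604,
    "fin": 5316,
    "nfe": 34029,
    "mid": 158,
    "sas": 2419,
    "oth": 1047,
}

total = sum(genomes.values())  # 76,156, matching the stated number of genome samples
fractions = {pop: n / total for pop, n in genomes.items()}

# Print populations from largest to smallest with their share of the release.
for pop, n in sorted(genomes.items(), key=lambda item: item[1], reverse=True):
    print(f"{pop}\t{n:>6,}\t{fractions[pop]:.1%}")
print(f"total\t{total:>6,}")
```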