We’re very pleased to announce that gnomAD data is now available as a free public dataset on Amazon Web Services, Microsoft Azure, and Google Cloud. Researchers may download and read gnomAD data for free in all regions from all three cloud providers.
From our beginnings as a project, we have been committed to making gnomAD data as free and accessible to the world as possible. Working in partnership with the public data hosting programs of Amazon, Microsoft, and Google, we have expanded the number of cloud platforms on which gnomAD data is fully free to access. Researchers no longer need to maintain personal copies of gnomAD data on these cloud platforms, which eliminates long-term storage costs as well as the transfer fees associated with copying gnomAD data into private cloud storage.
Access through these cloud providers also enables researchers to integrate gnomAD data with other genomics datasets, such as the UK Biobank Pan-Ancestry Summary Statistics on AWS, Human PanGenomics Project on AWS, Azure Genomics Data Lake, Library of Integrated Network-Based Cellular Signatures (LINCS) on Google Cloud, and Human Variant Annotation Datasets on Google Cloud.
As we anticipate further exponential growth of human genomic datasets over the next few years, we believe that the computational genomics community can benefit from free and open access to shared datasets. By reducing unnecessary duplication of terabyte- and petabyte-scale genomic datasets, we as a community can save scarce environmental, capital, and human resources that would otherwise be spent maintaining many copies across separate institutions.
In doing so, we hope to encourage an even wider range of individuals and institutions to make use of gnomAD data for innovative research in human genetics and for the development of translational tools and medicines to treat and cure disease.
All gnomAD data, stretching back to our earliest release, is now available through these cloud providers, along with supporting resources (such as truth sets and interval lists used in the creation of gnomAD releases, and data from our latest collection of papers in Nature). Individual links to specific resources hosted by each cloud provider are kept up to date on our Downloads page.
For further details about gnomAD and other open datasets hosted on each provider, please see:
- Broad Institute gnomAD data now accessible on the Registry of Open Data on AWS
- Genome Aggregation Database (gnomAD): Now available on Azure Open Datasets
- Providing open access to the Genome Aggregation Database (gnomAD) on Google Cloud
How to access the data
Registry of Open Data on AWS
Files can be browsed and downloaded using the AWS Command Line Interface.
aws s3 ls s3://gnomad-public-us-east-1/release/
Azure Open Datasets
Files can be browsed and downloaded using AzCopy or Azure Storage Explorer.
azcopy ls https://azureopendatastorage.blob.core.windows.net/gnomad/
Google Cloud Public Datasets
Files can be browsed and downloaded using gsutil.
gsutil ls gs://gcp-public-data--gnomad/release/
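For convenience, the three listing commands above can be wrapped in a small shell helper. This is only a sketch: the function names `gnomad_ls_cmd` and `gnomad_ls` and the `DRY_RUN` variable are hypothetical, the bucket and container locations are the ones given in this post, and the `--no-sign-request` flag on the AWS command is added on the assumption that no AWS credentials are configured on the machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions (not part of gnomAD or any cloud CLI).

# Print the listing command for one provider: aws, azure, or gcp.
gnomad_ls_cmd() {
  case "$1" in
    # --no-sign-request asks the AWS CLI for anonymous access (assumption:
    # no AWS credentials are configured locally).
    aws)   echo "aws s3 ls s3://gnomad-public-us-east-1/release/ --no-sign-request" ;;
    azure) echo "azcopy ls https://azureopendatastorage.blob.core.windows.net/gnomad/" ;;
    gcp)   echo "gsutil ls gs://gcp-public-data--gnomad/release/" ;;
    *)     echo "unknown provider: $1" >&2; return 1 ;;
  esac
}

# Print the command (default) or run it when DRY_RUN=0.
gnomad_ls() {
  local cmd
  cmd="$(gnomad_ls_cmd "$1")" || return 1
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$cmd"
  else
    # Word splitting is intentional: cmd is a simple space-separated command.
    $cmd
  fi
}
```

After sourcing the file, `gnomad_ls gcp` prints the command without contacting the network, and `DRY_RUN=0 gnomad_ls gcp` runs it, provided the corresponding command-line tool is installed.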