Public datasets

Explore 7.4 PB of genomics data across 6.5M files

Sample Genomics Data

Data from various genomics file formats (BAM, VCF, BED, etc.) and sequencing technologies. (5.15 GB)

Human Pangenome Project

Sequencing data and analysis of 10 trios. First complete human genome assembly. (4.46 PB)

Genome in a Bottle

Reference data from several sequencing technologies. Used as ground truth for benchmarking. (130 TB)

1000 Genomes Project

Sequencing data and analysis of >2,500 individuals from around the world. (766 TB)

Bio Data Zoo

Example genomics data for tool developers. (619 kB)

Platinum Pedigree

Whole-genome sequencing using five technologies on a 4-generation family. (11.8 TB)

DeepVariant Datasets

Sample data used for testing and benchmarking the DeepVariant variant caller. (6.26 TB)

KinDEL dataset

DNA-encoded library dataset for kinase inhibitors, for benchmarking machine learning models. (24.6 GB)

Broad Public Datasets

Sample datasets from the Broad Institute for testing bioinformatics workflows. (4.09 TB)

Genome Ark

Data from the Vertebrate Genomes Project (VGP), featuring reference genomes for vertebrate species. (1.61 PB)

Ensembl FTP Site

Explore data on the Ensembl FTP site interactively

Human Microbiome Project

Microbiome data from 300 healthy adults and several individuals with disease conditions. (5.86 TB)

Australasian Genomes

Sequencing datasets and reference genomes of several threatened Australasian species. (7.97 TB)

3000 Rice Genomes

Sequencing data and analysis of >3,000 rice varieties from 89 countries. (255 TB)

GATK Test Data

Test datasets for the GATK variant caller, with data from WGS, WES, and RNA-seq. (1.05 TB)

Element Bio Data

Data from the Element Bio manuscript about the Avidity instrument. (535 GB)

ONT Data

Oxford Nanopore benchmarking datasets from various sequencing chemistries and samples. (160 TB)

RNA-Seq Nanopore Data

RNA-Seq data from Nanopore sequencing, with matched short-read RNA-Seq, from the Singapore Nanopore Expression Project (SG-NEx). (18 TB)

Pediatric Brain Tumor Atlas

Analysis of pediatric brain tumors: gene expression, gene fusions, somatic mutations, CNVs, and SVs. (3.13 TB)

Genome in a Bottle (FTP)

Reference data from several sequencing technologies. Used as ground truth for benchmarking.

To feature your public bucket on 42basepairs, please reach out to us!