Biobank Introduction

1. UK Biobank

Introduction:
UK Biobank is a large-scale, long-term biomedical database containing detailed health and genetic information from approximately 500,000 UK volunteers. The data spans genomics, imaging, health records, and lifestyle factors.

Features:

Participants: Approximately 500,000
Data Types: Genomic data, imaging data, health records, lifestyle questionnaires, etc.
Research Areas: Broad medical and health research, including cardiovascular disease, cancer, diabetes, etc.

Website: UK Biobank

Publication: What makes UK Biobank special?

2. All of Us Research Program

Introduction:
The All of Us Research Program, launched by the National Institutes of Health (NIH), seeks to transform health care by building a comprehensive database that reflects the diversity of the United States. By collecting detailed health information from at least one million people, the program aims to accelerate medical breakthroughs and reduce health disparities. It supports precision medicine by integrating diverse data sources, including genomic data, electronic health records, and lifestyle information.

Features:

Participants: Over 545,000 individuals enrolled, toward a target of at least 1 million
Data Types: Genomic data, electronic health records, environmental data, lifestyle questionnaires, etc.
Research Areas: Precision medicine, health disparities, chronic diseases, etc.

Website: All of Us Research Program

3. FinnGen

Introduction:
FinnGen is a public–private partnership research project that combines imputed genotype data, generated from newly collected and legacy samples from Finnish biobanks, with digital health record data from Finnish health registries (https://www.finngen.fi/en), with the aim of providing new insights into disease genetics. The partnership is pre-competitive and includes 9 Finnish biobanks, research institutes, universities and university hospitals, 13 international pharmaceutical industry partners, and the Finnish Biobank Cooperative (FINBB).

Features:

Participants: 500,000 participants.
Data Types: Imputed genotype data from newly collected and legacy samples, digital health record data from Finnish health registries collected since 1969.
Research Areas: Disease genetics, identification of low-frequency deleterious alleles, genetic basis of health and disease, Finnish Disease Heritage.

Website: FinnGen

Publication: FinnGen provides genetic insights from a well-phenotyped isolated population

4. The Cancer Genome Atlas (TCGA)

Introduction:
TCGA is a project of the National Institutes of Health in the United States that performed genomic analyses of many cancer types to advance cancer research and treatment. Over a 12-year period, TCGA collected, characterized, and analyzed cancer samples from more than 11,000 patients. The process was complex and evolved continually to accommodate new technologies, the nuances of different cancer types, and other changing factors.

Features:

Participants: Over 11,000 patients.
Data Types: Clinical information (e.g., smoking status), molecular analyte metadata (e.g., sample portion weight), and molecular characterization data (e.g., gene expression values)
Research Areas: Cancer genomics, cancer treatment, molecular markers

Website: The Cancer Genome Atlas (TCGA)

5. Million Veteran Program

Introduction:
The Million Veteran Program (MVP) is a health research initiative studying how genes, lifestyle, military experiences, and exposures affect health. Its research aims to improve the prevention and treatment of PTSD, depression, anxiety, and suicide among Veterans. It also examines risk and protective factors for heart disease and advances cancer research, with a focus on breast and prostate cancers. The program further studies how nutrition and lifestyle choices support healthier living, and investigates conditions such as diabetes, Alzheimer’s disease, endometriosis, osteoarthritis, and the health impacts of military exposures.

Features:

Participants: As of August 3, 2015, MVP had enrolled 397,104 veterans across about 50 sites nationwide, with genotyping data available for 199,348 participants.
Data Types: Questionnaires, VA electronic health records, blood samples for genomic and other testing
Research Areas: Precision medicine, genomic research, health care delivery, clinical decision-making

Website: Million Veteran Program

Publication: Million Veteran Program: A mega-biobank to study genetic influences on health and disease

6. BioBank Japan

Introduction:
Established in 2003 at the University of Tokyo, BioBank Japan (BBJ) is a national project that aims to realize personalized medicine through genetic information. BBJ collects and securely stores biological samples and clinical data, removes personal details such as name, address, and date of birth, and assigns new ID numbers for research use. These samples and data are provided to researchers at academic institutions and private companies to advance genomic medicine and develop new diagnostic and therapeutic methods.
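The de-identification step described above (stripping direct identifiers such as name, address, and date of birth, then assigning a new research ID) can be sketched as follows. This is a minimal illustration, not BBJ's actual procedure: the field names (`name`, `address`, `date_of_birth`, `diagnosis`) and the `pseudonymize` helper are hypothetical. The key table linking research IDs back to source records is returned separately so that, in practice, it could be held apart from the released data.

```python
import secrets
from typing import Dict, List, Tuple

# Hypothetical set of direct identifiers to strip; real policies cover many more fields.
DIRECT_IDENTIFIERS = {"name", "address", "date_of_birth"}


def pseudonymize(
    records: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], Dict[str, str]]:
    """Remove direct identifiers and give each record a new random research ID.

    Returns the de-identified records plus a key table mapping each research ID
    to the index of its source record (to be stored separately from the data).
    """
    released: List[Dict[str, str]] = []
    key_table: Dict[str, str] = {}
    for index, record in enumerate(records):
        # Random ID, not derived from any personal detail.
        research_id = "R" + secrets.token_hex(8)
        while research_id in key_table:  # guard against an (unlikely) collision
            research_id = "R" + secrets.token_hex(8)
        key_table[research_id] = str(index)
        # Keep only non-identifying fields, then attach the new ID.
        cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        cleaned["research_id"] = research_id
        released.append(cleaned)
    return released, key_table


if __name__ == "__main__":
    patients = [
        {"name": "Example Person", "address": "Tokyo",
         "date_of_birth": "1950-01-01", "diagnosis": "T2D"},
    ]
    released, key_table = pseudonymize(patients)
    print(sorted(released[0]))  # ['diagnosis', 'research_id']
```

Using random tokens rather than, say, a hash of the name avoids IDs that could be recomputed from known personal details.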

Features:

Participants: 270,000 patients
Data Types: DNA from all participants at baseline, annual serum samples, clinical information via interviews and medical record reviews
Research Areas: Personalized medicine, genomic research, survival data analysis

Website: BioBank Japan

Publication: Overview of the BioBank Japan Project: Study design and profile

7. BioVU

Introduction:
BioVU is Vanderbilt University Medical Center’s biobank, which now holds more than 300,000 biological samples in deep-freeze storage. Launched in 2007, BioVU is the world’s largest DNA biobank based at a single academic institution. The project links DNA samples to phenotypic data derived from electronic medical records (EMRs) to support a wide range of biomedical research, particularly studies of genotype-phenotype associations.

Features:

Participants: Over 300,000 de-identified DNA samples linked to electronic health records as of January 2023.
Data Types: Genomic data, electronic health records, biospecimens.
Research Areas: Personalized medicine, genetic determinants of disease, pharmacogenomics, disease mechanisms.

Website: BioVU

Publication: Development of a Large-Scale De-Identified DNA Biobank to Enable Personalized Medicine

8. Mayo Clinic Biobank

Introduction:
The Mayo Clinic Biobank is a comprehensive biorepository created to support a wide range of medical research. Launched in 2009, it is part of Mayo Clinic's extensive commitment to individualized medicine. The biobank collects blood and blood derivatives along with health information donated by Mayo Clinic patients, aiming to create a resource for studies across various health conditions, without focusing on any specific disease.

Features:

Participants: 56,000 participants.
Data Types: Genomic data, electronic health records (EHRs), biospecimens such as DNA, serum, plasma, and white blood cells, and self-reported health and lifestyle information.
Research Areas: Personalized medicine, chronic disease research, pharmacogenomics, and various common diseases like hyperlipidemia, hypertension, osteoarthritis, and cancer.

Website: Mayo Clinic Biobank

Publication: The Mayo Clinic Biobank: A building block for individualized medicine

9. 1000 Genomes Project (IGSR: The International Genome Sample Resource)

Introduction:
The 1000 Genomes Project created a catalogue of common human genetic variation, using openly consented samples from people who declared themselves to be healthy. The reference data resources generated by the project remain heavily used by the biomedical science community. The International Genome Sample Resource (IGSR) maintains and shares the human genetic variation resources built by the 1000 Genomes Project. IGSR also updates these resources to the current reference assembly, adds new data sets generated from the 1000 Genomes Project samples, and adds data from other projects that work with openly consented samples.

Features:

Participants: 2,504 individuals from 26 populations across the globe.
Data Types: Whole-genome sequences, exome sequences, short and structural variations, genotype imputation.
Research Areas: Human genetic variation, population genetics, genetic association studies, structural variation analysis.

Website: The International Genome Sample Resource

Publication: The International Genome Sample Resource (IGSR) collection of open human genomic variation resources

10. Genome Aggregation Database (gnomAD)

Introduction:
The Genome Aggregation Database (gnomAD), originally launched in 2014 as the Exome Aggregation Consortium (ExAC), is the result of a coalition of investigators willing to share aggregate exome and genome sequencing data from a variety of large-scale sequencing projects, and make summary data available for the wider scientific community. The gnomAD database is composed of exome and genome sequences from around the world.

Features:

Participants: gnomAD v4 includes data from 807,162 individuals, comprising 730,947 exomes and 76,215 genomes.
Data Types: Short variants, such as single-nucleotide variants (SNVs) and small insertions/deletions (indels), as well as structural variants (SVs) such as deletions, duplications, insertions, inversions, and complex variants. The dataset also includes rare coding copy number variants (CNVs) with a site frequency below 1%.
Research Areas: Human genetic variation, population genetics, and allele-frequency reference data for studies of severe pediatric disease.
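The rare-CNV criterion in the gnomAD description (site frequency below 1%) can be illustrated with a small filter. This sketch assumes that site frequency is the fraction of assessed samples carrying a given CNV, and it works on hypothetical in-memory records with made-up field names (`coding`, `carriers`, `samples`); it does not read gnomAD's release files or reproduce its filtering pipeline.

```python
from dataclasses import dataclass
from typing import List

RARE_SF_THRESHOLD = 0.01  # "less than 1%" from the description above


@dataclass
class CNVCall:
    cnv_id: str
    coding: bool   # hypothetical flag: overlaps a protein-coding region
    carriers: int  # samples carrying this CNV
    samples: int   # samples assessed at this site

    @property
    def site_frequency(self) -> float:
        # Assumed definition: fraction of assessed samples that carry the CNV.
        return self.carriers / self.samples


def rare_coding_cnvs(
    calls: List[CNVCall], threshold: float = RARE_SF_THRESHOLD
) -> List[CNVCall]:
    """Keep coding CNVs whose site frequency is strictly below the threshold.

    Sites with zero assessed samples are skipped before the division happens.
    """
    return [
        c for c in calls
        if c.coding and c.samples > 0 and c.site_frequency < threshold
    ]


if __name__ == "__main__":
    calls = [
        CNVCall("del_1", True, 50, 10_000),   # SF 0.5%, coding -> kept
        CNVCall("dup_2", True, 300, 10_000),  # SF 3%, too common
        CNVCall("del_3", False, 5, 10_000),   # rare but non-coding
    ]
    print([c.cnv_id for c in rare_coding_cnvs(calls)])  # ['del_1']
```

The strict `<` mirrors "less than 1%": a CNV observed in exactly 1% of samples is excluded.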

Website: Genome Aggregation Database (gnomAD)

Publications: Analysis of protein-coding genetic variation in 60,706 humans; The mutational constraint spectrum quantified from variation in 141,456 humans

11. deCODE Genetics

Introduction:
deCODE Genetics is a leader in the discovery of genetic risk factors for common diseases. Its gene discovery engine draws on detailed genetic and medical information from around 500,000 individuals worldwide and uses proprietary statistical algorithms and informatics tools to analyze and interpret large datasets. Founded in 1996, deCODE built its work on a population-based approach centered on Iceland, which has placed the company at the forefront of gene discovery and whole-genome sequencing.

Features:

Participants: Over 160,000 volunteer participants in Iceland, representing more than half of the adult population, plus a genealogy database covering the entire Icelandic population.
Data Types: Whole-genome sequences, genotypic data, genealogical records, and medical information.
Research Areas: Genetic risk factors for common diseases, whole-genome sequencing, gene discovery, correlation between genetic variations and phenotypes.

Website: deCODE Genetics

12. 100,000 Genomes Project

Introduction:
The 100,000 Genomes Project, initiated by Genomics England, aimed to sequence 100,000 genomes from around 85,000 NHS patients affected by rare diseases or cancer. The British initiative has provided insights into the role genomics can play in healthcare. Although recruitment was completed in December 2018, research and analysis continue and contribute to the advancement of genomic medicine.

Features:

Participants: 100,000 genomes sequenced from approximately 85,000 NHS patients, focusing on those affected by rare diseases or cancer.
Data Types: Whole-genome sequences.
Research Areas: Diagnosis and treatment of rare diseases, cancer genomics, and genetic predispositions to disease.

Website: 100,000 Genomes Project

Publication: 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care — Preliminary Report

13. Aging Research Biobank

Introduction:
The Aging Research Biobank, managed by the National Institute on Aging (NIA) at the National Institutes of Health (NIH), U.S. Department of Health and Human Services (HHS), is a resource for research on aging and age-related diseases. Established in 2018 by the NIA's Division of Geriatrics and Clinical Gerontology, it serves as a centralized platform for storing and distributing biospecimens and related phenotypic and clinical data. Its collections have contributed to public health and continue to enable research on key scientific questions about aging, with the goal of developing prognostics, markers, and therapeutics.

Features:

Participants: The biobank includes samples and data from various NIA-supported longitudinal and clinical studies, covering a wide demographic range of older adults.
Data Types: The biobank offers a variety of resources including biological samples (e.g., blood, DNA, tissue), phenotypic data, clinical data, and images.
Research Areas: The Aging Research Biobank supports research on the biological mechanisms of aging, age-related diseases, and the development of strategies to promote healthy aging. The platform also includes COVID-19 resources developed in collaboration with the NHLBI Biorepository, addressing enhanced safety procedures and guidelines for handling SARS-CoV-2 specimens.

Website: Aging Research Biobank

Publication: https://agingresearchbiobank.nia.nih.gov/publications/
