Data Resources for Proteogenomics-related Studies

This collection of online resources offers brief overviews of, and easy access to, leading databases in proteomics and proteogenomics research. The repositories covered range from archives of raw mass spectrometry data files to portals offering preprocessed multi-omics abundance tables.

1. CPTAC Pan-Cancer Data

This page describes the data generated by the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC), which applied large-scale proteomic and genomic analysis to produce a comprehensive, integrated proteogenomic characterization of the 10 most prevalent types of cancer. The Supplementary Information section of the page provides download links for 11 data types (global proteomics, 4 types of post-translational modification proteomics, RNA-seq, miRNA-seq, CNV, mutation, methylation array, clinical and other metadata) for >1000 tumors. All omics data files are preprocessed, carefully quality-controlled abundance tables with no access restrictions.

CPTAC Pan-Cancer Data
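Because the CPTAC files are preprocessed abundance tables covering several data types for the same tumors, a common first step is to restrict two tables to the samples they share. The following is a minimal sketch of that step using only the Python standard library. The file layout (tab-separated, one feature per row, one sample per column, "NA" for missing values), the sample and gene names, and all numeric values are assumptions made for illustration, not the actual CPTAC file format.

```python
import csv
import io


def read_abundance_table(handle):
    """Read a tab-separated table: first column is the feature ID,
    remaining columns are samples (assumed layout, not the CPTAC spec)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue
        values = {}
        for sample, cell in zip(samples, row[1:]):
            # Treat empty cells and "NA" as missing values.
            values[sample] = None if cell in ("", "NA") else float(cell)
        table[row[0]] = values
    return samples, table


def shared_samples(samples_a, samples_b):
    """Samples present in both tables, in the order of the first table."""
    in_b = set(samples_b)
    return [s for s in samples_a if s in in_b]


# Toy tables standing in for downloaded files (values are invented).
PROTEIN_TSV = "gene\tT1\tT2\tT3\nTP53\t0.5\tNA\t-1.2\nEGFR\t1.1\t0.3\t0.0\n"
RNA_TSV = "gene\tT2\tT3\tT4\nTP53\t7.0\t6.1\t5.5\nEGFR\t9.2\t8.8\t9.0\n"

prot_samples, protein = read_abundance_table(io.StringIO(PROTEIN_TSV))
rna_samples, rna = read_abundance_table(io.StringIO(RNA_TSV))
common = shared_samples(prot_samples, rna_samples)


def paired_values(gene):
    """(sample, protein, RNA) triples for samples measured in both tables."""
    return [
        (s, protein[gene][s], rna[gene][s])
        for s in common
        if protein[gene][s] is not None and rna[gene][s] is not None
    ]


print(common)
print(paired_values("TP53"))
```

With real downloads, the `io.StringIO` objects would be replaced by opened files, and the column conventions should be checked against each file's own header before use.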

2. Cancer Cell Line Encyclopedia Proteomics Data

This website accompanies a comprehensive proteomics study of 375 cell lines from diverse lineages. The Normalized and Other Data section provides links to data tables for peptide-level and protein-level quantification, as well as intermediate analysis results (e.g. correlations of protein abundance with mutations and RNA expression).

Cancer Cell Line Encyclopedia Proteomics Data

3. ProteomeXchange

ProteomeXchange is a global consortium designed to facilitate the public sharing and dissemination of proteomics data. This website provides a unified framework for researchers to submit, access, and explore high-quality proteomics datasets, ensuring transparency and reproducibility in scientific research. ProteomeXchange supports standardized data submission and retrieval, enabling seamless integration with downstream analysis tools.

ProteomeXchange

4. MassIVE

MassIVE is a community resource developed by the NIH-funded Center for Computational Mass Spectrometry to promote the global, free exchange of mass spectrometry data. MassIVE datasets can be assigned ProteomeXchange accessions to satisfy publication requirements.

MassIVE
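Since a MassIVE dataset may also carry a ProteomeXchange accession, the same dataset can be cited under either identifier. The sketch below shows one way to tell the two apart when parsing accession strings from a paper or a spreadsheet. The patterns are assumptions based on commonly seen identifiers (PXD or RPXD followed by six digits, MSV followed by nine digits) and should be checked against each repository's current documentation. The function name `classify_accession` is hypothetical.

```python
import re

# Assumed accession formats; verify against the repositories' documentation.
ACCESSION_PATTERNS = {
    "ProteomeXchange": re.compile(r"R?PXD\d{6}"),
    "MassIVE": re.compile(r"MSV\d{9}"),
}


def classify_accession(accession):
    """Return the repository name whose pattern matches, or None."""
    text = accession.strip().upper()
    for repository, pattern in ACCESSION_PATTERNS.items():
        # fullmatch rejects strings with extra leading or trailing characters.
        if pattern.fullmatch(text):
            return repository
    return None


for example in ["PXD000001", "MSV000079843", "not-an-accession"]:
    print(example, "->", classify_accession(example))
```

Returning `None` for unrecognized strings, rather than raising, keeps the helper easy to use when filtering mixed lists of identifiers.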

5. PeptideAtlas

PeptideAtlas is a comprehensive resource for the global proteomics community, offering a curated repository of high-quality mass spectrometry-based proteomics data. It aggregates and standardizes peptide and protein identifications from a wide range of experiments, creating an accessible and reliable reference database. By mapping peptides to annotated genomes and proteomes, PeptideAtlas supports researchers in understanding protein expression, modification, and function across diverse biological contexts.

PeptideAtlas

6. PRIDE (PRoteomics IDEntifications Database)

PRIDE (PRoteomics IDEntifications Database) is a leading public repository for proteomics data, maintained by the European Bioinformatics Institute (EMBL-EBI). It serves as a central hub for the deposition, sharing, and exploration of mass spectrometry-based proteomics datasets. PRIDE supports open science by enabling researchers to store and access protein and peptide identifications, post-translational modifications, and quantitative data.

PRIDE

7. NCI Proteomic Data Commons

The objectives of the National Cancer Institute’s Proteomic Data Commons (PDC) are (1) to make cancer-related proteomic datasets easily accessible to the public and (2) to facilitate direct multi-omics integration in support of precision medicine through interoperability with accompanying data resources (genomic and medical imaging datasets). For each study hosted on the site, users can find files for a variety of data types, including both proteomic and genomic data. For proteomics, most studies include both raw and processed data.

NCI Proteomic Data Commons

8. Proteome-Phenome Atlas

This website is a public repository of protein-phenotype associations, providing full results of the phenotypic and genomic associations between ~3,000 plasma proteins and ~1,000 health-related phenotypes for ~50,000 adults in the UK Biobank.

Proteome-Phenome Atlas