In Conversation: Cancer Transcriptomic Deconvolution

A guide to transcriptomic deconvolution in cancer presents a systematic framework for selecting and applying deconvolution methods in cancer research, detailing 43 methods and their applications in tumor subtyping, biomarker discovery, and spatial transcriptomics. Nature Reviews Cancer 26, 84–103 (2026).

Q&A

Regarding the background and significance of this review, what major themes, unresolved questions, or emerging directions in the field does this review summarize or synthesize? Please elaborate in detail.

This review synthesizes three interconnected themes that reshape cancer transcriptomics. First, we address the critical need to bridge single-cell and bulk RNA-seq data, recognizing that while single-cell approaches excel at resolving cellular heterogeneity, bulk RNA-seq remains indispensable for large-scale clinical studies with long-term outcome data. Second, we systematically document why cancer presents unique computational challenges compared to normal tissues, particularly tumor cell plasticity and the profound heterogeneity that violates core assumptions of methods developed for healthy tissues. Third, we demonstrate how deconvolution translates from methodology to clinical impact through applications in tumor subtyping, treatment response prediction, and biomarker discovery.

The major unresolved questions center on what we call the "missing reference" problem: how to create comprehensive profiles that capture dynamic tumor cell states rather than assuming stable expression patterns. We also identify critical benchmarking gaps, as most validation studies use peripheral blood or cell lines rather than actual tumor tissues, and the ideal approach of generating matched bulk and single-cell data from identical samples remains technically challenging and rarely performed.

Looking forward, emerging directions include adapting deconvolution for FFPE samples to unlock vast clinical archives, developing temporal models that capture how cellular compositions evolve during treatment, integrating multi-modal data including copy number and methylation patterns, and extending these approaches to spatial transcriptomics to understand not just what cell types are present but where they reside and how they interact within the tumor architecture.

Benefits and limitations of bulk RNA-seq and scRNA-seq data generation.

How did peers or experts evaluate (or praise) the value of this review? For example, did they highlight the clarity of synthesis, the timeliness of the topic, or the importance of the perspectives offered?

Reviewers particularly valued our comprehensive cataloging of 43 deconvolution methods with clear applicability classifications and the decision-tree framework in Figure 4, which transforms what could have been a technical comparison into actionable guidance for method selection. They emphasized our cancer-specific focus, noting that while many existing reviews use peripheral blood or cell-line mixtures for validation, we systematically address the unique challenges of tumor tissues, including cellular plasticity and microenvironment complexity. Our balanced and transparent approach was repeatedly commended, especially our honest assessment of method limitations and clear documentation of which methods have rigorous cancer-specific validation versus those claiming applicability without proper benchmarking. The collaborative expertise, ranging from our fifteen-year method development history with DeMix, DeMixT, and DeMixSC to broader cancer genomics perspectives, was recognized as strengthening both technical depth and practical relevance, creating a resource that connects computational biologists with experimental researchers and clinicians.

If the perspectives or frameworks proposed in this review have potential applications, what are specific ways they may shape research or clinical practice in the next few years?

In the near term, this review enables researchers to re-analyze thousands of existing bulk RNA-seq datasets from TCGA and other sources to extract tumor microenvironment composition and tumor-specific signals, potentially discovering new biomarkers without generating new data. Our framework provides a roadmap for clinical trials already incorporating deconvolution as secondary outcomes and identifies specific gaps in FFPE optimization, temporal modeling, and rare cell detection that method developers need to address.

Looking three to five years ahead, the clinical impact could be substantial as deconvolution-derived metrics like tumor-specific mRNA abundance and immune cell proportions become validated prognostic markers. As immunotherapy expands, tumor microenvironment profiling through deconvolution could predict treatment response, while extracting tumor-specific expression profiles may enable more accurate molecular subtyping. We hope this review establishes validation standards, serves as an educational resource, and facilitates collaboration between experimentalists and computational researchers by providing a common framework for understanding what these methods can accomplish in cancer research.

Can you describe the steps or stages involved in developing this review article—such as identifying the scope, curating the literature, synthesizing findings, and shaping the conceptual framework?

The review emerged from a practical challenge we observed through years of collaborating with experimental biologists: despite our lab's fifteen-year history developing deconvolution methods, cancer researchers consistently struggled with knowing which method to choose, and many tools developed for normal tissues failed when applied to tumors. This motivated us to create a cancer-specific guide rather than a general methods review. The literature curation was our most intensive phase, systematically identifying 43 relevant methods while establishing clear criteria for what to exclude, ultimately documenting all decisions transparently in supplementary materials.

Developing the conceptual framework required multiple iterations, with us constantly asking whether a cancer researcher without computational training could understand and apply our guidance. We organized the work by expertise areas, with different team members leading sections on single-cell methods, semi-reference approaches, reference-free methods, and clinical applications, while ensuring integration across the entire manuscript. The figures, especially the decision tree guiding method selection, went through numerous revisions to achieve clarity. After submission, reviewer feedback helped us strengthen the cancer-specific emphasis and expand clinical examples, while the final editing involved careful attention to scientific nomenclature and journal style requirements.

Were there any memorable events during the writing of this review? This may include stories related to the research process, collaborations, unexpected insights, or key turning points.

The most significant turning point came when we realized we couldn't simply rank methods as "best" because performance depends so heavily on individual contexts—available reference data, target cell types, research questions. We initially drafted ranking tables but kept finding exceptions that made them meaningless. Accepting this complexity and shifting to decision trees was frustrating at first, since everyone wants simple recommendations, but it made the review genuinely useful.

Coordinating across time zones meant real logistical challenges, and we had substantive disagreements about classifying certain methods that required iterative discussion rather than quick resolution. The debate about documenting excluded methods in Supplementary Table 2 was a practical one: we had identified many methods that didn't fit our criteria, and although documenting the exclusions took extra work, we chose transparency. What genuinely surprised me was discovering how few clinical trials use deconvolution beyond exploratory analyses despite years of methods development. This gap between computational capabilities and clinical application became a theme shaping our emphasis on translational barriers.

Are there follow-up plans based on this review? For example, do you plan to extend this review into a perspective article, a methodological tutorial, or a series of related reviews?

Our immediate plan is to develop practical tutorials on constructing cancer-specific references, interpreting deconvolution results, and avoiding common pitfalls, to make these methods more accessible. We're collaborating with scientists generating matched bulk and single-cell datasets to address the benchmarking gaps we identified. Additionally, we're actively developing approaches that follow suggestions in the future work section of the review. For example, there are untapped opportunities in spatial transcriptomic deconvolution, for which we developed DeMixNB; the manuscript is available as a preprint.

AI is one of the major topics of 2023 and beyond, relying heavily on large-scale data and computational integration. From the perspective of biostatistics, how can biostatistics contribute to the development and evaluation of AI in this area?

Biostatistics brings essential rigor to AI development that machine learning alone cannot provide. While AI methods show impressive performance metrics, biostatisticians ensure those metrics reflect real-world performance rather than artifacts of simulated data, assess whether models generalize across cancer types, and provide the uncertainty quantification critical for clinical decisions, since most AI approaches give only point estimates without confidence intervals. We also address data quality issues like batch effects that AI might inadvertently learn, curate training datasets that avoid systematic biases, and detect when reference profiles don't match target tissues.
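To make the point about point estimates versus confidence intervals concrete, here is a minimal sketch that is not taken from the review or from any of the methods it covers. It assumes a deliberately simplified two-component mixture, in which a bulk profile is a weighted sum of one tumor reference and one non-tumor reference, and it attaches a percentile bootstrap interval (resampling genes, itself a simplifying assumption) to the least-squares tumor fraction. The function names, the synthetic data, and the noise level are all hypothetical.

```python
import random


def estimate_tumor_fraction(bulk, tumor_ref, normal_ref):
    """Least-squares p for bulk ~ p * tumor_ref + (1 - p) * normal_ref, clipped to [0, 1]."""
    num = 0.0
    den = 0.0
    for b, t, n in zip(bulk, tumor_ref, normal_ref):
        d = t - n                # per-gene difference between the two references
        num += (b - n) * d
        den += d * d
    if den == 0.0:
        raise ValueError("references are identical; the fraction is not identifiable")
    return min(1.0, max(0.0, num / den))


def bootstrap_interval(bulk, tumor_ref, normal_ref, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the tumor fraction, resampling genes with replacement."""
    rng = random.Random(seed)
    n_genes = len(bulk)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n_genes) for _ in range(n_genes)]
        try:
            estimates.append(estimate_tumor_fraction(
                [bulk[i] for i in idx],
                [tumor_ref[i] for i in idx],
                [normal_ref[i] for i in idx],
            ))
        except ValueError:
            continue             # skip degenerate resamples
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))]
    return lower, upper


# Synthetic example: 200 hypothetical genes, true tumor fraction 0.6, Gaussian noise.
rng = random.Random(42)
tumor_ref = [rng.uniform(0, 10) for _ in range(200)]
normal_ref = [rng.uniform(0, 10) for _ in range(200)]
bulk = [0.6 * t + 0.4 * n + rng.gauss(0, 0.5) for t, n in zip(tumor_ref, normal_ref)]

point = estimate_tumor_fraction(bulk, tumor_ref, normal_ref)
low, high = bootstrap_interval(bulk, tumor_ref, normal_ref)
print(f"tumor fraction ~ {point:.3f}, 95% bootstrap interval ({low:.3f}, {high:.3f})")
```

Reporting the interval alongside the estimate shows how much the answer depends on which genes happened to be measured. Real tumor data would also need to account for reference mismatch and batch effects, which this sketch ignores.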

Is there anything else about this review article that you would like to add—such as unique insights, broader impact, or future outlook?

What makes this review distinctive is its problem-oriented organization around biological questions cancer researchers actually face rather than computational methods themselves, making it actionable for experimentalists starting with cancer biology problems. We deliberately discussed limitations openly, including when deconvolution struggles with rare populations, highly plastic tumor cells, or FFPE samples, because this honesty helps researchers avoid misapplication and builds realistic expectations.

The key insight from developing this review is that our field's biggest barrier isn't computational sophistication but effective translation. We have powerful methods, yet researchers struggle with selection, validation standards remain inconsistent, and pathways to clinical application are unclear. Short-term, we anticipate better cancer-specific benchmarks and FFPE-optimized methods emerging. Longer-term, deconvolution could become routine clinical practice, enabling real-time treatment monitoring. Progress requires not just better algorithms but improved communication between computational and experimental researchers. This review represents our contribution toward building those bridges, helping more researchers extract meaningful insights from bulk transcriptomic data to advance cancer biology and improve patient outcomes.

Dr. Wenyi Wang (left) and Yaoyi Dai (right)

Author Biographies

Yaoyi Dai

MD Anderson Cancer Center · Baylor College of Medicine

Graduate research assistant in the Department of Bioinformatics and Computational Biology at MD Anderson Cancer Center and PhD candidate at Baylor College of Medicine. Her research focuses on developing and applying transcriptomic deconvolution methods for cancer genomics, with expertise in integrating multi-omics data for biomarker discovery and tumor microenvironment characterization in translational oncology applications.

Shuai Guo

Emory University

Postdoctoral fellow at Emory University. He completed his graduate training with Dr. Wenyi Wang in the Department of Bioinformatics and Computational Biology at MD Anderson Cancer Center, where he specialized in single-cell reference-based deconvolution methods and their application to cancer transcriptomics. His research focuses on developing benchmarking frameworks and computational approaches that address platform-specific biases in integrating single-cell and bulk RNA-seq data.

Yidan Pan

MD Anderson Cancer Center · Van Loo Lab

Data Scientist in the Department of Genetics at MD Anderson Cancer Center, working in Dr. Van Loo's lab. Her research interests include cancer genomics and the clinical translation of computational methods, with a focus on tumor evolution investigated through spatiotemporal single-cell genomic and transcriptomic analyses in malignant peripheral nerve sheath tumors (MPNST).

Carla Castignani

Francis Crick Institute, London

Graduate Research Assistant at the Francis Crick Institute in London, United Kingdom, and a PhD student with Dr. Peter Van Loo. Her research focuses on computational approaches to understanding tumor evolution and heterogeneity, with expertise in reference-free deconvolution methods and their applications to complex cancer genomics datasets.

Matthew D. Montierth

MD Anderson Cancer Center · Wang Lab

Data Scientist in the Department of Bioinformatics and Computational Biology at MD Anderson Cancer Center, working in Dr. Wang's lab. He completed his PhD training with Dr. Wenyi Wang at Baylor College of Medicine, with research focusing on semi-reference-based deconvolution methods, subclonal reconstruction, and statistical approaches to modeling tumor heterogeneity in cancer transcriptomics.

Peter Van Loo (PI)

MD Anderson Cancer Center · Professor, CPRIT Scholar

Dr. Van Loo is a Professor and CPRIT Scholar at the University of Texas MD Anderson Cancer Center, Department of Genetics, with a joint appointment at the Department of Genomic Medicine. His research focuses on leveraging massively parallel sequencing efforts to study the evolutionary history of cancers. During his postdoctoral training, Dr. Van Loo developed computational techniques to study copy-number alterations in cancer genomes, and approaches to study the evolutionary history and subclonal architecture of tumors from whole-genome sequencing data, a field termed molecular archaeology of cancer. As an independent researcher, first as a Group Leader at the Francis Crick Institute in London, UK, and later as a Professor at the University of Texas MD Anderson Cancer Center, Dr. Van Loo has sketched the typical evolutionary trajectories of many cancer types, allowing insight into the timelines of cancer development, as well as insight into how tumors metastasize. He was the main lead of the Evolution and Heterogeneity working group of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium and is the genomics lead of the Sarcoma arm of the 100,000 Genomes Project.

Wenyi Wang (PI)

MD Anderson Cancer Center · Professor

Dr. Wang is a Professor of Bioinformatics and Computational Biology and of Biostatistics at the University of Texas MD Anderson Cancer Center. She received her PhD from Johns Hopkins University and performed postdoctoral training in statistical genomics and genome technology at UC Berkeley with Terry Speed and at Stanford with Ron Davis. Wenyi's research includes significant contributions to statistical bioinformatics in cancer, including MuSE for subclonal mutation calling, DeMixT for transcriptome deconvolution, Famdenovo for de novo mutation identification, and more recently, a pan-cancer characterization of genetic intra-tumor heterogeneity in subclonal selection. Her group focuses on developing and applying computational methods to study the evolution of the human genome as well as the cancer genome, and on developing risk prediction models to accelerate the translation of biological findings to clinical practice.