Decoding Tissue Complexity: The Pioneering IRIS Method in Spatial Transcriptomics

Published

June 17, 2024

The Article Link:

Accurate and efficient integrative reference-informed spatial domain detection for spatial transcriptomics

Interviewee Name

Dr. Xiang Zhou

Dr. Xiang Zhou is a Professor of Biostatistics and Assistant Director of Precision Health at the University of Michigan. He received his BS in Biology from Peking University in 2004 and PhD in Neurobiology from Duke University in 2010. He was a Postdoctoral Scholar with Dr. Matthew Stephens in the Departments of Statistics and Human Genetics at the University of Chicago from 2010 to 2013, where he was a William H. Kruskal Instructor in the Department of Statistics from 2013 to 2014. Dr. Zhou joined the Department of Biostatistics at Michigan as an Assistant Professor in 2014 and was a John G. Searle Assistant Professor from 2018 to 2019. He was promoted to Associate Professor in 2019 and to Professor in 2023. His primary research is focused on developing statistical methods and computational tools to facilitate substantive scientific research in biomedicine, with a particular focus on genetics and genomics.

Regarding the research background and significance, does this work discover new knowledge or solve existing problems within the field? Please elaborate in detail.

Spatially resolved transcriptomics (SRT) is a set of recently developed technologies that enable gene expression profiling in tissues with spatial context. These technologies have provided unprecedented opportunities for investigating and characterizing the transcriptomic and cellular landscape of complex tissues. As we all know, tissues are complex cellular ecosystems with spatially organized and functionally distinct anatomical domains and microenvironments, each characterized by unique cell type compositions and transcriptomic heterogeneity. The spatial organization of tissues in the form of local domains facilitates how different cell types coordinate with each other in carrying out tissue functions in development, homeostasis, communication, repair, and signaling responses. Consequently, detecting spatial domains in the tissue becomes a critical task in SRT studies. In this paper, we develop a novel statistical and machine learning method, called Integrative and Reference-Informed tissue Segmentation (IRIS), for spatial domain detection. Unlike previous approaches, IRIS leverages single-cell RNA-seq data for reference-informed domain detection, incorporates cell type composition as key features to both substantially improve accuracy and enhance biological interpretability of the detected spatial domains, and integrates multiple SRT tissue slices while explicitly considering correlations both within and across slices. As a result, IRIS is accurate, scalable, interpretable, and robust. In real data applications, IRIS achieves substantial accuracy gains and speed improvements in moderate-sized datasets, while being the only method currently applicable to large datasets, including Stereo-seq and 10x Xenium. IRIS allows biologists to reveal intricate brain structures, uncover tumor microenvironment heterogeneity, and detect structural changes in diabetes-affected testis, all with exceptional speed and accuracy not achieved by existing approaches. Our study thus showcases how statistics and machine learning methods can facilitate biological discoveries.

How did the reviewers evaluate (praise) it?

The review process was straightforward, involving effectively one round of revisions. Reviewers found our method to be novel, concise, and efficient, with comprehensive benchmarking comparisons. The revision primarily focused on incorporating additional comparisons and evaluations, as well as enhancing clarity throughout the manuscript.

If this achievement has potential applications, what are some specific applications it might have in a few years?

The approach opens new avenues for biologists to delve into the intricate architecture of complex tissues, providing opportunities to investigate the dynamic processes shaping tissue structure during development and disease progression. By characterizing refined tissue structures and examining their potential alterations during disease states, IRIS has the potential to provide essential mechanistic insights vital for understanding the structural and functional changes underlying various diseases. Therefore, IRIS not only advances our understanding of tissue biology but also presents promising avenues for the development of therapeutic intervention and disease management in the near future.

Can you recount the specific steps or stages from setting the research topic to the successful completion of the research?

The IRIS paper represents the 3rd thesis chapter from my former PhD student, Ying Ma, who is now an Assistant Professor of Biostatistics at Brown University. Several years ago, we embarked on three projects in our lab, aiming to integrate spatial correlation information into the modeling of spatial transcriptomics data, using three distinct statistical modeling approaches. The first employed a Potts model to address spatial correlation across tissue locations, eventually leading to the development of a multi-scale Bayesian hierarchical modeling method named BASS. The second utilized the Gaussian kernel as a prior specified on the latent features in dimension reduction, aiming to encourage similarity in the latent space for neighboring locations. This resulted in a spatially aware dimension reduction method known as SpatialPCA. IRIS, our third endeavor, hinges on a conditional autoregressive model within the penalized regression framework to incorporate spatial similarity. This third endeavor is particularly challenging, as it seeks to integrate a distinct single-cell RNA sequencing dataset into spatial transcriptomics and estimate cell type composition to delineate tissue into multiple domains. Faced with the initial complexity of the task, we opted for a two-step approach: we first developed a method for inferring cell type compositions as Ying's 2nd thesis chapter, then extended that method to carry out precise domain detection as her 3rd thesis chapter. Therefore, in a sense, IRIS represents a continuation and extension of Ying's earlier method, CARD, towards a different and more challenging application. For IRIS, Ying's exceptional independence, persistence, and meticulous approach to analysis became instrumental in driving this project forward. Ying's unwavering dedication and innovative spirit have been the key to pushing the boundaries and spearheading the development of IRIS.
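
To make the three ways of encoding spatial similarity mentioned above more concrete, here is a minimal NumPy sketch of each building block: an unnormalized Potts log-prior, a Gaussian kernel over spot coordinates, and the precision matrix of a proper conditional autoregressive (CAR) model. It is an illustration only, not the BASS, SpatialPCA, CARD, or IRIS implementation. The function names, the bandwidth parameterization, the values of `beta` and `rho`, and the 3 x 3 grid of spots are all assumptions chosen for the example.

```python
import numpy as np


def grid_adjacency(n_rows, n_cols):
    """4-neighbour adjacency matrix for spots on a regular grid (toy layout)."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols:          # neighbour to the right
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < n_rows:          # neighbour below
                W[i, i + n_cols] = W[i + n_cols, i] = 1.0
    return W


def potts_log_prior(labels, adjacency, beta):
    """Unnormalized Potts log-prior: beta * (number of neighbouring pairs sharing a label)."""
    same = labels[:, None] == labels[None, :]
    # np.triu keeps each undirected edge once.
    return beta * np.sum(np.triu(adjacency) * same)


def gaussian_kernel(coords, bandwidth):
    """Gaussian kernel K[i, j] = exp(-||s_i - s_j||^2 / bandwidth)."""
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / bandwidth)


def car_precision(adjacency, rho):
    """Precision matrix D - rho * W of a proper CAR model, with 0 <= rho < 1."""
    degree = np.diag(adjacency.sum(axis=1))
    return degree - rho * adjacency


# Toy example: 9 spots on a 3 x 3 grid, indexed row by row.
coords = np.array([[r, c] for r in range(3) for c in range(3)], dtype=float)
W = grid_adjacency(3, 3)
K = gaussian_kernel(coords, bandwidth=2.0)   # nearby spots get larger kernel values
Q = car_precision(W, rho=0.9)                # neighbours are coupled through W
labels = np.array([0, 0, 1, 0, 0, 1, 2, 2, 2])

print("Potts log-prior:", potts_log_prior(labels, W, beta=1.0))
print("Kernel, adjacent vs. far spots:", K[0, 1], K[0, 8])
print("Smallest eigenvalue of CAR precision:", np.linalg.eigvalsh(Q).min())
```

In all three cases the spatial information enters through which spots count as neighbours or how far apart they are. What differs is where that structure is placed: on discrete domain labels (Potts), on latent low-dimensional features (Gaussian kernel), or on quantities estimated within a regression model (CAR).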

Were there any memorable events during the research? You can tell a story about anything related to people, events, or objects.

I vividly recall the moment when Ying first shared her initial IRIS accuracy results with me. We were super excited. It was a delightful surprise to see the substantial accuracy gains achieved by IRIS compared to existing approaches. As some background, there is a standard dataset in the field, DLPFC, that all methods use for benchmarking. Existing methods typically yield accuracies ranging from 0.4 to 0.5, with the highest scores from a couple of recent methods hovering around 0.5 to 0.6. Therefore, Ying's early implementation of IRIS, boasting an accuracy exceeding 0.7, was truly remarkable and represented the highest score ever achieved in this extensively studied benchmarking dataset. Recognizing the significance of this achievement, we sought additional datasets with potential ground truth for further evaluation, leading Ying to discover the spermatogenesis dataset, where cells are organized in circular structures. IRIS's unique ability to detect these biologically expected circular structures further solidified our confidence in its efficacy and potential. Finally, developing an effective and accurate method is challenging, but equally daunting is crafting one that is computationally efficient. Ying's proficiency in computational implementation has positioned our method as the sole contender capable of scaling to analyze vast transcriptomics datasets, including Stereo-seq and 10x Xenium. Both accuracy and scalability are key for the success of this project.

Is there a follow-up plan based on this research? If so, please elaborate.

IRIS presents exciting opportunities for large-scale spatial omics studies. One crucial future avenue for exploration is the utilization of spatial domains detected across various tissues to identify potential structural changes and alterations occurring during disease development. Detecting these structural changes will help elucidate alterations in tissue organization underlying disease etiology and thus facilitate mechanistic understanding. Additionally, we are actively investigating the incorporation of multiple single-cell references for deconvolution. By leveraging multiple references, we aim to enhance spatial domain detection capabilities, ultimately leading to more comprehensive insights into tissue organization and disease pathology.

Without a doubt, AI is one of the hot topics of 2023, requiring extensive data support in its development. What assistance can biostatistics offer to the development of AI?

As biostatisticians, we hold a pivotal role in the realm of data analytics for biomedical discoveries. Given the vast potential of AI as a computational tool, it is important that we lead its application within the biomedical and health sciences. We are equipped with a strong scientific background, statistical expertise, computational skills, and collaborative resources, effectively all the necessary foundations to propel the field forward. Therefore, I believe that biostatisticians should and will play a central role in both the development and application of AI in biomedical and health sciences.

To realize this, it is crucial that we encourage our students to embrace AI technology and lead its future development in biomedical and health sciences. To do that, we should teach AI from a statistical modeling perspective to facilitate deep understanding as well as detailed implementation for effective practical application. For example, in my machine learning course last semester, I introduced fundamental deep learning methods such as auto-encoders and convolutional neural networks from a non-linear modeling perspective to facilitate the understanding of these models. I also taught detailed backpropagation algorithms for computing the gradients in deep neural networks so that students know how these algorithms are derived in detail and can implement them effectively in practice. Besides teaching AI, we should also encourage students to cultivate a strong computing and engineering background to fully realize the potential of AI in biomedical contexts. For example, our department recently created an MS program in health data science, which is distinct from traditional biostatistics, with a heightened emphasis on computation and modeling beyond standard regression settings. These initiatives are crucial steps toward advancing AI in biomedicine and preparing the next generation of biostatisticians and data scientists for the challenges ahead.

Finally, it is critical for our field to embrace a practical science and engineering perspective. Our focus should not solely be on model or theory development for its own sake but rather on model development for solving real-world problems effectively. It is imperative to prioritize the effectiveness and practical application of statistical methods in real data settings, rather than fixating on theoretical scenarios that may never happen in practice.

Besides the above questions, is there anything else about this achievement that you would like to add? If so, please add it below.

Asking the right biological questions is key for data analytics in biomedical research. As biostatisticians, we should actively engage with biologists to directly ask and address pertinent biological inquiries, rather than passively waiting for them to provide questions to us for modeling. While collaboration is integral to our work, it is equally crucial for biostatisticians to take the lead in scientific discoveries, rather than merely participating in them. At the University of Michigan, our biostatistics department stands as a leader in the field of statistical genetics and genomics, with many prominent figures who are highly respected and influential. These leaders spearheaded multiple large-scale genetic and genomic projects that have significantly impacted and shaped the scientific landscape of genetics and genomics, serving as role models for our generation. It's my hope that our generation of biostatisticians will strive to uphold the high standards set by these pioneers, ensuring that our field continues to drive methodological innovation and advance biological knowledge in biomedical research.

Edited by: Shan Gao
Proofread by: Hongtu Zhu