Interview with Award-Winning Dr. Irina Gaynanova on the interface between statistics and AI

Published: March 13, 2025


1. Can you summarize your award-winning research and its significance for statistics and AI?

Thank you for the opportunity to discuss my research. I am honored to be among the recipients of the 2025 IMS Thelma and Marvin Zelen Emerging Women Leaders in Data Science Award, which recognizes women researchers’ leadership and contributions to data science.
My research focuses on developing statistical methods for multi-view data integration. With the growing availability of data collected on the same subjects from multiple sources—such as gene expression, DNA methylation, and metabolomics in multi-omics studies, or continuous glucose monitoring (CGM), blood pressure, heart rate, and actigraphy in wearable studies—there is a pressing need for principled approaches to uncovering shared structures across these diverse data modalities.
The award cites our group’s work developing low-rank matrix models that allow partial sharing in column spaces and mixed Gaussian copula models that account for different variable types (binary, continuous, zero-inflated, ordinal). These methods are rooted in statistics but align closely with principles underlying many AI models, such as structure learning and latent-space modeling.
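To give a flavor of what “partial sharing in column spaces” means, here is a toy sketch, purely illustrative and not the methodology from our papers: two views of the same subjects are generated from a mix of joint and view-specific latent factors, and the shared directions show up as large cosines of principal angles between the views’ leading subspaces.

```python
# Toy illustration (not the methods discussed above): two data views that
# share part of their column space through joint latent factors.
import numpy as np

rng = np.random.default_rng(0)
n = 200                                  # subjects measured in both views

# Joint factors drive both views; individual factors drive only one view.
joint = rng.normal(size=(n, 2))          # shared latent structure
indiv1 = rng.normal(size=(n, 1))         # specific to view 1
indiv2 = rng.normal(size=(n, 1))         # specific to view 2

# Loadings mapping latent factors to observed variables in each view.
W1_joint = rng.normal(size=(2, 50))
W1_indiv = rng.normal(size=(1, 50))
W2_joint = rng.normal(size=(2, 30))
W2_indiv = rng.normal(size=(1, 30))

X1 = joint @ W1_joint + indiv1 @ W1_indiv + 0.1 * rng.normal(size=(n, 50))
X2 = joint @ W2_joint + indiv2 @ W2_indiv + 0.1 * rng.normal(size=(n, 30))

# A crude probe of shared structure: cosines of principal angles between the
# leading score subspaces of the two views.
U1, _, _ = np.linalg.svd(X1, full_matrices=False)
U2, _, _ = np.linalg.svd(X2, full_matrices=False)
cosines = np.linalg.svd(U1[:, :3].T @ U2[:, :3], compute_uv=False)
print("cosines of principal angles:", np.round(cosines, 2))
# Cosines near 1 correspond to directions shared across views (the joint
# factors); smaller values correspond to view-specific structure.
```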
I am particularly proud that the award recognizes our contributions to open-source software. No method is truly impactful without accessible and efficient implementations, and my team has developed multiple R packages, often incorporating optimized C and C++ code for scalability. Notably, our R package iglu extracts glycemic metrics from CGM data and has seen community involvement from 22 contributors, including many former students. These metrics are valuable in their own right, often serving as primary endpoints in clinical trials with CGM data, and they also serve as inputs for AI-based models, including glucose forecasting models. Our group has invested significant effort in ensuring the reliability of the extracted metrics, benchmarking existing glucose forecasting models, and curating public CGM datasets for centralized evaluation.
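The iglu package itself is written in R; purely as an illustration of the kind of summary it computes, here is a small Python sketch of one standard glycemic metric, time in range (the percentage of readings between 70 and 180 mg/dL). The function and data below are made up for the example and are not iglu’s interface.

```python
# Minimal sketch of one standard glycemic metric, time in range (TIR):
# the fraction of CGM readings falling in the 70-180 mg/dL target range.
# This mimics the kind of summary iglu computes; it is not iglu's API.
import numpy as np

def time_in_range(glucose_mg_dl, low=70.0, high=180.0):
    """Percent of readings within [low, high], ignoring missing values."""
    g = np.asarray(glucose_mg_dl, dtype=float)
    g = g[~np.isnan(g)]                  # CGM traces routinely have gaps
    return 100.0 * np.mean((g >= low) & (g <= high))

# A fake day of 5-minute CGM readings oscillating around 140 mg/dL.
rng = np.random.default_rng(1)
trace = 140 + 35 * np.sin(np.linspace(0, 6 * np.pi, 288)) + rng.normal(0, 10, 288)
print(f"time in range: {time_in_range(trace):.1f}%")
```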

2. What statistical methods from your research are most applicable to AI?

Much of my work is fundamentally grounded in statistics and driven by scientific inquiry rather than AI. However, my interest in AI has deepened recently, primarily due to the work of a former student, Renat Sergazinov, who led our CGM forecasting efforts and helped me appreciate that, whether I like it or not, AI models can be highly effective, and I should at least try to understand why. So far, we have primarily concentrated on the accessibility and reproducibility of AI models in the context of glucose forecasting. Our group curates an Awesome-CGM repository on GitHub, which is a starting point for exploring publicly available CGM data, and we also maintain a public GlucoBench repository with standardized processing and the implementation of several benchmark models for glucose forecasting. Looking ahead, I am excited about potentially integrating my methodological work on structure learning with advancements in AI to explore latent representations across diverse data modalities.
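To make the benchmarking idea concrete, the sketch below (illustrative Python, not GlucoBench’s code) scores the simplest sensible baseline, a persistence forecast that predicts the last observed glucose value, with RMSE over a 30-minute horizon. Benchmarks are useful precisely because learned forecasting models should be compared against baselines of this kind.

```python
# Sketch of the simplest glucose-forecasting baseline: persistence
# (predict that glucose stays at its last observed value). This is
# illustrative code, not GlucoBench's implementation.
import numpy as np

def persistence_forecast(history, horizon):
    """Repeat the last observation for `horizon` future steps."""
    return np.full(horizon, history[-1])

rng = np.random.default_rng(2)
# Fake CGM series: slow oscillation plus noise, 5-minute sampling.
series = 140 + 35 * np.sin(np.linspace(0, 8 * np.pi, 600)) + rng.normal(0, 8, 600)

horizon = 6                              # 30 minutes ahead at 5-min sampling
history, future = series[:-horizon], series[-horizon:]
pred = persistence_forecast(history, horizon)
rmse = np.sqrt(np.mean((pred - future) ** 2))
print(f"persistence RMSE over {horizon} steps: {rmse:.1f} mg/dL")
```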

3. How does your work address challenges at the intersection of statistics and AI?

My methodological work focuses on learning interpretable, lower-dimensional latent representations of input data. While AI models typically generate these representations in “black-box” settings to facilitate downstream tasks such as prediction, I emphasize learning them in more “open-box” settings, where interpretability is prioritized. I admire and am deeply inspired by Cynthia Rudin’s work on interpretable AI models for high-stakes decision-making. I firmly believe that, especially in contexts related to human health, interpretable models are preferable to “fancy” models, particularly when their performance is comparable. I am excited about the potential to incorporate more interpretable structural elements within AI models to improve their performance and transparency.

4. What impact do you foresee your research having on the future of AI systems?

Although I can’t predict the future, my work addresses scientific challenges, with AI as a tool rather than the primary focus. I believe the most significant impact of my research on AI will come from emphasizing reproducibility and open-source software development, which promote transparency and accountability.

5. What emerging trends in statistics do you believe are crucial for advancing AI?

AI models are advancing rapidly, and statisticians are often confronted with tools whose capabilities exceed initial expectations. This pace can feel overwhelming, but I see particular value in areas like uncertainty quantification for black-box model predictions, understanding AI biases, ensuring fairness, protecting sensitive data, transferring knowledge across domains, and developing rigorous models for unstructured data. Of particular interest to me is work on identifiability. AI models, particularly transformers, excel at learning unstructured representations, but making them more interpretable and identifiable holds great promise, especially in biomedical applications. In healthcare, I don’t just want to know whether a model predicts well; I want to understand how it reaches those predictions. Ensuring transparency, interpretability, and external validation of these models is critical, so reproducibility and open science remain central to my work.
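As one concrete example of uncertainty quantification for black-box predictions, the sketch below implements split conformal prediction, which wraps any point predictor in prediction intervals with finite-sample coverage under exchangeability. The predictor and data here are invented for illustration; only the conformal wrapper is the technique of interest.

```python
# Sketch of split conformal prediction: turn any black-box point predictor
# into prediction intervals with finite-sample coverage, assuming only that
# calibration and test points are exchangeable. Toy model, illustrative only.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, size=1000)
y = np.sin(x) + rng.normal(0, 0.3, size=1000)

# Any black-box predictor works; here, a crude binned-mean regressor.
bins = np.linspace(-3, 3, 31)
def fit_predict(x_train, y_train, x_new):
    idx_train = np.digitize(x_train, bins)
    idx_new = np.digitize(x_new, bins)
    means = {i: y_train[idx_train == i].mean() for i in np.unique(idx_train)}
    return np.array([means.get(i, y_train.mean()) for i in idx_new])

# Split the data: fit on one half, calibrate residuals on the other.
x_fit, y_fit = x[:500], y[:500]
x_cal, y_cal = x[500:], y[500:]
scores = np.abs(y_cal - fit_predict(x_fit, y_fit, x_cal))

# Conformal quantile with the finite-sample correction.
alpha = 0.1
q = np.quantile(scores, np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores))

x_test = np.array([0.0, 1.5])
pred = fit_predict(x_fit, y_fit, x_test)
print("90% intervals:", list(zip(np.round(pred - q, 2), np.round(pred + q, 2))))
```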
Edited by: Shan Gao
Proofread by: Hongtu Zhu, Jian Kang