2025 Workshop at BIRS: Day 1 Recordings
2025 Workshop at BIRS: Overview
Foundation Models and Their Biomedical Applications: Bridging the Gap
Banff International Research Station (BIRS), Banff, Alberta, Canada
Event Website: 2025 Workshop Homepage · Dates: Aug 17–22, 2025
Talks: Quick Looks, Full Notes & Recordings
- 2025 Workshop at BIRS: Day 1 Recordings (Monday, Aug 18)
- 2025 Workshop at BIRS: Day 2 Recordings (Tuesday, Aug 19)
- 2025 Workshop at BIRS: Day 3 Recordings (Wednesday, Aug 20)
- 2025 Workshop at BIRS: Day 4 Recordings (Thursday, Aug 21)
Read more on Stats Up AI Community News
Day 1 Recordings: Morning Session
Hongtu Zhu: Causal Generalist Medical AI
Monday, August 18, 2025 • 09:05 – 09:45
The University of North Carolina at Chapel Hill
What It Does: Introducing Causal Generalist Medical AI, which integrates causal reasoning with generalist AI models to improve interpretability, robustness, and generalizability in medical decision-making.
Introduction: The rapid evolution of flexible and reusable artificial intelligence (AI) models is transforming medical science. We will introduce Causal Generalist Medical AI (Causal GMAI), a paradigm that integrates causal inference with generalist AI models to enhance interpretability, robustness, and generalizability in medical decision-making. Causal GMAI employs self-supervised, semi-supervised, and supervised learning on diverse multimodal datasets (including imaging, electronic health records, clinical trials, laboratory results, genomics, knowledge graphs, and medical text) to perform a wide range of tasks with minimal task-specific supervision. By embedding causal reasoning, these models go beyond prediction to infer underlying causal relationships, improving diagnostic accuracy, treatment recommendations, and personalized medicine. The course covers key technical components such as causal discovery, counterfactual reasoning, and domain adaptation, alongside real-world applications. We will also explore challenges in regulation, validation, and dataset curation to ensure clinical reliability and ethical deployment. Designed for researchers, clinicians, data scientists, and AI practitioners, this course provides a foundation for advancing the next generation of trustworthy and interpretable medical AI.
Shu Yang: Integrating Diverse Evidence Sources in Clinical Research: Bridging Randomized Trials and Real-World Data
Monday, August 18, 2025 • 09:52 – 10:24
North Carolina State University
What It Does: Presenting a systematic framework for integrating randomized clinical trials and real-world data to enhance precision medicine development.
Introduction: The 21st Century Cures Act, enacted in 2016, empowers the U.S. Food and Drug Administration to accelerate the development and evaluation of new medical treatments by leveraging real-world data (RWD) and real-world evidence. With the increasing availability of both randomized clinical trials (RCTs) and RWD, integrating these heterogeneous sources of evidence offers unique opportunities to address clinical questions that neither can answer in isolation. This talk presents a systematic framework for combining evidence from RCTs and real-world studies, with a focus on enhancing precision medicine development. The presentation will cover the following key topics:
1) The Evolving Landscape of RWD in Clinical Research: an overview of how RWD is used across different stages of the clinical development lifecycle and across study designs.
2) Key Objectives for Integrative Evidence Synthesis: how integrating RCTs and RWD can improve the generalizability and transportability of RCT findings, increase the efficiency and statistical power of treatment effect estimation, and enable long-term safety and effectiveness monitoring.
3) A Causal Roadmap for Evidence Integration: a causal inference perspective that articulates the assumptions, identification strategies, and inferential goals involved in combining RCTs and RWD. Emphasis is placed on understanding and addressing bias, especially in regulatory contexts.
4) The Role of AI/ML and Statistical Rigor: while artificial intelligence and machine learning offer powerful tools for data integration and prediction, rigorous statistical thinking remains paramount for valid causal inference and bias mitigation.
5) Innovative Trial Designs: recent advances in hybrid controlled trials using external real-world controls, such as test-then-pool procedures, selective borrowing, and conformal prediction approaches, with a focus on improving efficiency while safeguarding validity.
6) Challenges and Opportunities: a concluding look at unresolved challenges and future research directions.
Jian Huang: Advancing Statistical Frontiers: Leveraging Large Models in Statistical Analysis
Monday, August 18, 2025 • 10:44 – 11:20
The Hong Kong Polytechnic University
What It Does: Examining how large-scale machine learning models, including foundation models, can address contemporary statistical challenges with illustrative examples.
Introduction: The advent of large-scale machine learning models, including deep neural networks and foundation models, is fundamentally reshaping the field of statistics. In this talk, we will examine how these powerful models can be leveraged to tackle contemporary statistical challenges. Through illustrative examples, including conditional generative learning, functional protein sequence generation, synthetic data augmentation, and code-free data analysis, we will highlight both the opportunities and the challenges that large models introduce to statistics.
Yong Chen: Causal AI Beyond Randomized Controlled Trials: Negative Control Calibration and Federated Learning
Monday, August 18, 2025 • 11:23 – 11:55
University of Pennsylvania
What It Does: Proposing a framework using negative control calibration and federated learning to strengthen causal inference beyond randomized controlled trials.
Introduction: Causal inference from real-world data (RWD) is often threatened by unmeasured confounding, that is, biases that remain even after adjusting for observed covariates. To address this fundamental challenge, we introduce a framework leveraging negative control outcomes (NCOs) to diagnose and correct for hidden biases. Our approach improves the validity of causal effect estimates across a wide range of adjustment strategies, offering a practical and generalizable path toward trustworthy inference.
However, single-site analyses are often limited by insufficient sample sizes and lack of generalizability, especially when studying rare outcomes. To overcome this barrier, we further integrate federated learning, enabling multi-site collaboration without sharing patient-level data. This allows us to scale NCO-based calibration across institutions, while also supporting privacy-preserving subphenotyping and target trial emulation.
Together, these tools form the backbone of a Causal AI framework that is both debiased and distributed, capable of delivering robust, generalizable insights across fragmented health data ecosystems. Applications include vaccine safety surveillance, drug repurposing, and international oncology research, demonstrating the real-world impact of combining statistical rigor with collaborative infrastructure.
Day 1 Recordings: Afternoon Session 1
Ross Mitchell: Applications of Foundational Models in Medical Imaging
Monday, August 18, 2025 • 13:33 – 14:05
University of Alberta
What It Does: Demonstrating applications of foundational models in medical imaging for classification, segmentation, and uncertainty modeling across large-scale datasets.
Introduction: This presentation explores the application of foundational models to medical imaging challenges, addressing both classification and segmentation tasks in label-limited scenarios. The first portion provides an overview of current foundational models in medical imaging, including Microsoft's Medical Imaging Intelligence (MII) model and similar architectures that have transformed the field through large-scale pre-training on diverse medical datasets. We examine how these models leverage self-supervised learning and multi-modal training to achieve remarkable performance across various medical imaging tasks, establishing new benchmarks for diagnostic accuracy and clinical utility.
The second portion presents our probabilistic approach to medical image segmentation using 40,000 3D CT scans from 10,000 patients for intestinal tract segmentation. We employ multiple foundational image segmentation models to train a probabilistic 3D U-Net that explicitly models uncertainty in ground truth annotations, learning per-voxel probability distributions across segmentation tasks. We demonstrate how this probabilistic framework extends naturally to challenging applications where ground truth labels are unavailable or prohibitively expensive, specifically applying our method to organs-at-risk segmentation in cone-beam CT scans for head and neck cancer radiotherapy planning.
Yuehua Cui: Addressing some challenges in spatial transcriptomics: spatial deconvolution, gene variability, and domain detection
Monday, August 18, 2025 • 14:06 – 14:26
Michigan State University
What It Does: Developing statistical and deep learning methods to address challenges in spatial transcriptomics, including deconvolution, gene variability, and domain detection.
Introduction: Spatial transcriptomics (ST) provides crucial insights into tissue-specific gene expression patterns in various studies. In this talk, I will focus on three major tasks in ST data analysis: spatial deconvolution, spatial and temporal gene identification, and spatial domain detection. For spatial deconvolution, I will present a reference-free deconvolution method for spot-level ST data, such as those obtained from the 10x Visium platform, to infer cell type compositions in each spot. While recent methodological developments have greatly advanced the detection of spatially variable genes (SVGs), whose expression patterns are non-random across tissue locations, such SVGs do not reveal cellular heterogeneity in a spatial context. Following spatial deconvolution, I will introduce a unified approach to identify both SVGs and cell type-specific SVGs (ctSVGs), under a linear mixed-effect model framework. I will further show how we can incorporate cell trajectory information to identify genes showing spatial and temporal variation, offering critical insight into tumor progression and dynamics. Finally, I will present a deep learning framework for improved spatial domain detection.
Ting Li: Principal Component Analysis in Geodesic Space
Monday, August 18, 2025 • 14:27 – 14:57
The Hong Kong Polytechnic University
What It Does: Extending principal component analysis into geodesic spaces to analyze complex data structures such as neuroimaging.
Introduction: Principal Component Analysis (PCA) has been widely applied and extensively studied. However, extending it to complex data in metric spaces presents significant challenges. In this work, we propose Geodesic-PCA (G-PCA), a unified framework that extends PCA to geodesic spaces beyond manifolds. We develop robust and optimal theoretical results for G-PCA and validate its reliability and effectiveness through extensive simulations. In practical applications, we apply G-PCA to analyze brain corpus callosum and task-fMRI data, demonstrating its potential in fields such as neuroimaging.
Day 1 Recordings: Afternoon Session 2
Peter Song: Challenges in Calculating Epigenetic Age
Monday, August 18, 2025 • 15:31 – 16:00
University of Michigan
What It Does: Addressing challenges in calculating epigenetic age by refining predictive models with high-resolution data, uncertainty quantification, and multiple clocks.
Introduction: DNA methylation (DNAm) has emerged as a key source of omics data for assessing epigenetic age, offering a wealth of genetic markers that reflect cellular changes influenced by social and environmental factors. Epigenetic age can be estimated through predictive models known as epigenetic clocks, which rely on high-dimensional data analytics. However, current epigenetic age calculators face significant limitations as DNAm data collection technology rapidly advances. In this talk, I will present approaches to tackle a few data science challenges, including refining epigenetic clocks with higher-resolution DNAm data using convolutional neural networks, quantifying prediction uncertainty using conformal prediction techniques to address variability that increases with age, and combining multiple epigenetic clocks. This presentation will integrate both computational methodologies and algorithmic solutions, demonstrated through real-world data applications.
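As background on the conformal prediction step mentioned in this abstract, here is a minimal sketch of generic split conformal prediction for a regression output such as predicted age. It is not the speaker's method: the function name, the arguments, and the use of absolute residuals as conformity scores are all assumptions made for illustration.

```python
import math

import numpy as np


def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Generic split conformal interval (illustrative, hypothetical helper).

    cal_true, cal_pred: observed and predicted values on a held-out
    calibration set that was not used to fit the predictive model.
    new_pred: point predictions for new samples.
    Returns (lower, upper) arrays targeting about 1 - alpha marginal coverage.
    """
    cal_true = np.asarray(cal_true, dtype=float)
    cal_pred = np.asarray(cal_pred, dtype=float)
    new_pred = np.asarray(new_pred, dtype=float)

    # Conformity scores: absolute residuals on the calibration set.
    scores = np.sort(np.abs(cal_true - cal_pred))
    n = scores.size

    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    # With too few calibration points the interval is unbounded.
    q = scores[k - 1] if k <= n else np.inf

    return new_pred - q, new_pred + q
```

This plain version returns intervals of constant width. Handling variability that grows with age, as the abstract describes, would need something further, such as scores scaled by an estimated spread; that is outside this sketch.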
Ting Ye: Multimodal AI for Predicting Atrial Fibrillation and Heart Failure Using ECG and Cardiac MRI
Monday, August 18, 2025 • 16:07 – 16:28
University of Washington
What It Does: Introducing a multimodal deep learning framework that integrates ECG and cardiac MRI to improve prediction of atrial fibrillation and heart failure.
Introduction: Atrial fibrillation (AF) and heart failure (HF) are leading causes of cardiovascular morbidity, mortality, and healthcare burden worldwide. Early detection of individuals at elevated risk, especially those who are asymptomatic, is critical for timely intervention and disease prevention. In this talk, I will present a novel multimodal deep learning framework that integrates electrocardiogram (ECG) and cardiac magnetic resonance imaging (MRI) data to jointly learn shared and modality-specific representations. By combining the temporal features of ECG with the structural insights from cardiac MRI, the model significantly improves predictive performance on key clinical tasks, including predicting the onset of AF and HF. The framework is developed and evaluated using data from the UK Biobank, demonstrating the potential of multimodal AI to enhance cardiovascular risk stratification and inform targeted prevention strategies.
Zhengwu Zhang: Representation Learning and Generative Models in Network Data Analysis
Monday, August 18, 2025 • 16:29 – 16:50
The University of North Carolina at Chapel Hill
What It Does: Demonstrating how representation learning with VAEs and generative models can extract meaningful embeddings from brain networks to advance neuroimaging.
Introduction: Brain network analysis grapples with high-dimensional, complex data. This talk will focus on how representation learning, particularly through Variational Auto-Encoders (VAEs), offers a powerful framework to extract meaningful low-dimensional embeddings from brain networks. We will delve into how VAEs learn latent representations that not only allow for accurate reconstruction of the original network data but also serve as a versatile foundation for diverse downstream tasks, such as predicting human traits or disentangling nuisance factors. We will showcase some generative models, including Graph Auto-Encoding (GATE), which characterizes brain graph population distributions to improve cognitive trait prediction, and a motion-invariant VAE (inv-VAE), which learns representations robust to motion artifacts in structural connectomes. Ultimately, this talk will demonstrate the transformative potential of generation-driven representation learning for advancing neuroimaging analyses and deepening our understanding of brain structure and function.
Yumou Qiu: Physics-informed Statistical Data Fusion for Reconstructing 3D Current Fields of Oceanic Eddies
Monday, August 18, 2025 • 16:53 – 17:15
Peking University
What It Does: Applying a physics-informed transfer learning framework to integrate multi-source ocean data for reconstructing 3D current fields and guiding real-time field campaigns.
Introduction: Accurate reconstruction of three-dimensional ocean current fields is critical for understanding ocean dynamics and real-time control of modern oceanographic field campaigns, particularly in mesoscale eddy environments. We propose a physics-informed transfer learning framework, designed for multi-source data fusion, to estimate the three-dimensional current structure of oceanic eddies by integrating satellite altimetry and temperature data, ocean reanalysis data, and in situ drifting buoy observations. The approach leverages geostrophic balance, derived from the Navier-Stokes equations, to guide a neural network trained on GLORYS reanalysis data in inferring subsurface currents from surface conditions. The surface conditions were estimated using a high-dimensional linear mixed model, which integrates systematically biased satellite altimetry and sparse drifting buoy data, allowing for spatially adaptive bias correction and yielding more accurate and spatially coherent surface velocity fields. This framework was deployed during a September 2024 field campaign targeting a cyclonic eddy in the Kuroshio Extension, guiding the real-time control of seven underwater gliders. Compared with existing data products, our method reduced RMSE by over 40% in cross-validation with drifting buoys and showed improved consistency with ADCP observations. The resulting glider trajectories provided enhanced spatial coverage of the eddy interior, enabling the first fine-scale three-dimensional survey of an eddy in the Kuroshio Extension region.
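For orientation, the geostrophic balance mentioned in this abstract is commonly written in the following textbook form at the sea surface. The symbols are introduced here for illustration only, and the talk's own formulation (for example, how the constraint enters the neural network) may differ.

```latex
% Surface geostrophic velocities from sea surface height (textbook form).
% \eta : sea surface height (e.g. from satellite altimetry)
% g    : gravitational acceleration
% f = 2\Omega\sin\varphi : Coriolis parameter at latitude \varphi
\[
  u_g = -\frac{g}{f}\,\frac{\partial \eta}{\partial y},
  \qquad
  v_g = \frac{g}{f}\,\frac{\partial \eta}{\partial x}
\]
```

Here $u_g$ and $v_g$ are the eastward and northward velocity components, so the flow runs along contours of sea surface height, not across them.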
Watch All Recordings
- StatsUpAI YouTube Channel: Subscribe for updates
- BIRS Official Videos Page: 2025 Workshop Videos
- Direct Video Downloads: BIRS Video Server
AI is rapidly reshaping biomedical research by integrating diverse data, accelerating discovery, and supporting decision-making under uncertainty. With statisticians at the forefront, these applications gain the depth, rigor, and reliability needed to truly transform science and medicine.