2025 Workshop at BIRS: Day 1 Recordings

Monday, August 18 · Day 1 of the 2025 BIRS Workshop “Foundation Models and Their Biomedical Applications: Bridging the Gap”
Published

August 19, 2025

Stats Up AI YouTube

Visit the Stats Up AI Channel for More

2025 Workshop at BIRS: Overview

Foundation Models and Their Biomedical Applications: Bridging the Gap

📍 Banff International Research Station (BIRS), Banff, Alberta, Canada
Event Website: 2025 Workshop Homepage · Dates: Aug 17–22, 2025

🎬 Talks: Quick Looks, Full Notes & Recordings

⮕ Full program: 2025 Workshop Schedule

↩︎ Read more on Stats Up AI 📰 Community News

▶️ Day 1 Recordings: Morning Session

🎤 Hongtu Zhu: Causal Generalist Medical AI

📅 Monday, August 18, 2025 • 🕘 09:05 – 09:45
🏛️ The University of North Carolina at Chapel Hill

Keywords: Causal inference, generalist AI, medical decision-making, interpretability, robustness, generalizability, multimodal datasets, causal discovery, counterfactual reasoning, domain adaptation, clinical reliability
What It Does: Introducing Causal Generalist Medical AI, which integrates causal reasoning with generalist AI models to improve interpretability, robustness, and generalizability in medical decision-making.

Introduction: The rapid evolution of flexible and reusable artificial intelligence (AI) models is transforming medical science. We will introduce Causal Generalist Medical AI (Causal GMAI), a paradigm that integrates causal inference with generalist AI models to enhance interpretability, robustness, and generalizability in medical decision-making. Causal GMAI employs self-supervised, semi-supervised, and supervised learning on diverse multimodal datasets (including imaging, electronic health records, clinical trials, laboratory results, genomics, knowledge graphs, and medical text) to perform a wide range of tasks with minimal task-specific supervision. By embedding causal reasoning, these models go beyond prediction to infer underlying causal relationships, improving diagnostic accuracy, treatment recommendations, and personalized medicine. The course covers key technical components such as causal discovery, counterfactual reasoning, and domain adaptation, alongside real-world applications. We will also explore challenges in regulation, validation, and dataset curation to ensure clinical reliability and ethical deployment. Designed for researchers, clinicians, data scientists, and AI practitioners, this course provides a foundation for advancing the next generation of trustworthy and interpretable medical AI.

🎬 Open the video directly

🎤 Shu Yang: Integrating Diverse Evidence Sources in Clinical Research: Bridging Randomized Trials and Real-World Data

📅 Monday, August 18, 2025 • 🕘 09:52 – 10:24
🏛️ North Carolina State University

Keywords: Clinical trials, real-world data, precision medicine, evidence integration, statistical framework, 21st Century Cures Act, FDA, causal inference, bias assessment, hybrid trial designs, conformal prediction
What It Does: Presenting a systematic framework for integrating randomized clinical trials and real-world data to enhance precision medicine development.

Introduction: The 21st Century Cures Act, enacted in 2016, empowers the U.S. Food and Drug Administration to accelerate the development and evaluation of new medical treatments by leveraging real-world data (RWD) and real-world evidence. With the increasing availability of both randomized clinical trials (RCTs) and RWD, integrating these heterogeneous sources of evidence offers unique opportunities to address clinical questions that neither can answer in isolation. This talk presents a systematic framework for combining evidence from RCTs and real-world studies, with a focus on enhancing precision medicine development. The presentation will cover the following key topics:
1) The Evolving Landscape of RWD in Clinical Research: an overview of how RWD is used across different stages of the clinical development lifecycle and study designs.
2) Key Objectives for Integrative Evidence Synthesis: how integration of RCTs and RWD can improve the generalizability and transportability of RCT findings, increase the efficiency and statistical power of treatment effect estimation, and enable long-term safety and effectiveness monitoring.
3) A Causal Roadmap for Evidence Integration: a causal inference perspective to articulate the assumptions, identification strategies, and inferential goals when combining RCTs and RWD. Emphasis is placed on the importance of understanding and addressing bias, especially in regulatory contexts.
4) The Role of AI/ML and Statistical Rigor: while artificial intelligence and machine learning offer powerful tools for data integration and prediction, rigorous statistical thinking remains paramount for valid causal inference and bias mitigation.
5) Innovative Trial Designs: recent advances in hybrid controlled trials using external real-world controls, such as test-then-pool procedures, selective borrowing, and conformal prediction approaches, with a focus on improving efficiency while safeguarding validity.
6) Challenges and Opportunities: a closing look at unresolved challenges and future research directions.

🎬 Open the video directly

🎤 Jian Huang: Advancing Statistical Frontiers: Leveraging Large Models in Statistical Analysis

📅 Monday, August 18, 2025 • 🕘 10:44 – 11:20
🏛️ The Hong Kong Polytechnic University

Keywords: Large-scale machine learning, foundation models, statistical challenges, contemporary methods, conditional generative learning, functional protein sequence generation, synthetic data augmentation, code-free data analysis
What It Does: Examining how large-scale machine learning models, including foundation models, can address contemporary statistical challenges with illustrative examples.

Introduction: The advent of large-scale machine learning models, including deep neural networks and foundation models, is fundamentally reshaping the field of statistics. In this talk, we will examine how these powerful models can be leveraged to tackle contemporary statistical challenges. Through illustrative examples, including conditional generative learning, functional protein sequence generation, synthetic data augmentation, and code-free data analysis, we will highlight both the opportunities and the challenges that large models introduce to statistics.

🎬 Open the video directly

🎤 Yong Chen: Causal AI Beyond Randomized Controlled Trials: Negative Control Calibration and Federated Learning

📅 Monday, August 18, 2025 • 🕘 11:23 – 11:55
🏛️ University of Pennsylvania

Keywords: Causal inference, negative control calibration, federated learning, randomized controlled trials, unmeasured confounding, NCOs, privacy-preserving, vaccine safety surveillance, drug repurposing, oncology research
What It Does: Proposing a framework using negative control calibration and federated learning to strengthen causal inference beyond randomized controlled trials.

Introduction: Causal inference from real-world data (RWD) is often threatened by unmeasured confounding: biases that remain even after adjusting for observed covariates. To address this fundamental challenge, we introduce a framework leveraging negative control outcomes (NCOs) to diagnose and correct for hidden biases. Our approach improves the validity of causal effect estimates across a wide range of adjustment strategies, offering a practical and generalizable path toward trustworthy inference.

However, single-site analyses are often limited by insufficient sample sizes and lack of generalizability, especially when studying rare outcomes. To overcome this barrier, we further integrate federated learning, enabling multi-site collaboration without sharing patient-level data. This allows us to scale NCO-based calibration across institutions, while also supporting privacy-preserving subphenotyping and target trial emulation.

Together, these tools form the backbone of a Causal AI framework that is both debiased and distributed, capable of delivering robust, generalizable insights across fragmented health data ecosystems. Applications include vaccine safety surveillance, drug repurposing, and international oncology research, demonstrating the real-world impact of combining statistical rigor with collaborative infrastructure.
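As a rough illustration of how negative control outcomes can be used to correct an estimate, the Python sketch below treats the NCO effect estimates as draws from an empirical null distribution. Since the exposure should have no effect on a negative control, any systematic departure from zero is read as hidden bias. This is a deliberately simplified stand-in, not the framework presented in the talk: the function name, the normal empirical-null assumption, the neglect of each NCO's own sampling error, and all numbers are assumptions made for this example.

```python
from math import sqrt
from statistics import mean, variance


def calibrate_with_negative_controls(log_effect, se, nco_log_effects):
    """Shift and widen an effect estimate using negative control outcomes.

    Assumption: NCO log-effect estimates have a true value of 0, so their mean
    estimates the systematic bias and their variance estimates extra
    uncertainty not captured by the nominal standard error.
    """
    bias = mean(nco_log_effects)
    extra_variance = variance(nco_log_effects)
    calibrated_effect = log_effect - bias
    calibrated_se = sqrt(se ** 2 + extra_variance)
    return calibrated_effect, calibrated_se


# Hypothetical log hazard ratios, for illustration only.
nco_estimates = [0.10, 0.20, 0.15, 0.25, 0.05]
estimate, std_error = calibrate_with_negative_controls(0.40, 0.10, nco_estimates)
```

Here the negative controls suggest an upward bias of about 0.15 on the log scale, so the estimate of interest is shifted down by that amount and its standard error is widened. In a federated setting, each site could compute such summaries locally and share only aggregates, though the details of that step are not specified in the abstract.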

🎬 Open the video directly

▶️ Day 1 Recordings: Afternoon Session 1

🎤 Ross Mitchell: Applications of Foundational Models in Medical Imaging

📅 Monday, August 18, 2025 • 🕘 13:33 – 14:05
🏛️ University of Alberta

Keywords: Foundational models, medical imaging, classification, segmentation, uncertainty modeling, Microsoft MII, self-supervised learning, multi-modal training, probabilistic 3D U-Net, organs-at-risk segmentation, cone-beam CT
What It Does: Demonstrating applications of foundational models in medical imaging for classification, segmentation, and uncertainty modeling across large-scale datasets.

Introduction: This presentation explores the application of foundational models to medical imaging challenges, addressing both classification and segmentation tasks in label-limited scenarios. The first portion provides an overview of current foundational models in medical imaging, including Microsoft’s Medical Imaging Intelligence (MII) model and similar architectures that have transformed the field through large-scale pre-training on diverse medical datasets. We examine how these models leverage self-supervised learning and multi-modal training to achieve remarkable performance across various medical imaging tasks, establishing new benchmarks for diagnostic accuracy and clinical utility.

The second portion presents our probabilistic approach to medical image segmentation using 40,000 3D CT scans from 10,000 patients for intestinal tract segmentation. We employ multiple foundational image segmentation models to train a probabilistic 3D U-Net that explicitly models uncertainty in ground truth annotations, learning per-voxel probability distributions across segmentation tasks. We demonstrate how this probabilistic framework extends naturally to challenging applications where ground truth labels are unavailable or prohibitively expensive, specifically applying our method to organs-at-risk segmentation in cone-beam CT scans for head and neck cancer radiotherapy planning.

🎬 Open the video directly

🎤 Yuehua Cui: Addressing some challenges in spatial transcriptomics: spatial deconvolution, gene variability, and domain detection

📅 Monday, August 18, 2025 • 🕘 14:06 – 14:26
🏛️ Michigan State University

Keywords: Spatial transcriptomics, spatial deconvolution, gene variability, domain detection, statistical methods, 10x Visium, spatially variable genes, cell type-specific SVGs, linear mixed-effect model, tumor progression, deep learning
What It Does: Developing statistical and deep learning methods to address challenges in spatial transcriptomics, including deconvolution, gene variability, and domain detection.

Introduction: Spatial transcriptomics (ST) provides crucial insights into tissue-specific gene expression patterns in various studies. In this talk, I will focus on three major tasks in ST data analysis: spatial deconvolution, spatial and temporal gene identification, and spatial domain detection. For spatial deconvolution, I will present a reference-free deconvolution method for spot-level ST data, such as those obtained from the 10x Visium platform, to infer cell type compositions in each spot. While recent methodological developments have greatly advanced the detection of spatially variable genes (SVGs), whose expression patterns are non-random across tissue locations, such SVGs do not reveal cellular heterogeneity in a spatial context. Following spatial deconvolution, I will introduce a unified approach to identify both SVGs and cell type-specific SVGs (ctSVGs), under a linear mixed-effect model framework. I will further show how we can incorporate cell trajectory information to identify genes showing spatial and temporal variation, offering critical insight into tumor progression and dynamics. Finally, I will present a deep learning framework for improved spatial domain detection.

🎬 Open the video directly

🎤 Ting Li: Principal Component Analysis in Geodesic Space

📅 Monday, August 18, 2025 • 🕘 14:27 – 14:57
🏛️ Hong Kong Polytechnic University

Keywords: Principal component analysis, geodesic space, complex data structures, neuroimaging, Geodesic-PCA, G-PCA, brain corpus callosum, task-fMRI, Riemannian manifolds
What It Does: Extending principal component analysis into geodesic spaces to analyze complex data structures such as neuroimaging.

Introduction: Principal Component Analysis (PCA) has been widely applied and extensively studied. However, extending it to complex data in metric spaces presents significant challenges. In this work, we propose Geodesic-PCA (G-PCA), a unified framework that extends PCA to geodesic spaces beyond manifolds. We develop robust and optimal theoretical results for G-PCA and validate its reliability and effectiveness through extensive simulations. In practical applications, we apply G-PCA to analyze brain corpus callosum and task-fMRI data, demonstrating its potential in fields such as neuroimaging.

🎬 Open the video directly

▶️ Day 1 Recordings: Afternoon Session 2

🎤 Peter Song: Challenges in Calculating Epigenetic Age

📅 Monday, August 18, 2025 • 🕘 15:31 – 16:00
🏛️ University of Michigan

Keywords: Epigenetic age, predictive models, high-resolution data, uncertainty quantification, multiple clocks, DNA methylation, epigenetic clocks, convolutional neural networks, conformal prediction, aging research
What It Does: Addressing challenges in calculating epigenetic age by refining predictive models with high-resolution data, uncertainty quantification, and multiple clocks.

Introduction: DNA methylation (DNAm) has emerged as a key source of omics data for assessing epigenetic age, offering a wealth of genetic markers that reflect cellular changes influenced by social and environmental factors. Epigenetic age can be estimated through predictive models known as epigenetic clocks, which rely on high-dimensional data analytics. However, current epigenetic age calculators face significant limitations as DNAm data collection technology rapidly advances. In this talk, I will present approaches to tackle several data science challenges: refining epigenetic clocks with higher-resolution DNAm data using convolutional neural networks, quantifying prediction uncertainty using conformal prediction techniques to address variability that increases with age, and combining multiple epigenetic clocks. This presentation will integrate both computational methodologies and algorithmic solutions, demonstrated through real-world data applications.
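For readers unfamiliar with conformal prediction, the following Python sketch shows the basic split conformal recipe applied to an age predictor. It is a generic textbook version under stated assumptions, not the method from the talk: the function name, the absolute-residual score, and the toy ages are hypothetical. It also produces intervals of constant width, so handling variability that grows with age would need a modified score (for example, residuals scaled by an estimated spread).

```python
from math import ceil


def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Prediction interval for one new prediction via split conformal.

    cal_true / cal_pred: true and predicted values on a held-out calibration
    set that was not used to fit the predictor. Under exchangeability, the
    interval covers the true value with probability at least 1 - alpha.
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample corrected rank of the calibration score to use.
    k = ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for the requested coverage level.
        return float("-inf"), float("inf")
    q = scores[k - 1]
    return new_pred - q, new_pred + q


# Hypothetical chronological ages and clock predictions, for illustration only.
true_age = [30, 42, 55, 61, 48, 37, 70, 66, 52, 45]
clock_age = [31, 40, 58, 57, 53, 43, 63, 74, 61, 55]
lower, upper = split_conformal_interval(true_age, clock_age, new_pred=50, alpha=0.2)
```

With ten calibration points and `alpha=0.2`, the ninth smallest absolute residual (9 years in this toy data) sets the half-width, giving the interval 41 to 59 around a predicted age of 50.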

🎬 Open the video directly

🎤 Ting Ye: Multimodal AI for Predicting Atrial Fibrillation and Heart Failure Using ECG and Cardiac MRI

📅 Monday, August 18, 2025 • 🕘 16:07 – 16:28
🏛️ University of Washington

Keywords: Multimodal AI, atrial fibrillation, heart failure, ECG, cardiac MRI, deep learning, cardiovascular risk stratification, UK Biobank, shared representations, modality-specific representations, disease prevention
What It Does: Introducing a multimodal deep learning framework that integrates ECG and cardiac MRI to improve prediction of atrial fibrillation and heart failure.

Introduction: Atrial fibrillation (AF) and heart failure (HF) are leading causes of cardiovascular morbidity, mortality, and healthcare burden worldwide. Early detection of individuals at elevated risk, especially those who are asymptomatic, is critical for timely intervention and disease prevention. In this talk, I will present a novel multimodal deep learning framework that integrates electrocardiogram (ECG) and cardiac magnetic resonance imaging (MRI) data to jointly learn shared and modality-specific representations. By combining the temporal features of ECG with the structural insights from cardiac MRI, the model significantly improves predictive performance on key clinical tasks, including predicting the onset of AF and HF. The framework is developed and evaluated using data from the UK Biobank, demonstrating the potential of multimodal AI to enhance cardiovascular risk stratification and inform targeted prevention strategies.

🎬 Open the video directly

🎤 Zhengwu Zhang: Representation Learning and Generative Models in Network Data Analysis

📅 Monday, August 18, 2025 • 🕘 16:29 – 16:50
🏛️ The University of North Carolina at Chapel Hill

Keywords: Representation learning, generative models, VAEs, brain networks, neuroimaging, Variational Auto-Encoders, GATE, motion-invariant VAE, inv-VAE, brain connectivity, cognitive trait prediction
What It Does: Demonstrating how representation learning with VAEs and generative models can extract meaningful embeddings from brain networks to advance neuroimaging.

Introduction: Brain network analysis grapples with high-dimensional, complex data. This talk will focus on how representation learning, particularly through Variational Auto-Encoders (VAEs), offers a powerful framework to extract meaningful low-dimensional embeddings from brain networks. We will delve into how VAEs learn latent representations that not only allow for accurate reconstruction of the original network data but also serve as a versatile foundation for diverse downstream tasks, such as predicting human traits or disentangling nuisance factors. We will showcase two generative models: Graph Auto-Encoding (GATE), which characterizes brain graph population distributions to improve cognitive trait prediction, and a motion-invariant VAE (inv-VAE), which learns representations robust to motion artifacts in structural connectomes. Ultimately, this talk will demonstrate the transformative potential of generation-driven representation learning for advancing neuroimaging analyses and deepening our understanding of brain structure and function.

🎬 Open the video directly

🎤 Yumou Qiu: Physics-informed Statistical Data Fusion for Reconstructing 3D Current Fields of Oceanic Eddies

📅 Monday, August 18, 2025 • 🕘 16:53 – 17:15
🏛️ Peking University

Keywords: Physics-informed transfer learning, multi-source ocean data, 3D current fields, real-time field campaigns, geostrophic balance, Navier-Stokes equations, GLORYS reanalysis, Kuroshio Extension, underwater gliders, ADCP observations
What It Does: Applying a physics-informed transfer learning framework to integrate multi-source ocean data for reconstructing 3D current fields and guiding real-time field campaigns.

Introduction: Accurate reconstruction of three-dimensional ocean current fields is critical for understanding ocean dynamics and real-time control of modern oceanographic field campaigns, particularly in mesoscale eddy environments. We propose a physics-informed transfer learning framework, designed for multi-source data fusion, to estimate the three-dimensional current structure of oceanic eddies by integrating satellite altimetry and temperature data, ocean reanalysis data, and in situ drifting buoy observations. The approach leverages geostrophic balance, derived from the Navier-Stokes equations, to guide a neural network trained on GLORYS reanalysis data in inferring subsurface currents from surface conditions. The surface conditions were estimated using a high-dimensional linear mixed model, which integrates systematically biased satellite altimetry and sparse drifting buoy data, allowing for spatially adaptive bias correction and yielding more accurate and spatially coherent surface velocity fields. This framework was deployed during a September 2024 field campaign targeting a cyclonic eddy in the Kuroshio Extension, guiding the real-time control of seven underwater gliders. Compared with existing data products, our method reduced RMSE by over 40% in cross-validation with drifting buoys and showed improved consistency with ADCP observations. The resulting glider trajectories provided enhanced spatial coverage of the eddy interior, enabling the first fine-scale three-dimensional survey of an eddy in the Kuroshio Extension region.
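For reference, the geostrophic balance invoked in the abstract can be written in its standard textbook form, shown below. The notation is an assumption introduced here rather than taken from the talk: u and v are the eastward and northward velocities, p is pressure, ρ is density, η is sea surface height, g is gravitational acceleration, Ω is Earth's rotation rate, and φ is latitude.

```latex
% Geostrophic balance: the Coriolis force balances the horizontal pressure gradient.
f\,v = \frac{1}{\rho}\,\frac{\partial p}{\partial x}, \qquad
f\,u = -\frac{1}{\rho}\,\frac{\partial p}{\partial y}, \qquad
f = 2\Omega\sin\varphi

% At the surface, with hydrostatic pressure p = \rho g \eta,
% the velocities follow from the sea surface height (e.g. satellite altimetry):
u_g = -\frac{g}{f}\,\frac{\partial \eta}{\partial y}, \qquad
v_g = \frac{g}{f}\,\frac{\partial \eta}{\partial x}
```

These relations link surface height gradients to surface currents, which is one plausible reading of how altimetry constrains the surface velocity field; how the constraint enters the neural network's training is not detailed in the abstract.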

🎬 Open the video directly

📌 Watch All Recordings

Stats Up AI YouTube

AI is rapidly reshaping biomedical research by integrating diverse data, accelerating discovery, and supporting decision-making under uncertainty. With statisticians at the forefront, these applications gain the depth, rigor, and reliability needed to truly transform science and medicine.