Statistical and AI Methods for Health Data Science
Empowering Health Data Science through the Integration of Statistics and AI
Deconstructing Alzheimer’s Disease and Related Dementias to Enable Precision Medicine: Challenges, Models, and Resources for Data Science
, Psy.D., is the Raymond C. Beeler Professor of Radiology and Imaging Sciences and Professor of Medical and Molecular Genetics at the Indiana University School of Medicine. Dr. Saykin serves as director of the Indiana Alzheimer’s Disease Research Center (IADRC) and the IU Center for Neuroimaging. He leads the ADNI Genetics Core, is a multiple principal investigator (MPI) of several NIA-sponsored consortia (CLEAR-AD, KBASE, AI4AD), and has experience in collaborative, transdisciplinary, and multi-institutional team science. His current research program focuses on systems biology approaches to understanding the pathways driving AD/ADRD, leveraging multimodal neuroimaging, multi-omics biomarkers, and AI-based strategies. Dr. Saykin is the founding editor-in-chief of Brain Imaging and Behavior and a highly cited author with over 700 publications.
Research on Alzheimer’s disease (AD) and Alzheimer’s disease and related dementias (ADRD) has become increasingly driven by big data, relying on large, heterogeneous, and longitudinal datasets that span imaging, biomarkers, genetics, and clinical phenotyping. Experimental data from model systems and clinical trials now often include rich, high-dimensional measurements alongside key outcomes. Although researchers have known for over a century about the plaques and tangles composed of amyloid beta and tau proteins, the underlying causes of most forms of AD/ADRD remain unknown. This lecture will discuss foundational questions and challenges in AD/ADRD research, including how data science and AI can help advance precision diagnostic and therapeutic strategies, and will introduce the growing ensemble of accessible data resources available for analysis within an open science framework. Major NIA-funded observational data resources, such as the National Alzheimer’s Coordinating Center (NACC) and the Alzheimer’s Disease Neuroimaging Initiative (ADNI), and their affiliated neuroimaging, genetics, and multi-omics data repositories will be described, along with efforts at data harmonization. Future directions include the increasing use of large-scale electronic health records and national databases that provide a window into diagnoses, treatments, and outcomes in real-world clinical settings. Finally, the lecture will discuss bottlenecks in advancing knowledge of causal mechanisms, early detection, and the development of novel therapeutic strategies.
Generative AI and Statistics
Bridging the Gap Between Generative AI and Statistical Methodologies
Leveraging LLMs for student feedback in introductory data science courses
is a Professor of the Practice and Director of Undergraduate Studies in the Department of Statistical Science at Duke University, and Director of the First-Year Experience in Trinity College of Arts & Sciences. Her work focuses on innovation in statistics and data science pedagogy, with an emphasis on computing, reproducible research, student-centered learning, and open-source education. She is a contributor to the OpenIntro Statistics textbook and develops R packages and resources for data science education.
A significant recent challenge for learners and teachers in data science courses is the increasing use of LLM-based tools to generate answers. In this talk, I will introduce an R package that leverages LLMs to produce immediate feedback on student work, motivating students to attempt problems themselves first. I will discuss the technical details of augmenting models with course materials, backend and user interface decisions, challenges that arise when the LLM evaluates student work incorrectly, and feedback from the first cohort of student users. Finally, I will discuss incorporating this tool into low-stakes assessments and address the ethical considerations of relying on LLMs within a course’s formal assessment structure.
AI agents to accelerate scientific discoveries
is an Associate Professor of Biomedical Data Science and, by courtesy, of Computer Science and Electrical Engineering at Stanford University. His research focuses on making AI more reliable and human-compatible, with particular interest in applications to human disease and health. He is a two-time Chan-Zuckerberg Investigator (2017, 2023), Faculty Director at the Stanford Data Science Institute, and a member of the Stanford AI Lab. He has received the Overton Prize, an NSF CAREER Award, and a Sloan Fellowship, along with multiple best paper awards and faculty awards from Google, Amazon, Genentech, and Apple.
AI agents (large language models equipped with tools and reasoning capabilities) are emerging as powerful enablers of research. This talk will explore how agentic AI can accelerate scientific discovery. I will first introduce the Virtual Lab, a collaborative team of AI scientist agents that hold in silico research meetings to tackle open-ended research projects. As an example application, the Virtual Lab designed new nanobody binders to recent SARS-CoV-2 variants, which we validated experimentally. Next, I will introduce Paper2Agent, a framework that automatically converts passive research papers into interactive AI agents. Finally, I will discuss lessons learned from Agents4Science, the first conference in which the authors and reviewers are primarily AI systems.
