Short Courses

STAI-X 2026 will host six short courses on July 31, 2026 at Harvard University. Courses are taught by leading researchers and cover foundational and applied topics across statistics, machine learning, and AI — from agentic AI workflows and scaling theory to large language models, reinforcement learning, clinical trials, and diffusion models.

Three morning courses (8:00 AM – 12:00 PM) and three afternoon courses (1:00 PM – 5:00 PM) will run in parallel. Participants may register for one morning and one afternoon course.

#     Time                 Title                                                                Instructor(s)
SC1   8:00 AM – 12:00 PM   Agentic AI: From Zero to Infinity                                    Tian Zheng (Columbia)
SC2   8:00 AM – 12:00 PM   Theory of Scaling in Modern Deep Learning                            Soufiane Hayou (JHU), Nikhil Ghosh (Flatiron)
SC3   8:00 AM – 12:00 PM   An Overview of LLMs for Statisticians                                Linjun Zhang (Rutgers)
SC4   1:00 PM – 5:00 PM    Reinforcement Learning: Foundations and Applications                 Chengchun Shi (LSE)
SC5   1:00 PM – 5:00 PM    The Role of AI in Accelerating Clinical Trials and Drug Development  Alexia Iasonos (MSKCC), John O’Quigley (UCL)
SC6   1:00 PM – 5:00 PM    Theory for Diffusion Models: Continuous and Discrete                 Sitan Chen (Harvard)
SC1 July 31, 2026 · 8:00 AM – 12:00 PM

Agentic AI: From Zero to Infinity — A Hands-On Workshop for Statisticians on Building and Debugging AI Workflows

Instructor: Tian Zheng, Columbia University

Course Description

Agentic AI systems, which iteratively plan, act, and refine outputs, are rapidly reshaping how research, teaching, and professional workflows are conducted. This hands-on workshop introduces statisticians to the design and use of such systems through a practical, systems-oriented lens. Participants will build a simple agentic AI workflow that addresses a task in their own research, teaching, or professional practice. They will learn how to identify and diagnose common failure modes, such as inconsistency, bias, and context mis-specification, and how to iteratively improve system performance using natural language as a programming interface. By the end of the session, participants will have developed and refined a working prototype, along with a principled framework for evaluating and extending agentic AI workflows in their own work.
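
The plan-act-refine loop at the heart of such systems can be sketched in a few lines. The code below is a deterministic toy (the task, critic rules, and revision rules are hypothetical stand-ins for what would be LLM calls in a real workflow), but it captures the control flow: act, critique against explicit constraints, and feed the critique back into the next attempt.

```python
# Minimal plan-act-refine loop. The "model" behavior below is a deterministic
# stand-in (hypothetical) for LLM calls in a real agentic workflow.

def act(plan: str, feedback: list[str]) -> str:
    """Produce a draft; a real agent would call an LLM here."""
    draft = "Report: " + plan
    for note in feedback:                 # crude "revision": apply each fix
        if note == "mention the sample mean":
            draft += " (sample mean = 4.2)"
        if note == "shorten":
            draft = draft[:80]
    return draft

def critique(draft: str) -> list[str]:
    """Check the draft against task constraints and return feedback."""
    issues = []
    if "mean" not in draft:
        issues.append("mention the sample mean")
    if len(draft) > 80:
        issues.append("shorten")
    return issues

def run_agent(plan: str, max_iters: int = 5) -> str:
    feedback: list[str] = []
    for _ in range(max_iters):            # iterate: act, critique, refine
        draft = act(plan, feedback)
        issues = critique(draft)
        if not issues:                    # all constraints satisfied: stop
            return draft
        feedback.extend(issues)
    return draft                          # give up after max_iters attempts

result = run_agent("summarize the A/B test results for the team")
```

Capping the number of iterations, as above, is one simple guard against the non-termination failure mode the workshop discusses.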

Prerequisites

This workshop is designed for statisticians and data scientists with basic familiarity with programming and an interest in integrating AI into their workflows. No prior experience with agentic AI systems is required.

Instructor Biography

Tian Zheng is currently Professor of Statistics at Columbia University. In her research, she develops novel methods for exploring and understanding patterns in complex data from application domains such as biology, psychology, and climate modeling. Her research has been recognized by the 2008 Outstanding Statistical Application Award from the American Statistical Association (ASA), the Mitchell Prize from ISBA, and a Google research award. She became a Fellow of the American Statistical Association in 2014, a Fellow of the Institute of Mathematical Statistics in 2022, and a Fellow of the American Association for the Advancement of Science in 2024. From 2017 to 2020, she served as Associate Director for Education at the Columbia Data Science Institute. From 2019 to 2025, she was chair of the Department of Statistics at Columbia. Professor Zheng is the recipient of the 2017 Columbia Presidential Award for Outstanding Teaching. In 2021, she was recognized with a Lenfest Distinguished Columbia Faculty Award, which honors the excellence of faculty as teachers and mentors of both undergraduate and graduate students.

SC2 July 31, 2026 · 8:00 AM – 12:00 PM

Theory of Scaling in Modern Deep Learning

Instructors: Soufiane Hayou, Johns Hopkins University · Nikhil Ghosh, Flatiron Institute

Scope and Course Objectives

Scaling modern deep learning models such as large language models is constrained by hyperparameter (HP) tuning cost. This short course explains how theory can guide scaling in practice via techniques such as the Maximal Update Parameterization (muP). We connect the theoretical analysis (infinite-width training dynamics, stability of effective learning rates, and parameterization design) to practical workflows for reliable HP transfer across width and depth.

Learning objectives. Attendees will:

  • Understand what it means for HPs to be transferable across scale and why this matters for compute-optimal experimentation.
  • Learn the core design principles of muP and how they arise from theory (including a self-contained proof sketch).
  • Learn how to implement muP and HP transfer in practice and how to diagnose common failure modes.
  • Understand the notion of fast transfer, when it is guaranteed/useful, and when it fails.
  • Leave with actionable recipes for transferring LR, weight decay, and related HPs from small proxy runs to large runs.

Course Content

This short course will focus on the theory of scaling for neural networks. We will revisit infinite-width analyses such as the Neural Tangent Kernel (Jacot et al. 2018) and feature learning in infinite-width networks (Yang et al. 2022). It will (tentatively) be structured as follows:

Part I: General introduction to hyperparameter transfer

  • The scaling bottleneck: why direct HP tuning at large scale is infeasible; the proxy-run paradigm.
  • Definitions: transfer error, stability across width/depth, and what practitioners actually need (scale-invariant HPs).

Part II: Theory of muP and why it works

  • From standard parameterization to muP: what is being stabilized (update magnitudes / effective learning rates).
  • Tensor Programs intuition: width limits, feature learning vs. lazy regimes, and how parameterization controls the limit.
  • A proof for LR transfer with muP: a self-contained derivation showing how muP equalizes per-layer update scales and yields width-stable training dynamics under standard assumptions.
  • Practical implications: what HPs transfer well (LR, weight decay, init scales) and which typically do not.
  • Extensions and related ideas: depth-wise considerations, and how theory predicts additional scaling rules.
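
The "equalizing per-layer update scales" idea above can be previewed with a standard back-of-the-envelope calculation (a heuristic sketch under simplifying assumptions, not the course's proof). Consider a hidden pre-activation $h = Wx$ with fan-in $n$, input coordinates $x_j = \Theta(1)$, and an Adam-style (sign-like, normalized) update whose entries have magnitude on the order of the learning rate $\eta$:

```latex
% Adam-style updates have entrywise magnitude set by the learning rate:
|\Delta W_{ij}| = \Theta(\eta).
% The induced change in one coordinate of h = Wx sums n such terms,
\Delta h_i = \sum_{j=1}^{n} \Delta W_{ij}\, x_j ,
% and when the updates align with the input (the feature-learning regime),
% the sum grows linearly in the width:
\Delta h_i = \Theta(\eta\, n).
% Requiring width-stable feature updates \Delta h_i = \Theta(1) then forces
\eta = \Theta(1/n).
```

That is, the hidden-layer learning rate should shrink linearly in width, which is why (under this parameterization) an LR tuned on a small proxy run can be transferred to a wider model.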

Part III: Fast HP transfer

  • Defining fast transfer: suboptimality from transferring HPs vanishes faster than the finite-scale performance gap.
  • When fast transfer provably holds vs. fails: dependence on problem structure; synthetic counterexamples and what they teach us.
  • A conjectured mechanism: decomposing loss improvement into width-stable vs. width-sensitive components as a lens into the structure of fast transfer.
  • Practical implications: using the decomposition lens to provide intuition for when transfer will be fast or slow.

References

  • G. Yang et al. “Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer” (NeurIPS 2021 / arXiv:2203.03466).
  • S. Hayou. “A Proof of Learning Rate Transfer” (AISTATS 2026 / arXiv:2511.01734).
  • N. Ghosh, D. Wu, A. Bietti. “Understanding the Mechanisms of Fast Hyperparameter Transfer” (ICLR 2026 / arXiv:2512.22768).
  • G. Yang et al. “Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks” (ICLR 2024 / arXiv:2310.02244).

Prerequisites

Familiarity with standard concentration results (the central limit theorem and its more advanced variants).

Instructor Biographies

Soufiane Hayou is currently an assistant professor at Johns Hopkins University in the Department of Applied Mathematics and Statistics, with a secondary appointment in the Department of Computer Science. He is also a member of the Data Science and AI Institute. Previously, he was a research fellow at the Simons Institute at UC Berkeley and a visiting assistant professor of mathematics at the National University of Singapore. He obtained his PhD in statistics and machine learning from the University of Oxford in 2021, having graduated from Ecole Polytechnique in 2018. His research focuses on the theory and practice of learning at scale: the theoretical analysis of large-scale neural networks, with the goal of obtaining principled methods for training and finetuning.

Nikhil Ghosh is a Research Fellow at the Flatiron Institute. His main interests are in the theory of deep learning, particularly in the topics of optimization and scaling of neural networks. Previously he was a PhD student in the Statistics department at UC Berkeley working with Bin Yu and Song Mei.

SC3 July 31, 2026 · 8:00 AM – 12:00 PM

An Overview of LLMs for Statisticians: Basics, LLM-assisted Statistical Analysis, and Agentic AI

Instructor: Linjun Zhang, Rutgers University

Course Description

This course explores the foundations and frontiers of modern large language models (LLMs). We will cover the statistical principles underlying embeddings, transformers, and the LLM training pipeline (pretraining, parameter-efficient finetuning, reinforcement learning from human feedback). Building on this foundation, we will discuss reasoning models, agentic AI (including context engineering and harness engineering), and key issues of AI safety. The course emphasizes both statistical insights and practical applications, equipping students with tools to critically analyze and design trustworthy AI systems.
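
As a small illustration of the transformer material (our sketch, not course code), the core operation of the architecture, scaled dot-product attention, can be written in a few lines of NumPy:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)             # (n_q, n_k) similarity scores
    weights = softmax(scores, axis=-1)        # each query's weights sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out, W = attention(Q, K, V)                   # out: (4, 8), W: (4, 4)
```

The 1/sqrt(d) scaling keeps the score variance of order one as the embedding dimension grows, one of the "statistical principles" the course examines.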

Prerequisites

Linear algebra, calculus, basic probability theory and mathematical statistics, and basic vibe-coding skills.

Instructor Biography

Linjun Zhang is an Associate Professor in the Department of Statistics and an associated faculty member of Computer Science at Rutgers University. He obtained his Ph.D. in Statistics from the Wharton School of the University of Pennsylvania in 2019, and received the J. Parker Bursk Memorial Prize and the Donald S. Murray Prize for excellence in research and teaching, respectively, upon graduation. He also received the NSF CAREER Award, the Rutgers Presidential Teaching Award in 2024, and the Warren I. Susman Award for Excellence in Teaching in 2025. His current research interests include statistical foundations of large language models, algorithmic fairness, privacy-preserving data analysis, and deep learning theory.

SC4 July 31, 2026 · 1:00 PM – 5:00 PM

Reinforcement Learning: Foundations and Applications

Instructor: Chengchun Shi, London School of Economics and Political Science

Course Description

This short course offers a comprehensive introduction to reinforcement learning (RL), combining foundational theory with modern applications. Drawing on the RL “bible” (Sutton & Barto) and more recent advances in artificial intelligence, it covers Markov decision processes, planning and learning, Q-learning, policy- and model-based methods, and offline reinforcement learning, together with their applications to video games, ridesharing, and large language models. The course has been delivered at over 10 universities, and its materials are openly available on GitHub.
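
To give a flavor of the Q-learning material, here is a self-contained tabular sketch (ours, not taken from the course materials) on a toy five-state chain: the agent earns a reward of 1 only upon reaching the rightmost state, and the learned greedy policy should move right everywhere.

```python
import numpy as np

N_STATES, LEFT, RIGHT = 5, 0, 1
GOAL = N_STATES - 1                      # reaching state 4 ends the episode

def step(s, a):
    """Deterministic chain dynamics with reflecting left boundary."""
    s2 = min(s + 1, GOAL) if a == RIGHT else max(s - 1, 0)
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward, s2 == GOAL

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, 2))
alpha, gamma, eps = 0.5, 0.9, 0.2        # step size, discount, exploration

for _ in range(500):                     # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        target = r + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])   # Q-learning update
        s = s2

greedy = [int(np.argmax(Q[s])) for s in range(GOAL)]   # policy at states 0-3
```

After training, `greedy` is all RIGHT, matching the optimal values V(s) = gamma^(GOAL-1-s) for this chain.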

Prerequisites

Basic knowledge of probability and statistics is required.

Instructor Biography

Chengchun Shi is an Associate Professor in the Department of Statistics at LSE. He works at the interface of RL, LLMs, and statistics, with applications to ride-sharing and healthcare. His work brings to light the relevance and significance of statistical learning in AI, and demonstrates the usefulness of RL as a framework for policy evaluation and A/B testing in two-sided marketplaces. Chengchun has published over 70 papers, the majority of them in prestigious statistical journals (JRSSB, JASA, AoS) and top machine learning venues (NeurIPS, ICML, KDD, JMLR, CVPR, ICLR). His contributions have been recognized with awards such as the Peter Gavin Hall IMS Early Career Prize, the IMS Tweedie Award, and the Royal Statistical Society Research Prize. He has served as an associate editor of JRSS-B, JASA, and AoAS.

SC5 July 31, 2026 · 1:00 PM – 5:00 PM

The Role of AI in Accelerating Clinical Trials and Drug Development

Instructors: Alexia Iasonos, Memorial Sloan Kettering Cancer Center · John O’Quigley, University College London

Course Description

Artificial Intelligence (AI) is transforming drug development by accelerating discovery, optimizing clinical trials, and reducing overall costs and timelines. Traditional drug development is often lengthy and expensive, hindered by high attrition rates and complex uncertainties. AI-driven methods, including machine learning, deep learning, and natural language processing, enable rapid analysis of vast biomedical datasets to identify novel drug targets, predict molecular interactions, and optimize compound design. In preclinical stages, AI models simulate pharmacokinetics, toxicity, and efficacy, narrowing down viable candidates before laboratory testing. During clinical development, AI can assist in trial design, expedite patient recruitment, and support real-time monitoring, improving efficiency and precision. Furthermore, AI enhances repurposing of existing drugs by uncovering new therapeutic applications through pattern recognition in genomic and clinical data.

Integration of AI with advances in computational biology, genomics, and high-throughput screening is driving a paradigm shift toward data-driven, personalized, and adaptive drug discovery processes. However, challenges remain in data quality, interpretability, regulatory acceptance, and ethical governance. Continued collaboration between AI experts, pharmaceutical industry partners, and regulatory bodies will be essential to fully harness AI’s potential in creating safer, more effective, and accessible treatments.

In this course, we will illustrate three applications of AI use in oncology clinical trials:

  1. The setting of pivotal, registrational Phase III trials involving a definitive head-to-head comparison on a survival endpoint, leveraging real-world data and synthetic controls.
  2. The Phase II single-arm setting with limited sample size using pick-the-winner approaches that optimize drug portfolio selection.
  3. The causal inference setting in Phase I dose-finding studies where the aim is to compare different dose levels with respect to both safety and efficacy.

Prerequisites

Statistical knowledge required: general clinical trial design expertise. Examples include 2-arm sample size calculations with time-to-event endpoints, single-arm 2-stage Simon design, and model-based Phase I designs. No programming required.
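
For reference, the operating characteristics of the Simon two-stage design mentioned above can be computed directly from binomial probabilities. The sketch below uses an illustrative design (r1, n1, r, n) whose values are hypothetical, chosen for demonstration rather than taken from the course: stop at stage 1 if at most r1 of n1 patients respond, and declare the drug active only if more than r of n total patients respond.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_sf(k, n, p):
    """P(X > k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1, n + 1))

def simon_reject_prob(p, r1, n1, r, n):
    """P(declare the drug active): pass stage 1 (X1 > r1 responses out of
    n1) and then exceed r total responses out of n patients."""
    total = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # if stage-1 responses already exceed r, success is certain
        tail = 1.0 if x1 > r else binom_sf(r - x1, n - n1, p)
        total += binom_pmf(x1, n1, p) * tail
    return total

# Illustrative (hypothetical) design: stop if <= 1/12 responses in stage 1;
# declare active if > 3/35 responses overall.
r1, n1, r, n = 1, 12, 3, 35
alpha = simon_reject_prob(0.05, r1, n1, r, n)   # type I error at p0 = 0.05
power = simon_reject_prob(0.25, r1, n1, r, n)   # power at p1 = 0.25
```

Searching over (r1, n1, r, n) for the smallest expected sample size under the null subject to alpha and power constraints recovers Simon's optimal design.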

Instructor Profiles

Alexia Iasonos, Memorial Sloan Kettering Cancer Center.

John O’Quigley, University College London.

SC6 July 31, 2026 · 1:00 PM – 5:00 PM

Theory for Diffusion Models: Continuous and Discrete

Instructor: Sitan Chen, Harvard University

Course Description

This short course will survey recent theoretical developments in diffusion generative modeling over both continuous and discrete spaces. Following an overview of the basic notions underlying these models (Fokker–Planck equations, time reversal, flow matching, continuous-time Markov chains, etc.), the course will present three threads of research in this area: (1) discretization bounds which control the number of sampling steps needed to produce a sample with small statistical error; (2) provably correct algorithms for estimating the score function of a distribution and computational hardness results; and (3) methods for “steering” the outputs of diffusion-based samplers towards downstream objectives (e.g., guidance, sequential Monte Carlo, stochastic optimal control). While the focus will be on presenting rigorous results, the course will also be an opportunity to surface contemporary challenges which do not yet possess a crisp mathematical framing but which could benefit from a more principled statistical and algorithmic lens.
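
To make the continuous-time setup concrete, here is a self-contained toy example (ours, not the course's): for one-dimensional Gaussian data, the score of the variance-preserving forward diffusion is available in closed form, so the time-reversed SDE can be simulated exactly up to Euler–Maruyama discretization error, and the samples should recover the data distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
m, s = 1.0, 0.5            # data distribution: N(m, s^2)
T, dt = 5.0, 0.01          # time horizon and Euler-Maruyama step size
n = 20_000                 # number of particles

def score(x, t):
    """Exact score of the VP forward marginal: x_t = a_t x_0 + sqrt(1-a_t^2) z
    with a_t = exp(-t/2) (unit noise schedule, beta = 1)."""
    a = np.exp(-t / 2)
    var = a**2 * s**2 + (1 - a**2)
    return -(x - a * m) / var

# Forward SDE: dx = -x/2 dt + dW.  Its time reversal, run forward in the
# reversed clock u = T - t, is  dx = [x/2 + score(x, T - u)] du + dW.
x = rng.standard_normal(n)               # start from the N(0, 1) prior
for k in range(int(T / dt)):
    t = T - k * dt                       # current forward time
    drift = 0.5 * x + score(x, t)
    x = x + drift * dt + np.sqrt(dt) * rng.standard_normal(n)

# x is now approximately distributed as the data, N(m, s^2)
```

Replacing the exact `score` with a learned estimate, and bounding the error this introduces, is precisely where the discretization and score-estimation threads of the course enter.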

Prerequisites

Familiarity with the basics of convex optimization, log-concave sampling, and high-dimensional probability is encouraged.

Instructor Biography

Sitan Chen is an Assistant Professor of Computer Science at Harvard University, where he is a member of the Theory of Computation group, the ML Foundations group, and the Harvard Quantum Initiative. Previously, he was an NSF math postdoc at UC Berkeley, after completing his PhD in EECS at MIT in 2021. He is broadly interested in algorithmic questions about learning from data, most recently related to the science and theory of diffusion generative modeling, and the design of quantum protocols for learning about the physical universe. His work has been recognized with an NSF CAREER award, an ICML Outstanding Paper Award, and the Harvard Dean’s Competitive Fund for Promising Scholarship.