STAI-X 2026 · Harvard University · July 31–August 1, 2026

STAI-X Challenge 2026

A statistics and AI challenge for autonomous data analysis and reusable statistical agents.

Problem Statement

The STAI-X Challenge invites participants to develop integrated statistical and AI-driven solutions for a high-impact public health problem: forecasting state-level rates of suspected nonfatal overdose emergency department visits. Drug overdose remains a leading cause of injury-related mortality in the United States, and timely, accurate surveillance is critical for public health response. Participants draw on a broad array of public signals, including social determinants of health, environmental exposures, digital behavioral indicators, and clinical syndromic surveillance, to produce these forecasts.

Awards Summary

Award A — Leaderboard Excellence

Tackle a public health prediction task and achieve the best leaderboard performance.

Award B — AI Automation

Develop a general-purpose AI-driven automated data analysis pipeline.

Award C — Statistical Agents

Develop statistical analysis agents and skills of any kind.

How to Participate

1 Read the competition rules carefully.
2 Assemble a team of up to 4 members.
3 Choose your award track(s). You may enter all of them.
  Award A: Leaderboard Excellence. Submit a Kaggle notebook that writes a valid prediction file.
  Award B: AI Automation. You must also enter Award A, which serves as the case study that demonstrates the performance of your AI pipeline. Host the pipeline in a GitHub repository.
  Award C: Statistical Agents. Publish a Kaggle Discussion post with the GitHub repository link.
4 Complete the official registration form before submitting to any award track. Registration closes 2026/06/01 at 11:59pm UTC.
5 Join the Kaggle competition using the competition link. Only registered teams are eligible for awards. The competition is open from 2026/05/04 at 12:00am UTC to 2026/06/15 at 11:59pm UTC.
6 The deadline for all award submissions is 2026/06/15 at 11:59pm UTC.

Key Dates

Phase | Dates | Activity
Registration Deadline | Jun 1, 2026 | Teams must register before submitting to any award track.
Phase I — Competition | May 4 – Jun 15, 2026 | All award submissions are due 2026/06/15 at 11:59pm UTC.
Phase II — Evaluation | Jun 16 – Jun 30, 2026 | The committee reviews award participants and determines winners.
Phase III — Announcement | Jul 1, 2026 | Winners announced.
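
Every deadline in the schedule above is stated in UTC, so teams in other time zones may want to check timestamps programmatically. The following is a minimal sketch, not an official tool. The names COMPETITION_OPENS, REGISTRATION_DEADLINE, SUBMISSION_DEADLINE, can_register, and can_submit are hypothetical, and the sketch assumes "11:59pm" means 23:59:00 UTC, inclusive.

```python
from datetime import datetime, timezone

# Dates taken from the published schedule; all times are UTC.
COMPETITION_OPENS = datetime(2026, 5, 4, 0, 0, tzinfo=timezone.utc)
REGISTRATION_DEADLINE = datetime(2026, 6, 1, 23, 59, tzinfo=timezone.utc)
SUBMISSION_DEADLINE = datetime(2026, 6, 15, 23, 59, tzinfo=timezone.utc)


def _require_aware(now: datetime) -> datetime:
    """Reject naive datetimes, which carry no time zone and are ambiguous."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return now


def can_register(now: datetime) -> bool:
    """True if `now` is at or before the registration deadline."""
    return _require_aware(now) <= REGISTRATION_DEADLINE


def can_submit(now: datetime) -> bool:
    """True if `now` falls inside the Phase I competition window."""
    return COMPETITION_OPENS <= _require_aware(now) <= SUBMISSION_DEADLINE
```

Aware datetimes compare correctly across time zones, so a local timestamp can be passed directly, for example can_submit(datetime.now(timezone.utc)).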

Participant Rules

1. Eligibility & Registration

  • Open to individuals and teams worldwide.
  • One Kaggle account per person. Multi-account participation disqualifies all affiliated submissions.
  • Maximum team size is 4. Team mergers are not allowed.
  • Sponsor employees, organizers, and reviewers may participate but are not eligible to be ranked.
  • Every team must register by 2026/06/01 at 11:59pm UTC. Team membership cannot change after registration.

2. Data Rules

  • The official competition dataset is the only permitted source.
  • No external data: no web scraping, no third-party datasets, and no attached Kaggle datasets beyond the official competition files.
  • Do not redistribute the competition data outside Kaggle.

3. Award A — Leaderboard Excellence

  • Submit a Kaggle notebook in Python or R that writes a valid submission.csv.
  • Internet is disabled at evaluation. Notebook size is limited to 1 MB and runtime to 9 hours CPU / 9 hours GPU.
  • Daily limit: 1 submission per team per day. Each team selects one final submission before the deadline.
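
As an illustration of the notebook's last step, here is a minimal sketch of validating predictions and writing submission.csv using only the Python standard library. The rules above do not specify the file schema, so the column names "id" and "prediction" and the helper name write_submission are placeholders (assumptions); use the header from the official sample submission instead.

```python
import csv
import math
from pathlib import Path


def write_submission(predictions: dict[str, float], path: str = "submission.csv") -> Path:
    """Validate predictions and write them as a two-column CSV.

    The "id" and "prediction" column names are assumptions, not the
    official schema.
    """
    if not predictions:
        raise ValueError("no predictions to write")
    for row_id, value in predictions.items():
        # NaN or infinite values would typically make a submission invalid.
        if not math.isfinite(value):
            raise ValueError(f"non-finite prediction for {row_id!r}: {value}")
    out = Path(path)
    with out.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "prediction"])
        for row_id, value in predictions.items():
            writer.writerow([row_id, value])
    return out
```

Because internet access is disabled at evaluation, a helper like this should depend only on packages already available in the notebook environment.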

Final ranking combines Stage 1 performance on the current holdout with Stage 2 re-execution on a later window. Each scoring category contributes equally through block-averaged MAE.
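
The rules do not spell out the exact formula, so the following is only one plausible reading of "block-averaged MAE": compute the mean absolute error separately within each scoring category (block), then take the unweighted mean of those per-block values, so that every category counts equally regardless of how many rows it contains. The function name block_averaged_mae and the (category, y_true, y_pred) record layout are hypothetical, and the official metric may differ.

```python
from collections import defaultdict


def block_averaged_mae(records):
    """records: iterable of (category, y_true, y_pred) tuples.

    Returns the unweighted mean of per-category MAEs. This is an assumed
    reading of the competition metric, not the official implementation.
    """
    abs_errors = defaultdict(list)
    for category, y_true, y_pred in records:
        abs_errors[category].append(abs(y_true - y_pred))
    if not abs_errors:
        raise ValueError("no records to score")
    # MAE within each block, then an equal-weight average across blocks.
    per_block = {c: sum(errs) / len(errs) for c, errs in abs_errors.items()}
    return sum(per_block.values()) / len(per_block)
```

Under this reading, a block with four rows that each have absolute error 1 and a block with one row of absolute error 3 score (1 + 3) / 2 = 2.0, whereas pooling all five rows would give 7 / 5 = 1.4. Small categories therefore carry as much weight as large ones.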

4. Award B — AI Automation

  • Participants for Award B must also enter Award A.
  • After Phase I, up to the top 20 teams by Stage 1 full-holdout MAE are selected for Award B evaluation.
  • Add the Award B markdown cell as the first cell of the Award A notebook so reviewers can find team information, GitHub repository, and architecture notes.
  • Tag the repository commit award-b-submission before 2026/06/15 11:59pm UTC.
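
Since MAE is an error measure, "top" here means lowest. A minimal sketch of the shortlist step follows. The function name select_award_b_candidates and the team-to-MAE mapping are hypothetical, and the alphabetical tie-break is an assumption, because the rules do not define one.

```python
def select_award_b_candidates(stage1_mae: dict[str, float], limit: int = 20) -> list[str]:
    """Return up to `limit` team names, lowest Stage 1 MAE first.

    Ties are broken alphabetically by team name (an assumption; the
    published rules do not specify a tie-break).
    """
    ranked = sorted(stage1_mae.items(), key=lambda item: (item[1], item[0]))
    return [team for team, _ in ranked[:limit]]
```

If fewer than 20 teams are in the pool, all of them are returned, which matches the "up to" wording.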

5. Award C — Statistical Agents

  • Publish one Kaggle Discussion post with the title prefix [Award C].
  • Include enough information in the post for reviewers to find team information, the GitHub repository, and architecture notes.
  • Describe in plain language what the statistical agent or skill does.
  • Include a demo link, architecture summary, and reproducibility link or notebook.
  • Tag the repository commit award-c-submission before 2026/06/15 11:59pm UTC.
  • Posts must be published by 2026/06/15 at 11:59pm UTC.

6. General Integrity Rules

  • Use one identity. Account sharing, sockpuppets, and multi-account participation are prohibited.
  • Keep work reproducible. Finalist notebooks and agent repositories may be re-run by organizers exactly as submitted.
  • Respect data boundaries. Do not use or redistribute non-official data for the competition.
  • No private evaluation discussion. Questions about rules or evaluation should go through public competition channels.
  • Committee decisions are final. Organizers may disqualify submissions that violate the published rules.

Please reach out to staix.contact@gmail.com if you have any questions about the data challenge.