STAI-X Challenge 2026
A statistics and AI challenge for autonomous data analysis and reusable statistical agents.
Problem Statement
The STAI-X Challenge invites participants to develop integrated statistical and AI-driven solutions for a high-impact public health problem: forecasting state-level rates of suspected nonfatal overdose emergency department visits. Drug overdose remains a leading cause of injury-related mortality in the United States, and timely, accurate surveillance is critical for public health response. Participants harness a broad array of public signals, including social determinants of health, environmental exposures, digital behavioral indicators, and clinical syndromic surveillance, to forecast these rates.
Awards Summary
Award A — Leaderboard Excellence
Achieve the best predictive performance on the public health forecasting task.
Award B — AI Automation
Develop a general-purpose AI-driven automated data analysis pipeline.
Award C — Statistical Agents
Develop reusable statistical analysis agents and skills.
How to Participate
• Award A: Leaderboard Excellence. Submit a Kaggle notebook that writes a valid prediction file.
• Award B: AI Automation. Enter Award A and use it as the case study demonstrating the performance of your AI pipeline, hosted in a GitHub repository.
• Award C: Statistical Agents. Publish a Kaggle Discussion post with the GitHub repository link.
Key Dates
| Phase | Dates | Activity |
|---|---|---|
| Registration Deadline | Jun 1, 2026 | Teams must register before submitting to any award track. |
| Phase I — Competition | May 4 – Jun 15, 2026 | All award submissions are due 2026/06/15 at 11:59pm UTC. |
| Phase II — Evaluation | Jun 16 – Jun 30, 2026 | The committee reviews award participants and determines winners. |
| Phase III — Announcement | Jul 1, 2026 | Winners announced. |
Participant Rules
1. Eligibility & Registration
- Open to individuals and teams worldwide.
- One Kaggle account per person. Multi-account participation disqualifies all affiliated submissions.
- Maximum team size is 4. Team mergers are not allowed.
- Sponsor employees, organizers, and reviewers may participate but are not eligible to be ranked.
- Every team must register by 2026/06/01 at 11:59pm UTC. Team membership cannot change after registration.
2. Data Rules
- The official competition dataset is the only permitted source.
- No external data: no web scraping, no third-party datasets, and no attached Kaggle datasets beyond the official competition files.
- Do not redistribute the competition data outside Kaggle.
3. Award A — Leaderboard Excellence
- Submit a Kaggle notebook in Python or R that writes a valid submission.csv.
- Internet is disabled at evaluation. Notebook size is limited to 1 MB and runtime to 9 hours CPU / 9 hours GPU.
- Daily limit: 1 submission per team per day. Each team selects one final submission before the deadline.
- Final ranking combines Stage 1 performance on the current holdout with Stage 2 re-execution on a later window. Each scoring category contributes equally through block-averaged MAE.
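To make the block-averaged metric concrete, here is a minimal sketch of how such a score could be computed. The column names (`state`, `category`, `rate`, `pred`) and the toy values are hypothetical, not the official schema; the point is that the MAE is averaged within each scoring category first, so every category contributes equally regardless of how many rows it has.

```python
import pandas as pd

# Hypothetical ground-truth and prediction frames (illustrative values only).
truth = pd.DataFrame({
    "state":    ["CA", "CA", "TX", "TX"],
    "category": ["all_drugs", "opioids", "all_drugs", "opioids"],
    "rate":     [4.0, 2.0, 3.0, 1.0],
})
preds = truth.assign(pred=[4.5, 1.5, 3.5, 1.5]).drop(columns="rate")

def block_averaged_mae(truth: pd.DataFrame, preds: pd.DataFrame) -> float:
    """Mean of per-category MAEs, so each scoring category weighs equally."""
    merged = truth.merge(preds, on=["state", "category"])
    abs_err = (merged["rate"] - merged["pred"]).abs()
    per_block = abs_err.groupby(merged["category"]).mean()  # one MAE per category
    return float(per_block.mean())                          # unweighted mean of blocks

print(block_averaged_mae(truth, preds))  # 0.5 for the toy data above
```

Note the difference from a plain row-level MAE: if one category had many more rows, the plain MAE would let it dominate, whereas the block average gives each category the same weight.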
4. Award B — AI Automation
- Participants for Award B must also enter Award A.
- After Phase I, up to the top 20 teams by Stage 1 full-holdout MAE are selected for Award B evaluation.
- Add the Award B markdown cell as the first cell of the Award A notebook so reviewers can find team information, GitHub repository, and architecture notes.
- Tag the repository commit `award-b-submission` before 2026/06/15 at 11:59pm UTC.
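Tagging the submission commit can be done with standard git commands. The snippet below demonstrates the two relevant commands in a throwaway repository so it runs anywhere; in practice you would run the `git tag` and `git push` lines from your own clone against your GitHub remote.

```shell
# Demo in a throwaway repository (mktemp); in your real clone you only need
# the `git tag` line plus `git push origin award-b-submission`.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.email=demo@example.com -c user.name=demo \
    commit --allow-empty -q -m "work to submit"
git -C "$repo" tag -a award-b-submission -m "Award B submission"  # annotated tag on HEAD
git -C "$repo" tag -l                                             # lists: award-b-submission
# From a real clone, publish the tag with:
#   git push origin award-b-submission
```

An annotated tag (`-a`) records the tagger and date, which makes it easier for reviewers to verify the tag was created before the deadline.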
5. Award C — Statistical Agents
- Publish one Kaggle Discussion post with the title prefix `[Award C]`.
- Add the necessary information in the post so reviewers can find team information, GitHub repository, and architecture notes.
- Describe in plain language what the statistical agent or skill does.
- Include a demo link, architecture summary, and reproducibility link or notebook.
- Tag the repository commit `award-c-submission` before 2026/06/15 at 11:59pm UTC.
- Posts must be published by 2026/06/15 at 11:59pm UTC.
6. General Integrity Rules
Please reach out to staix.contact@gmail.com if you have any questions about the data challenge.