Explore the data, build your model, and apply machine learning to solve health challenges
Challenge 1
December 2025 - February 2026
Goal: To develop new algorithms to determine patient outcomes from clinical correlates, molecular and cytogenetic findings, and proteomic (RPPA) data.
Prizes: $250 total in Giftly multi-store gift cards
🥇 1st Place - $125 gift card
🥈 2nd Place - $75 gift card
🥉 3rd Place - $50 gift card
aiMATCH presents this outcome prediction challenge as part of the AIM-AHEAD Program of the National Institutes of Health (NIH), Data Infrastructure and Capacity Building (DICB) Program. We equip AIM-AHEAD trainees, stakeholders, and the broader research community with AI skills and tools to accelerate progress towards their research projects.
The challenge dataset is a multi-omic, multi-cohort dataset designed to advance predictive modeling of leukemia outcomes in pediatric and adult patients at the MD Anderson Cancer Center. The data integrates clinical, genomic/cytogenetic, and proteomic measurements to enable the discovery of robust biomarkers, pathway-level insights, and patient-specific therapeutic guidance.
Overview
Use the specialized aiMATCH Chatbot developed for this challenge to ask about dataset features, develop a modeling plan, and more.
The aiMATCH Challenge is hosted on Kaggle, which is an online platform for data science competitions.
The aiMATCH Leukemia Patient Outcome Prediction Challenge is organized around four questions. The current challenge addresses Question 1; future challenge phases are planned to focus on Questions 2–4.
1. Develop new algorithms to determine patient outcomes from clinical correlates, molecular and cytogenetic findings, and proteomic (RPPA) data.
2. Identify patterns of protein utilization in the three datasets (AML, CML, and T-cell ALL), distinguishing what is shared between multiple types of leukemia from what is unique to one or more types.
3. Identify how the patterns of protein utilization are affected by age (pediatric vs. adult) within the same leukemia or across the different types of leukemia.
3.1. Are the patterns of protein utilization in pediatric and adult populations similar or different?
3.2. Are some protein utilization patterns age-specific and others shared?
4. Rank the best potential FDA-approved and/or novel therapies that target those pathways. Identify individual proteins, or subsets of proteins, that form a particular pattern that should be targeted, match available drugs to those proteins, and explain how you would identify those cases and targets and match drugs to specific patients.
Important dates
Challenge Question 1
Official launch: December 17, 2025
Midway point check-in: January 2026
Challenge ends: February 20, 2026 (11:59 CT)
Winners announced: February 23, 2026
Challenge Questions 2-4
Official launch: Mid-2026
Participant Eligibility
Anyone can participate provided that they meet the following:
Age & eligibility: You must be 18 years or older. If you are signing up as a Learner or Mentee, you must be a U.S. citizen, U.S. national, or permanent resident.
AIM-AHEAD Connect Profile: Each member of the team must complete a required user profile on AIM-AHEAD Connect and ensure all information is accurate.
Kaggle Account: At least one member of the team must have a Kaggle account to submit an entry.
Agreement to terms: You must read and agree to the following:
The AIM-AHEAD Connect User Agreement and Privacy Policy
The Kaggle competition rules, which include the data use agreement and Kaggle-specific rules.
Getting started
1. Form a competing team with 1 to 5 members.
2. Receive the invite link to the Kaggle competition page after registering all team members on AIM-AHEAD Connect.
3. Download the data, develop models, and submit your predictions, write-ups, and code notebooks on Kaggle, accessible anytime via the link below.
Note: Since this is a private competition, this competition page link will only work once you’ve joined via the invite link!
Meet the challenge
Here are some introductory and overview videos to kick off the data competition
Submissions
To be considered for a prize, all members of the team must be registered on AIM-AHEAD Connect, and final submissions must include the following:
A solution file
Two columns, 193 rows (length of test set).
Column 1: UPIN, the unique ID.
Column 2: Response Simple, the predicted class (values 0 or 1). Use 0 → NR, 1 → CR/CRi.
Working code
A reproducible Kaggle Notebook that outputs the submitted predictions end-to-end. The notebook must include: environment & package setup; loading the training & test data; the training pipeline; and the final cell that writes the prediction CSV.
Kaggle notebooks support Python and R. Learn more about Kaggle notebooks here.
A short write-up (1-5 paragraphs) describing the following:
Details of pre-processing done on the data (e.g., encoding, normalization, dimensionality reduction, etc.)
Motivation behind the choice of classification algorithm(s)
Description of the model
Key findings (e.g., feature importances, significant correlations, effective techniques to improve model performance, etc.)
(Optional) Description of the use of the accompanying aiMATCH chatbot, if any, and comments/suggestions to improve its effectiveness
(Optional) Possible future directions, insights, areas for further exploration
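As a rough illustration of the solution file described above, the following Python sketch builds and checks a two-column CSV before it is written. The column names, the 0/1 encoding, and the 193-row count come from the submission rules; the function names, the placeholder IDs, and the alternating dummy predictions are hypothetical and only stand in for a real model's output. It uses the standard library only, though a pandas DataFrame would serve equally well in a Kaggle Notebook.

```python
import csv
import io

# Label encoding stated in the submission rules: 0 -> NR, 1 -> CR/CRi.
LABEL_TO_CODE = {"NR": 0, "CR/CRi": 1}
EXPECTED_ROWS = 193  # length of the test set, per the rules
HEADER = ["UPIN", "Response Simple"]


def build_submission(upins, predicted_labels):
    """Return validated submission rows as a list of (UPIN, code) tuples."""
    if len(upins) != len(predicted_labels):
        raise ValueError("one prediction is required per UPIN")
    if len(set(upins)) != len(upins):
        raise ValueError("UPIN values must be unique")
    rows = []
    for upin, label in zip(upins, predicted_labels):
        if label not in LABEL_TO_CODE:
            raise ValueError(f"unknown label: {label!r}")
        rows.append((upin, LABEL_TO_CODE[label]))
    return rows


def write_submission(rows, handle, expected_rows=EXPECTED_ROWS):
    """Write the header and rows as CSV after checking the row count."""
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(rows)}")
    writer = csv.writer(handle)
    writer.writerow(HEADER)
    writer.writerows(rows)


# Placeholder inputs (hypothetical IDs and a dummy alternating prediction).
demo_upins = [f"ID{i:03d}" for i in range(EXPECTED_ROWS)]
demo_labels = ["CR/CRi" if i % 2 == 0 else "NR" for i in range(EXPECTED_ROWS)]

# An in-memory buffer is used here; in a notebook, pass an open file instead.
buffer = io.StringIO()
write_submission(build_submission(demo_upins, demo_labels), buffer)
submission_text = buffer.getvalue()
```

In the final notebook cell, the same write_submission call could be given a file opened with open(path, "w", newline="") so that the CSV lands in the notebook's output directory. Checking the row count and label values before writing catches the most common formatting mistakes early.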
Mechanics
Leaderboard
To discourage cheating, the accuracy reported upon model submission is computed using only a random subset of the test set (the public set). Leaderboard rankings displayed during the competition are based on this public set. Final rankings, however, will be determined using the private test set, which consists of the remaining test data.
Thus, public leaderboard scores provide a reasonable estimate of model performance but may not reflect final standings. The public and private splits were fixed prior to the launch of the challenge. The final leaderboard results will be released on February 21, 2026, at 12 AM ET.
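The public/private mechanism can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 50% public fraction, the seed, and the synthetic labels are all hypothetical, since the actual split size and membership are not disclosed. Accuracy is used because it is the metric the leaderboard reports.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def split_scores(y_true, y_pred, public_fraction, seed):
    """Score predictions on a fixed random public subset and on the rest."""
    indices = list(range(len(y_true)))
    random.Random(seed).shuffle(indices)  # fixed seed gives a fixed split
    n_public = round(len(indices) * public_fraction)
    public, private = indices[:n_public], indices[n_public:]

    def score(idx):
        return accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])

    return score(public), score(private), len(public), len(private)


# Synthetic labels for 193 test rows; predictions are wrong on every 5th row.
rng = random.Random(0)
y_true = [rng.randint(0, 1) for _ in range(193)]
y_pred = [1 - t if i % 5 == 0 else t for i, t in enumerate(y_true)]

# Assumed 50/50 split; the real proportion is not stated by the organizers.
public_acc, private_acc, n_public, n_private = split_scores(
    y_true, y_pred, public_fraction=0.5, seed=42
)
print(f"public  ({n_public} rows): {public_acc:.3f}")
print(f"private ({n_private} rows): {private_acc:.3f}")
```

Because each score is computed on fewer than 193 rows, the two numbers generally differ somewhat even for the same set of predictions, which is why a small lead on the public leaderboard may not carry over to the final ranking.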