…A national health data science challenge established to advance human health through machine learning
Sepsis 2 Onset and Mortality Among Adult Inpatients
Introduction
Early identification of sepsis can mean the difference between life and death for the patient, and it is mission critical to healthcare providers with respect to quality and cost. This use case is also well supported by Cerner Health Facts data (note that 1) these data are deidentified, and 2) we have more complete information for inpatients than for outpatients). We focus on inpatients to cover a cohort with a much larger sample of vulnerable patients, in an environment that may feature a smaller nurse/patient ratio.
Challenge Tasks and Data
The Challenge has three tasks:
- Sepsis 2 onset risk prediction (4 hours before onset)
- 30-day mortality risk prediction among sepsis 2 patients (at the time of onset)
- Innovation regarding interpretability
We included all hospitalized adult patients (at least 16 years old) with suspected infection. Sepsis 2 patients must meet at least two of the following SIRS criteria:
- Body temperature > 100.4 °F or < 95.0 °F
- Respiratory rate (RR) > 20/min or PaCO2 < 32 mmHg
- Heart rate (HR) > 90/min
- White blood cell count (WBC) > 12,000/mm³ or < 4,000/mm³, or bands > 10%
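The criteria above can be checked mechanically. The following is a minimal sketch, not the organizers' labeling code. The parameter names are hypothetical, and it assumes temperature in °F and WBC in thousands of cells per mm³ (so 12.0 means 12k). Suspected infection, which sepsis 2 also requires, is not modeled here.

```python
import math

# Hypothetical parameter names; the Challenge data uses its own column headers.
# Assumed units: temperature in °F, WBC in thousands of cells per mm³
# (so 12.0 means 12k), bands in percent, rates per minute, PaCO2 in mmHg.
def count_sirs(temp_f=math.nan, rr=math.nan, paco2=math.nan,
               hr=math.nan, wbc_k=math.nan, bands_pct=math.nan):
    """Return how many of the four SIRS criteria are met.

    A missing value (NaN) never satisfies a criterion, because every
    comparison involving NaN evaluates to False.
    """
    criteria = [
        temp_f > 100.4 or temp_f < 95.0,            # body temperature
        rr > 20 or paco2 < 32,                       # RR or PaCO2
        hr > 90,                                     # heart rate
        wbc_k > 12 or wbc_k < 4 or bands_pct > 10,   # WBC or band forms
    ]
    return sum(criteria)


def meets_sirs(**measurements):
    """At least two SIRS criteria; suspected infection must be checked separately."""
    return count_sirs(**measurements) >= 2


print(count_sirs(temp_f=101.2, hr=112))   # 2
print(meets_sirs(wbc_k=13.0))             # False
```

Treating NaN as "criterion not met" matches the sparse matrices below, where most cells are NaN at any given time.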
We excluded patients who 1) are children or 2) stayed in the hospital for less than 8 hours or more than 30 days.
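The age and length-of-stay rules amount to a simple filter. The sketch below is illustrative only. The `Admission` record, its exact-age field, and the IDs are hypothetical (the released demographic table only has `age_grp` bins). It assumes both length-of-stay bounds are inclusive.

```python
from dataclasses import dataclass

MIN_AGE_YEARS = 16       # "at least 16 years old"
MIN_LOS_HOURS = 8        # stays shorter than 8 hours are excluded
MAX_LOS_HOURS = 30 * 24  # stays longer than 30 days are excluded


@dataclass
class Admission:
    adm_id: str
    age_years: float   # hypothetical: the released table only has age_grp bins
    los_hours: float   # length of stay, Tdischarge - Tadmission, in hours


def eligible(a: Admission) -> bool:
    """Apply the age and length-of-stay rules (suspected infection not modeled)."""
    return (a.age_years >= MIN_AGE_YEARS
            and MIN_LOS_HOURS <= a.los_hours <= MAX_LOS_HOURS)


# Hypothetical admissions for illustration only.
admissions = [
    Admission("X1", 45, 72.0),    # kept
    Admission("X2", 15, 72.0),    # child
    Admission("X3", 60, 6.0),     # stay too short
    Admission("X4", 70, 800.0),   # stay too long (> 720 h)
]
cohort = [a.adm_id for a in admissions if eligible(a)]
print(cohort)  # ['X1']
```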
There are three critical time points for each patient: Tadmission (hospital admission), Tonset (sepsis 2 onset), and Tdischarge (hospital discharge).
We will provide patient demographic and admission data for both tasks.
adm_id | gender | race | admission_type | addission_source | care_setting | age_grp |
---|---|---|---|---|---|---|
A100019 | Male | Caucasian | Elective | Physician Referral | Care Setting Undefined | 60~70 |
A100032 | Female | African American | Emergency | Physician Referral | Care Setting Undefined | 50~60 |
A100034 | Male | Caucasian | Elective | Others/unknown | Care Setting Undefined | 40~50 |
A100035 | Male | Caucasian | Emergency | Others/unknown | Care Setting Undefined | 70~80 |
Task 1: Sepsis 2 Onset Risk Prediction (4 hours before onset)
Goal: To predict sepsis 2 onset 4 hours before it occurs.
We provide clinical events and lab test results between Tadmission and Tonset - 4 hours for each patient, in matrix format. Event times are given in hours since Tadmission.
adm_id | event_time | A/G Ratio | ALT/SGPT | AST/SGOT | Albumin Quant | Albumin, Serum | Alk Phos, Serum | Amylase, Serum | Anion Gap | ... |
---|---|---|---|---|---|---|---|---|---|---|
A100008 | 0.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
A100008 | 2.0 | 1.2 | 26.0 | 38.0 | NaN | 2.9 | 75.0 | NaN | 9.0 | ... |
A100008 | 3.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
A100008 | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
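Because each row records only what was measured at that time, most cells are NaN. One common, simple way to turn such a matrix into a fixed-length input is "last observation carried forward". The sketch below applies it to the A100008 rows shown above. The function name and the `t_max` cut-off are hypothetical, and this is one possible feature choice, not a method prescribed by the Challenge.

```python
import math


def last_observed(rows, columns, t_max=None):
    """Collapse one patient's event matrix into one value per column.

    rows: (event_time, {column: value}) pairs, times in hours since
          Tadmission; NaN or an absent key means "not measured then".
    t_max: optional cut time; rows after it are ignored, which lets one
           build segments that end earlier in the same stay.
    Returns {column: most recent non-NaN value, or NaN if never seen}.
    """
    features = {c: math.nan for c in columns}
    for t, values in sorted(rows, key=lambda r: r[0]):
        if t_max is not None and t > t_max:
            break
        for c in columns:
            v = values.get(c, math.nan)
            if not math.isnan(v):
                features[c] = v
    return features


# The first rows of patient A100008 from the table above (subset of columns).
nan = math.nan
rows = [
    (0.5, {"A/G Ratio": nan, "ALT/SGPT": nan, "Anion Gap": nan}),
    (2.0, {"A/G Ratio": 1.2, "ALT/SGPT": 26.0, "Anion Gap": 9.0}),
    (3.5, {"A/G Ratio": nan, "ALT/SGPT": nan, "Anion Gap": nan}),
    (4.0, {"A/G Ratio": nan, "ALT/SGPT": nan, "Anion Gap": nan}),
]
cols = ["A/G Ratio", "ALT/SGPT", "Anion Gap"]
print(last_observed(rows, cols))             # values from t = 2.0
print(last_observed(rows, cols, t_max=1.0))  # all NaN
```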
Each patient is labeled according to whether they were identified as having sepsis 2 onset (sepsis2 = 1) or not (sepsis2 = 0).
adm_id | sepsis2 |
---|---|
A100001 | 0 |
A100002 | 0 |
A100003 | 0 |
A100004 | 0 |
A100005 | 0 |
A100006 | 0 |
Total data size:
Evaluation: Standard AUC, computed on randomly sampled segments from the testing cohort. We will test:
- Case and control segments from the same patient: segments from further back (more than 4 hours before sepsis onset) vs. segments ending close to onset (exactly 4 hours before onset)
- Case and control segments from different patients: those who have sepsis onset in the next 4 hours vs. those who do not develop sepsis
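For readers checking their own models locally, "standard AUC" equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. That holds for both pairings listed above. This minimal sketch computes it directly from that definition. The function name is our own, and it is not the organizers' evaluation script.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise definition.

    Probability that a random case (label 1) outscores a random control
    (label 0); ties count as 1/2. O(n_cases * n_controls), which is fine
    for a sanity check but slow for very large test sets.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in cases:
        for n in controls:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```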
Task 2: 30-day Mortality Risk Prediction for Patients Identified with Sepsis 2
Goal: To predict whether the patient will die in the hospital within 30 days, using up to 48 hours of data before sepsis onset.
We provide the clinical events and lab test results between Tonset - 48 hours and Tonset - 4 hours for each sepsis 2 patient, in matrix format. Event times are given in hours relative to Tonset, so they are negative.
adm_id | event_time | A/G Ratio | ALT/SGPT | AST/SGOT | Albumin Quant | Albumin, Serum | Alk Phos, Serum | Amylase, Serum | Anion Gap | ... |
---|---|---|---|---|---|---|---|---|---|---|
A1000019 | -47.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
A1000019 | -46.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
A1000019 | -45.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
A1000019 | -45.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
A1000019 | -44.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
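The Task 2 matrices use a different clock from Task 1: time zero is Tonset rather than Tadmission. The sketch below shows how the two clocks relate by re-expressing admission-relative rows on the onset clock and keeping the Tonset - 48 to Tonset - 4 window. It is hypothetical. The function name, the inclusive window bounds, and the example stay are assumptions, and the released Task 2 data already comes in this form.

```python
def onset_window(rows, t_onset, start=-48.0, end=-4.0):
    """Shift admission-relative times so Tonset = 0 and keep the Task 2 window.

    rows: (hours since Tadmission, values) pairs; t_onset in the same clock.
    Keeps rows with start <= t - t_onset <= end (inclusive bounds assumed).
    Patients admitted less than 48 h before onset simply yield fewer rows.
    """
    window = []
    for t_adm, values in rows:
        t_rel = t_adm - t_onset
        if start <= t_rel <= end:
            window.append((t_rel, values))
    return window


# Hypothetical stay with onset 60 h after admission.
rows = [(10.0, {}), (12.5, {}), (55.0, {}), (57.0, {})]
print([t for t, _ in onset_window(rows, t_onset=60.0)])  # [-47.5, -5.0]
```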
Each patient is labeled with their mortality status (mortality) and the time, in hours, between Tonset and Tdischarge (time).
adm_id | time | mortality |
---|---|---|
A100079 | 200.5 | 0 |
A100244 | 78.5 | 0 |
A100328 | 78.5 | 0 |
A100388 | 55.5 | 0 |
A100398 | 117.0 | 0 |
Total data size:
Evaluation:
Cumulative case/dynamic control ROC: performance is judged at multiple time points to see how well, and how early (relative to mortality/discharge), the model's risk score Mi can give a good prediction from Tonset.
Evaluation and comparison use the R package timeROC:
https://cran.r-project.org/web/packages/timeROC/index.html
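$\text{sensitivity}^{C}(c,t) = P(M_i > c \mid T_i < t)$
$\text{specificity}^{D}(c,t) = P(M_i < c \mid T_i > t)$
where c is a threshold on the risk score M_i of patient i and T_i is that patient's survival time after Tonset.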
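Together, these two quantities trace one ROC curve per cutoff t as the threshold c varies. Assuming a continuous score with no ties, the area under that curve has the usual pairwise reading for two distinct patients i and j:

$$\text{AUC}^{C/D}(t) = P\left(M_i > M_j \mid T_i < t,\ T_j > t\right)$$

That is, it is the probability that a patient whose event occurred before t received a higher risk score than a patient whose time exceeds t.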
Calculating the AUC (in the traditional way) at different time cutoffs t allows one to assess the model's performance in predicting short-, medium-, and long-term mortality risk after sepsis onset.
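To make the cumulative/dynamic definitions concrete, here is a simplified Python sketch. It is not the timeROC implementation. It assumes `time` holds hours from Tonset and that `event` = 1 marks death, mirroring the time and mortality label columns. timeROC corrects for censoring with inverse probability of censoring weights. This sketch instead drops patients censored before t, so its numbers can differ from the official evaluation.

```python
import math


def cd_auc(marker, time, event, t):
    """Naive cumulative/dynamic AUC at cutoff t (hours since Tonset).

    Cases:    died before t    (event == 1 and time < t)
    Controls: time beyond t    (time > t), whatever happened later
    Patients censored before t (event == 0, time < t), and anyone with
    time exactly t, are dropped; timeROC reweights instead (IPCW).
    Returns NaN when there are no cases or no controls at t.
    """
    cases = [m for m, ti, e in zip(marker, time, event) if e == 1 and ti < t]
    controls = [m for m, ti in zip(marker, time) if ti > t]
    if not cases or not controls:
        return math.nan
    wins = 0.0
    for a in cases:
        for b in controls:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(cases) * len(controls))


# Hypothetical scores and labels for five sepsis 2 patients.
marker = [0.9, 0.7, 0.8, 0.2, 0.6]
time = [24.0, 200.0, 500.0, 100.0, 30.0]
event = [1, 1, 0, 0, 0]
for t in (48.0, 300.0, 720.0):
    print(t, cd_auc(marker, time, event, t))  # 1.0, 0.5, then nan
```

Evaluating at several cutoffs, as in the loop, is what separates short-term from longer-term discrimination. Official comparisons should still use the R package.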
Task 3: Innovation Regarding Interpretability
While many machine learning models can perform classification and regression, not all of them yield valid interpretations that could allow their findings to better inform clinical decision support.
However interpretability is provided (e.g., automatically choosing decision thresholds, finding combined patterns, or designing novel visualizations), it cannot be assessed without evaluation by human experts. We have therefore assembled a group of machine learning and clinical experts to judge the Challenge's innovation track, which focuses on interpretability.
Submitting Your Entry
Prediction results must be submitted via the SECURESTOR submission directory (one is assigned to each team).
For Task 1, please submit the probability that the patient will have sepsis 2 onset in the next 4 hours.
For Task 2, please submit the probability that the patient will die within 30 days.
Submissions must be in CSV (comma-separated values) format, with column headers. The sample layout below applies to both Tasks 1 and 2.
adm_id | probability |
---|---|
… | 0.98330 |
A100093 | 0.34455 |
A100044 | 0.12333 |
A100046 | 0.23322 |
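A file in the layout above can be produced with Python's standard `csv` module. This is a minimal sketch. The function name, the five-decimal formatting, and the range check are our own choices. In practice `f` would be a file opened with `open("task1.csv", "w", newline="")`, where the filename is hypothetical.

```python
import csv
import io


def write_submission(predictions, f):
    """Write (adm_id, probability) pairs as CSV with the required header row."""
    writer = csv.writer(f)
    writer.writerow(["adm_id", "probability"])
    for adm_id, p in predictions:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{adm_id}: probability {p} outside [0, 1]")
        writer.writerow([adm_id, f"{p:.5f}"])


# Write to an in-memory buffer to show the resulting layout.
buf = io.StringIO()
write_submission([("A100093", 0.34455), ("A100044", 0.12333)], buf)
print(buf.getvalue())
```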
Rules
- Participants must not download the dataset
- Participants are responsible for any additional access/logons created on their server and for keeping their password secret
- Solutions must be submitted in the required format by the designated deadline