Early identification of sepsis is the difference between life and death for the patient, and it is mission-critical to healthcare providers in terms of both quality and cost. This use case is also well supported by Cerner Health Facts data. Note that 1) these data are de-identified and 2) we have more complete information for inpatients than for outpatients. We focus on inpatients to cover a cohort with a much larger sample of vulnerable patients, in an environment that may feature a smaller nurse/patient ratio.
The Challenge has three tasks:
We included all hospitalized adult patients (at least 16 years old) with suspected infection. Sepsis-2 patients must meet at least 2 SIRS criteria:
We excluded patients who 1) are children, or 2) were in the hospital for less than 8 hours or more than 30 days.
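The inclusion and exclusion rules above can be sketched as a simple filter. This is only an illustration: the `Admission` record and its fields (`age_years`, `stay_hours`, `suspected_infection`) are hypothetical, since the released data exposes an age group rather than an exact age, and the SIRS check is left out.

```python
from dataclasses import dataclass

MIN_AGE_YEARS = 16          # "adult (at least 16 years old)"
MIN_STAY_HOURS = 8          # stays shorter than 8 hours are excluded
MAX_STAY_HOURS = 30 * 24    # stays longer than 30 days are excluded


@dataclass
class Admission:
    # Hypothetical record layout, used only for this sketch.
    adm_id: str
    age_years: int
    stay_hours: float
    suspected_infection: bool


def is_eligible(adm: Admission) -> bool:
    """Apply the inclusion and exclusion rules stated above."""
    return (
        adm.suspected_infection
        and adm.age_years >= MIN_AGE_YEARS
        and MIN_STAY_HOURS <= adm.stay_hours <= MAX_STAY_HOURS
    )


admissions = [
    Admission("X1", 64, 72.0, True),       # eligible
    Admission("X2", 12, 72.0, True),       # child
    Admission("X3", 45, 5.0, True),        # stay under 8 hours
    Admission("X4", 45, 31 * 24.0, True),  # stay over 30 days
    Admission("X5", 45, 72.0, False),      # no suspected infection
]
cohort = [a.adm_id for a in admissions if is_eligible(a)]
print(cohort)
```

Treating the 8-hour and 30-day limits as inclusive is an assumption; the text does not say how stays of exactly those lengths are handled.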
There are 3 critical time points for each patient:
We will provide patient demographic and admission data for both tasks.
adm_id | gender | race | admission_type | addission_source | care_setting | age_grp |
---|---|---|---|---|---|---|
A100019 | Male | Caucasian | Elective | Physician Referral | Care Setting Undefined | 60~70 |
A100032 | Female | African American | Emergency | Physician Referral | Care Setting Undefined | 50~60 |
A100034 | Male | Caucasian | Elective | Others/unknown | Care Setting Undefined | 40~50 |
A100035 | Male | Caucasian | Emergency | Others/unknown | Care Setting Undefined | 70~80 |
Goal: To predict sepsis-2 onset 4 hours before it occurs
We provide clinical events and lab test results between Tadmission and Tonset - 4 for each patient, in matrix format. Event times are expressed relative to Tadmission.
adm_id | event_time | A/G Ratio | ALT/SGPT | AST/SGOT | Albumin Quant | Albumin, Serum | Alk Phos, Serum | Amylase, Serum | Anion Gap | ... |
---|---|---|---|---|---|---|---|---|---|---|
A100008 | 0.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
A100008 | 2.0 | 1.2 | 26.0 | 38.0 | NaN | 2.9 | 75.0 | NaN | 9.0 | ... |
A100008 | 3.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
A100008 | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
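
Because most cells in this matrix are NaN, one possible first step is to collapse each patient's rows into the most recent observed value per lab. The sketch below is one such summary and not a recommended feature set. The in-memory row layout is an assumption made for illustration, and the values are taken from the sample rows above.

```python
import math

# Toy rows mirroring the layout above: (adm_id, event_time, {lab: value}).
rows = [
    ("A100008", 0.5, {"ALT/SGPT": math.nan, "Anion Gap": math.nan}),
    ("A100008", 2.0, {"ALT/SGPT": 26.0, "Anion Gap": 9.0}),
    ("A100008", 3.5, {"ALT/SGPT": math.nan, "Anion Gap": math.nan}),
]


def last_observed(rows):
    """Return {adm_id: {lab: most recent non-NaN value}}."""
    features = {}
    # Sort by patient and time so later measurements overwrite earlier ones.
    for adm_id, _time, labs in sorted(rows, key=lambda r: (r[0], r[1])):
        patient = features.setdefault(adm_id, {})
        for lab, value in labs.items():
            if not math.isnan(value):
                patient[lab] = value
    return features


features = last_observed(rows)
print(features)
```

Labs that were never measured for a patient are simply absent from that patient's dictionary, so downstream code has to decide how to impute them.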
Each patient is labeled with whether they have been identified as having sepsis-2 onset.
adm_id | sepsis2 |
---|---|
A100001 | 0 |
A100002 | 0 |
A100003 | 0 |
A100004 | 0 |
A100005 | 0 |
A100006 | 0 |
Total data size:
Evaluation: Standard AUC, with randomly supplied samples from the testing cohort. We will test:
Goal: To predict whether the patient will die in the hospital within 30 days, using up to 48 hours of data before sepsis onset.
We provide the clinical events and lab test results between Tonset - 48 and Tonset - 4 for each sepsis-2 patient, in matrix format. Event times are expressed relative to Tonset.
adm_id | event_time | A/G Ratio | ALT/SGPT | AST/SGOT | Albumin Quant | Albumin, Serum | Alk Phos, Serum | Amylase, Serum | Anion Gap | ... |
---|---|---|---|---|---|---|---|---|---|---|
A1000019 | -47.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
A1000019 | -46.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
A1000019 | -45.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
A1000019 | -45.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
A1000019 | -44.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... |
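
The negative event times above follow from re-expressing each event relative to Tonset and keeping only the window from Tonset - 48 to Tonset - 4. A minimal sketch of that windowing is below. The function name and inputs are hypothetical, and treating both window ends as inclusive is an assumption.

```python
def task2_window(events, onset_hour, start=-48.0, end=-4.0):
    """Re-express event times relative to onset and keep start <= t <= end.

    `events` is a list of (hours_since_admission, payload) pairs and
    `onset_hour` is Tonset in hours since admission. Both are hypothetical
    inputs for this sketch.
    """
    shifted = [(t - onset_hour, payload) for t, payload in events]
    return [(t, payload) for t, payload in shifted if start <= t <= end]


events = [(2.0, "a"), (12.5, "b"), (56.0, "c"), (59.0, "d")]
window = task2_window(events, onset_hour=60.0)
print(window)
```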
Each patient is labeled with their mortality status and the time between Tonset and Tdischarge.
adm_id | time | mortality |
---|---|---|
A100079 | 200.5 | 0 |
A100244 | 78.5 | 0 |
A100328 | 78.5 | 0 |
A100388 | 55.5 | 0 |
A100398 | 117.0 | 0 |
Total data size:
Evaluation:
Cumulative case/dynamic control ROC: performance is judged at multiple timestamps to see how well and how early (relative to mortality/discharge) the model Mi can obtain a good prediction from Tonset.
Models are evaluated and compared using the R package timeROC:
https://cran.r-project.org/web/packages/timeROC/index.html
sensitivityC(c,t) = P(Mi > c|Ti < t)
specificityD(c,t) = P(Mi < c|Ti > t)
Using different time cutoffs t to calculate AUC (in the traditional way) allows one to assess the model's performance in predicting short-term, medium-term, and long-term mortality risk after sepsis onset.
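To make the cumulative case/dynamic control idea concrete, here is a deliberately naive Python sketch of the AUC at a single time cutoff t. It is not the official scoring code: the timeROC package corrects for censoring, whereas this sketch simply drops patients discharged alive before t. The function name and the toy data are made up for illustration.

```python
def cumulative_dynamic_auc(scores, times, died, t):
    """Naive cumulative/dynamic AUC at horizon t (hours after onset).

    Cases: patients who died with time <= t.
    Controls: patients whose recorded time exceeds t.
    Patients discharged alive before t are dropped, which is a
    simplification of what a censoring-aware estimator would do.
    """
    cases = [s for s, ti, d in zip(scores, times, died) if d == 1 and ti <= t]
    controls = [s for s, ti in zip(scores, times) if ti > t]
    if not cases or not controls:
        return None  # AUC is undefined without both groups
    # Probability that a random case outranks a random control (ties count 1/2).
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


scores = [0.9, 0.25, 0.3, 0.2, 0.6]        # model outputs Mi
times = [20.0, 100.0, 200.0, 300.0, 50.0]  # hours from onset to death/discharge
died = [1, 1, 0, 0, 0]

for horizon in (24.0, 120.0):
    print(horizon, cumulative_dynamic_auc(scores, times, died, horizon))
```

In this toy data the AUC falls from 1.0 at 24 hours to 0.75 at 120 hours, because the second death received a low score. Comparing cutoffs in this way is what separates short-term from longer-term performance.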
While many machine learning models can perform classification and regression tasks, not all of them yield a valid interpretation that could allow findings to better inform decision support in the clinical setting.
Interpretability (e.g., automatic decisions on the threshold, finding combined patterns, designing novel visualizations, etc.) cannot be assessed without evaluation by human experts. We have assembled a group of machine learning and clinical experts to judge the Challenge innovation track, which will focus on interpretability.
Prediction results must be submitted via the SECURESTOR submission directory (one is assigned to each team).
For Task 1, please submit the probability that the patient will have sepsis-2 onset in the next 4 hours.
For Task 2, please submit the probability of the patient’s mortality within 30 days.
The submission must be in CSV (comma-separated) format, with column headers. Below is the sample layout for both tasks 1 and 2.
adm_id | probability |
---|---|
A100079 | 0.98330 |
A100093 | 0.34455 |
A100044 | 0.12333 |
A100046 | 0.23322 |
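
A file in this layout can be produced with Python's standard `csv` module, as in the sketch below. The function name `write_submission` and the file name in the comment are hypothetical. Writing five decimal places mirrors the sample above, but it is an assumption and not a stated requirement.

```python
import csv
import io


def write_submission(predictions, handle):
    """Write (adm_id, probability) pairs as CSV with a header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["adm_id", "probability"])
    for adm_id, probability in predictions:
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability out of range for {adm_id}")
        writer.writerow([adm_id, f"{probability:.5f}"])


predictions = [("A100079", 0.9833), ("A100093", 0.34455)]
buffer = io.StringIO()  # swap for open("submission.csv", "w", newline="")
write_submission(predictions, buffer)
print(buffer.getvalue())
```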
Rules: