Predicting 72-hour and 9-day return to the emergency department using machine learning

Woo Suk Hong; Adrian Daniel Haimovich; Richard Andrew Taylor

doi:10.1093/jamiaopen/ooz019

JAMIA Open. 2019 Jul 1;2(3):346-352. doi: 10.1093/jamiaopen/ooz019. eCollection 2019 Oct.

Authors

Woo Suk Hong¹, Adrian Daniel Haimovich², Richard Andrew Taylor²

Affiliations

¹ Yale School of Medicine, New Haven, Connecticut, USA.
² Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA.

Abstract

Objectives: To predict 72-h and 9-day emergency department (ED) return by using gradient boosting on an expansive set of clinical variables from the electronic health record.

Methods: This retrospective study included all adult discharges from a level 1 trauma center ED and a community hospital ED covering the period of March 2013 to July 2017. A total of 1500 variables were extracted for each visit, and samples were split randomly into training, validation, and test sets (80%, 10%, and 10%). Gradient boosting models were fit on 3 selections of the data: administrative data (demographics, prior hospital usage, and comorbidity categories), data available at triage, and the full set of data available at discharge. A logistic regression (LR) model built on administrative data was used for baseline comparison. Finally, the top 20 most informative variables identified from the full gradient boosting models were used to build a reduced model for each outcome.
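
The random 80%/10%/10% partition described above can be illustrated with a minimal sketch. The function name `split_indices`, the seed, and the rounding rule (any remainder goes to the test set) are assumptions made for illustration; the abstract does not say how the authors implemented the split.

```python
import random


def split_indices(n, fractions=(0.8, 0.1, 0.1), seed=0):
    """Randomly partition range(n) into train/validation/test index lists.

    Hypothetical helper: the name, seed, and rounding rule are illustrative
    assumptions, not the authors' code.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    n_train = int(n * fractions[0])
    n_valid = int(n * fractions[1])
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


# 330 631 is the number of discharges reported in the Results.
train_idx, valid_idx, test_idx = split_indices(330_631)
```

Splitting by shuffled index keeps the three sets disjoint by construction. A split at the visit level like this one can place different visits by the same patient in different sets; whether the study did so is not stated here.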

Results: A total of 330 631 discharges were available for analysis, with 29 058 discharges (8.8%) resulting in 72-h return and 52 748 discharges (16.0%) resulting in 9-day return to either ED. LR models using administrative data yielded test AUCs of 0.69 (95% confidence interval [CI] 0.68-0.70) and 0.71 (95% CI 0.70-0.72), while gradient boosting models using administrative data yielded test AUCs of 0.73 (95% CI 0.72-0.74) and 0.74 (95% CI 0.73-0.74) for 72-h and 9-day return, respectively. Gradient boosting models using variables available at triage yielded test AUCs of 0.75 (95% CI 0.74-0.76) and 0.75 (95% CI 0.74-0.75), while those using the full set of variables yielded test AUCs of 0.76 (95% CI 0.75-0.77) and 0.75 (95% CI 0.75-0.76). Reduced models using the top 20 variables yielded test AUCs of 0.73 (95% CI 0.71-0.74) and 0.73 (95% CI 0.72-0.74).
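
For readers unfamiliar with how a test AUC and its 95% CI can be obtained, the following is a minimal sketch. It computes the AUC from ranks (the Mann-Whitney formulation) and uses a percentile bootstrap for the interval. The bootstrap is an assumption chosen for simplicity: the abstract does not state which CI method the authors used, and the function names and toy data are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (rank-based formulation)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in)."""
    auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[max(0, int(alpha / 2 * n_boot))]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data (synthetic, not from the study): positives tend to score higher.
toy_rng = random.Random(1)
toy_labels = [1 if toy_rng.random() < 0.1 else 0 for _ in range(500)]
toy_scores = [toy_rng.gauss(1.0 if y else 0.0, 1.0) for y in toy_labels]
point = auc(toy_labels, toy_scores)
ci_low, ci_high = bootstrap_ci(toy_labels, toy_scores)
```

The rank-based form avoids comparing every positive-negative pair, which matters at the scale of a test set with tens of thousands of visits.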

Discussion and conclusion: Gradient boosting models leveraging clinical data are superior to LR models built on administrative data at predicting 72-h and 9-day returns.

Keywords: decision support techniques; emergency medicine; machine learning.