Early prediction of sepsis-induced respiratory tract infection using a biomarker-based machine-learning algorithm

Mingkuan Su; Haiying Wu; Hongbin Chen; Jianfeng Guo; Zongyun Chen; Jie Qiu; Jiancheng Huang

doi:10.1080/00365513.2024.2346914

Scand J Clin Lab Invest. 2024 Apr 29:1-9. doi: 10.1080/00365513.2024.2346914. Online ahead of print.

Authors

Mingkuan Su¹,², Haiying Wu¹,², Hongbin Chen¹,², Jianfeng Guo¹,², Zongyun Chen¹,², Jie Qiu¹,², Jiancheng Huang¹,²

Affiliations

¹ Department of Laboratory Medicine, Mindong Hospital of Ningde City, Fuan City, China.
² Department of Laboratory Medicine, Mindong Hospital Affiliated to Fujian Medical University, Fuan City, China.

PMID: 38683948
DOI: 10.1080/00365513.2024.2346914

Abstract

Early and differential diagnosis of sepsis is essential to avoid unnecessary antibiotic use and to reduce patient morbidity and mortality. Here, we aimed to identify predictors of sepsis and develop a machine-learning strategy to predict sepsis-induced respiratory tract infection (RTI). Patients with sepsis and RTI were selected via retrospective analysis, and essential population characteristics and laboratory parameters were recorded. To improve the performance of the primary model and avoid over-fitting, a recursive feature elimination with cross-validation (RFECV) strategy was used to screen the optimal subset of biomarkers, and nine machine-learning models were constructed based on this subset; the average accuracy, precision, recall, and F1-score were used to evaluate the models. We identified 430 patients with sepsis and 686 patients with RTI. A total of 39 features were collected, with 23 features identified for initial model construction. Using the RFECV algorithm, we found that the XGBoost classifier, which required only seven biomarkers, demonstrated the best performance among all prediction models, with an average accuracy of 89.24 ± 2.28%, while the Ridge classifier, which included 11 biomarkers, had an average accuracy of only 83.87 ± 4.69%. The remaining models had prediction accuracies greater than 88%. We developed nine models for predicting sepsis using a strategy that combined RFECV with machine learning. Among these models, the XGBoost classifier, which included seven biomarkers, showed the best performance and highest accuracy for predicting sepsis and may be a promising tool for the timely identification of sepsis.
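As an illustration of how the evaluation metrics named in the abstract are typically computed and reported as a mean ± spread across cross-validation folds, a minimal Python sketch follows. It is not the authors' code: the function names `binary_metrics` and `summarize_folds`, the toy labels, the choice of label 1 as the positive class, and the use of the sample standard deviation are all assumptions made for illustration. The abstract does not state how the ± values were derived.

```python
from statistics import mean, stdev


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one fold (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def summarize_folds(fold_results):
    """Mean and sample SD of each metric over folds.

    fold_results is a list of (y_true, y_pred) pairs, one per fold.
    Using the sample SD here is an assumption, not taken from the paper.
    """
    per_fold = [binary_metrics(t, p) for t, p in fold_results]
    summary = {}
    for name in per_fold[0]:
        values = [m[name] for m in per_fold]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[name] = (mean(values), spread)
    return summary


if __name__ == "__main__":
    # Toy labels only: 1 = sepsis, 0 = RTI (an assumed encoding).
    folds = [
        ([1, 1, 0, 0], [1, 0, 0, 0]),
        ([1, 0], [1, 0]),
    ]
    for name, (avg, sd) in summarize_folds(folds).items():
        print(f"{name}: {100 * avg:.2f} ± {100 * sd:.2f}")
```

In practice, the per-fold predictions would come from each of the nine classifiers after feature selection, and a summary of this form would be reported for each model.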

Keywords: D-dimer; Machine learning; procalcitonin; respiratory tract infection; sepsis.