A comparison of natural language processing to ICD-10 codes for identification and characterization of pulmonary embolism

Thromb Res. 2021 Jul:203:190-195. doi: 10.1016/j.thromres.2021.04.020. Epub 2021 May 6.

Abstract

Introduction: International Classification of Diseases, 10th Revision (ICD-10) codes are frequently used to identify pulmonary embolism (PE) events, although their validity has been questioned. Natural language processing (NLP) is a novel tool that may be useful for PE identification.

Methods: We performed a retrospective comparative accuracy study of 1000 randomly selected healthcare encounters in which a CT pulmonary angiogram was ordered between January 1, 2019 and January 1, 2020 at a single academic medical center. Two independent observers reviewed each radiology report and abstracted key findings related to PE presence/absence, chronicity, and anatomic location. NLP interpretations of radiology reports and ICD-10 codes were queried electronically and compared with the reference standard of manual chart review.

Results: A total of 970 encounters were included for analysis. The prevalence of PE was 13% by manual review. For PE identification, sensitivity was similar between NLP (96.0%) and ICD-10 (92.9%; p = 0.405), and specificity was significantly higher with NLP (97.7%) compared to ICD-10 (91.0%; p < 0.001). NLP demonstrated higher sensitivity (70.0% vs 16.5%, p < 0.001) and specificity (99.9% vs 99.4%, p = 0.014) for saddle/main PE recognition, and significantly higher sensitivity (86.7% vs 8.3%, p < 0.001) and specificity (99.8% vs 96.5%, p < 0.001) for subsegmental PE compared to ICD-10.
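The sensitivity and specificity figures above follow the standard definitions against a reference standard. As a minimal sketch of that arithmetic (not the authors' analysis code), the Python below tallies a 2x2 table from paired labels and derives the three quantities. The names `ConfusionCounts` and `tally` and the five-encounter example are hypothetical and purely illustrative; they are not study data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts for an index method (e.g. NLP or ICD-10) vs. a reference standard."""
    tp: int  # reference positive, method positive
    fn: int  # reference positive, method negative
    fp: int  # reference negative, method positive
    tn: int  # reference negative, method negative


def tally(reference: Sequence[bool], predicted: Sequence[bool]) -> ConfusionCounts:
    """Count agreement and disagreement between paired labels, one pair per encounter."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp = fn = fp = tn = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            tp += 1
        elif ref and not pred:
            fn += 1
        elif not ref and pred:
            fp += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, fn=fn, fp=fp, tn=tn)


def sensitivity(c: ConfusionCounts) -> float:
    """Proportion of reference-positive encounters the method flags as positive."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Proportion of reference-negative encounters the method flags as negative."""
    return c.tn / (c.tn + c.fp)


def prevalence(c: ConfusionCounts) -> float:
    """Proportion of all encounters that are positive by the reference standard."""
    return (c.tp + c.fn) / (c.tp + c.fn + c.fp + c.tn)


# Hypothetical toy labels (NOT study data): manual review vs. an index method.
manual_review = [True, True, False, False, False]
index_method = [True, False, False, True, False]
counts = tally(manual_review, index_method)
print(counts, sensitivity(counts), specificity(counts), prevalence(counts))
```

In this framing, manual chart review supplies `reference` and each index method (NLP or ICD-10) supplies its own `predicted` labels, so each method gets a separate 2x2 table computed over the same encounters.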

Conclusions: NLP is highly sensitive for PE identification and more specific than ICD-10 coding. NLP outperformed ICD-10 coding for recognition of subsegmental, saddle, and chronic PE. Our results suggest that NLP is an efficient method for PE identification and characterization and is more reliable than ICD-10 coding.

Keywords: Accuracy; ICD-10; NLP; Natural language processing; Pulmonary embolism.

MeSH terms

  • Algorithms
  • Humans
  • International Classification of Diseases
  • Natural Language Processing*
  • Pulmonary Embolism* / diagnosis
  • Retrospective Studies