A Stress Test of Artificial Intelligence: Can Deep Learning Models Trained From Formal Echocardiography Accurately Interpret Point-of-Care Ultrasound?

J Ultrasound Med. 2022 Dec;41(12):3003-3012. doi: 10.1002/jum.16007. Epub 2022 May 12.

Abstract

Objectives: To test whether a deep learning (DL) model trained on echocardiography images could accurately segment the left ventricle (LV) and predict ejection fraction (EF) on apical 4-chamber images acquired by point-of-care ultrasound (POCUS).

Methods: We created a dataset of 333 videos from cardiac POCUS exams acquired in the emergency department. For each video we derived two ground-truth labels: first, we segmented the LV from one image frame; second, we classified the EF as normal, reduced, or severely reduced. We then classified each video's quality as optimal, adequate, or inadequate. With this dataset we tested the accuracy of automated LV segmentation and EF classification by the best-in-class echocardiography-trained DL model, EchoNet-Dynamic.

Results: The mean Dice similarity coefficient for LV segmentation was 0.72 (N = 333; 95% CI 0.70-0.74). Cohen's kappa coefficient for agreement between predicted and ground-truth EF classification was 0.16 (N = 333). The area under the receiver operating characteristic curve for the diagnosis of heart failure was 0.74 (N = 333). Model performance improved with video quality for the tasks of LV segmentation and diagnosis of heart failure, but was unchanged for EF classification. For all tasks the model was less accurate than the published benchmarks for EchoNet-Dynamic.
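
For readers unfamiliar with the two agreement metrics reported above, the following is a minimal sketch of how a Dice similarity coefficient and Cohen's kappa are conventionally computed. It is not the authors' evaluation code. The function names, the toy masks, and the toy label lists are hypothetical and chosen only for illustration; the standard textbook definitions are assumed (Dice = 2|A∩B| / (|A| + |B|), kappa = (p_o - p_e) / (1 - p_e)).

```python
from collections import Counter


def dice_coefficient(pred_mask, true_mask):
    """Dice similarity between two binary masks given as nested lists of 0/1."""
    pred = [bool(p) for row in pred_mask for p in row]
    true = [bool(t) for row in true_mask for t in row]
    if len(pred) != len(true):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(p and t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    # Two empty masks agree perfectly; this convention is an assumption.
    return 1.0 if total == 0 else 2.0 * intersection / total


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical, not from the study).
toy_pred_mask = [[1, 1, 0], [0, 1, 0]]
toy_true_mask = [[1, 0, 0], [0, 1, 1]]
toy_pred_ef = ["normal", "normal", "reduced", "severe"]
toy_true_ef = ["normal", "reduced", "reduced", "severe"]

toy_dice = dice_coefficient(toy_pred_mask, toy_true_mask)
toy_kappa = cohens_kappa(toy_pred_ef, toy_true_ef)
```

Under these definitions, a Dice value of 1 indicates identical masks and a kappa of 0 indicates agreement no better than chance, which is the scale against which the reported 0.72 and 0.16 can be read.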

Conclusions: Performance of a DL model trained on formal echocardiography worsened when challenged with images captured during resuscitations. DL models intended for assessing bedside ultrasound should be trained on datasets composed of POCUS images. Such datasets have yet to be made publicly available.

Keywords: artificial intelligence; deep learning; echocardiography; point of care ultrasound.

MeSH terms

  • Artificial Intelligence
  • Deep Learning*
  • Echocardiography / methods
  • Exercise Test
  • Heart Failure*
  • Humans
  • Point-of-Care Systems