Robustness of deep learning segmentation of cardiac substructures in noncontrast computed tomography for breast cancer radiotherapy

Med Phys. 2021 Nov;48(11):7172-7188. doi: 10.1002/mp.15237. Epub 2021 Sep 30.

Abstract

Purpose: To develop and evaluate deep learning-based autosegmentation of cardiac substructures from noncontrast planning computed tomography (CT) images in patients undergoing breast cancer radiotherapy, and to investigate the algorithm's sensitivity to out-of-distribution data such as CT image artifacts.

Methods: Nine substructures, comprising the aortic valve (AV), left anterior descending artery (LAD), tricuspid valve (TV), mitral valve (MV), pulmonic valve (PV), right atrium (RA), right ventricle (RV), left atrium (LA), and left ventricle (LV), were manually delineated by a radiation oncologist on noncontrast CT images of 129 patients with breast cancer. Of these, 90 were considered in-distribution ("clean") data: the image/label pairs of 60 subjects were used to train a 3D deep neural network, and the remaining 30 were used for testing. The other 39 patients were considered out-of-distribution ("outlier") data and were used to test robustness. Random rigid transformations were used to augment the dataset during training. We investigated the effects of multiple loss functions, including Dice similarity coefficient (DSC), cross-entropy (CE), and Euclidean losses, as well as variations and combinations of these, of data augmentation, and of network size on overall performance and on sensitivity to image artifacts caused by infrequent events such as the presence of implanted devices. The predicted label maps were compared to the ground-truth labels via DSC and the mean and 90th percentile symmetric surface distance (90th-SSD).
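The paper does not specify the exact form of its "modified Dice" term, so as a minimal illustrative sketch, a standard combined soft-Dice + cross-entropy loss of the kind compared in the Methods can be written as follows (NumPy stand-in for a deep learning framework; the weighting parameters `w_dice` and `w_ce` are assumptions, not values from the paper):

```python
import numpy as np

def soft_dice_loss(probs, onehot, eps=1e-6):
    """Soft Dice loss averaged over classes.

    probs:  (C, D, H, W) softmax probabilities
    onehot: (C, D, H, W) one-hot ground-truth labels
    """
    axes = tuple(range(1, probs.ndim))  # spatial axes, per class
    intersection = np.sum(probs * onehot, axis=axes)
    denom = np.sum(probs, axis=axes) + np.sum(onehot, axis=axes)
    dice = (2.0 * intersection + eps) / (denom + eps)
    return 1.0 - dice.mean()

def cross_entropy_loss(probs, onehot, eps=1e-12):
    """Voxel-wise categorical cross-entropy."""
    return -np.mean(np.sum(onehot * np.log(probs + eps), axis=0))

def dice_ce_loss(probs, onehot, w_dice=1.0, w_ce=1.0):
    """Weighted sum of soft Dice and cross-entropy losses."""
    return (w_dice * soft_dice_loss(probs, onehot)
            + w_ce * cross_entropy_loss(probs, onehot))
```

Combining an overlap term (Dice) with a voxel-wise term (CE) is a common way to balance sensitivity to small structures, such as the valves and LAD here, against stable voxel-level gradients.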

Results: With modified Dice combined with cross-entropy (MD-CE) as the loss function, the algorithm achieved a mean DSC of 0.79 ± 0.07 for the chambers and 0.39 ± 0.10 for the smaller substructures (valves and LAD). The mean and 90th-SSD were 2.7 ± 1.4 and 6.5 ± 2.8 mm for the chambers and 4.1 ± 1.7 and 8.6 ± 3.2 mm for the smaller substructures. Models trained with MD-CE, Dice-CE, MD, and weighted CE losses had the highest performance and were statistically similar. Data augmentation did not affect model performance on either clean or outlier data, whereas model robustness was sensitive to network size. For one type of outlier data, robustness could be improved by incorporating such data into the training process. Segmenting each patient took 2.1 s on average.
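The evaluation metrics reported above can be sketched for a pair of binary masks as follows. This is an assumed, simplified implementation (6-connected surface extraction, brute-force nearest-surface search on non-empty masks), not the authors' code; isotropic 1 mm spacing is the default:

```python
import numpy as np

def dsc(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def surface_voxels(mask):
    """Foreground voxels with at least one 6-connected background neighbor."""
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = np.ones_like(mask)
    for axis in range(3):
        for shift in (-1, 1):
            # neighbor value along +/- direction of this axis
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return np.argwhere(mask & ~interior)

def symmetric_surface_distances(a, b, spacing=(1.0, 1.0, 1.0)):
    """Pooled nearest-surface distances A->B and B->A, in mm."""
    pa = surface_voxels(a) * np.asarray(spacing)
    pb = surface_voxels(b) * np.asarray(spacing)
    d_ab = np.sqrt(((pa[:, None] - pb[None]) ** 2).sum(-1)).min(1)
    d_ba = np.sqrt(((pb[:, None] - pa[None]) ** 2).sum(-1)).min(1)
    return np.concatenate([d_ab, d_ba])

# Mean SSD and 90th-SSD as reported in the Results:
# d = symmetric_surface_distances(pred, gt, spacing)
# mean_ssd, p90_ssd = d.mean(), np.percentile(d, 90)
```

The 90th percentile is often preferred over the maximum (Hausdorff) distance because it is less sensitive to isolated outlier voxels in the predicted contour.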

Conclusions: A deep neural network provides fast and accurate segmentation of large cardiac substructures in noncontrast CT images. Model robustness to two types of clinically common outlier data was investigated, and potential approaches to improving it were explored. Evaluation of clinical acceptability and integration into the clinical workflow are pending.

Keywords: breast radiotherapy; cardiac substructures; deep learning; image segmentation; model robustness.

MeSH terms

  • Breast
  • Breast Neoplasms* / diagnostic imaging
  • Breast Neoplasms* / radiotherapy
  • Deep Learning*
  • Female
  • Heart
  • Humans
  • Image Processing, Computer-Assisted
  • Tomography, X-Ray Computed