Metrics to evaluate the performance of auto-segmentation for radiation treatment planning: A critical review

Michael V Sherer; Diana Lin; Sharif Elguindi; Simon Duke; Li-Tee Tan; Jon Cacicedo; Max Dahele; Erin F Gillespie

doi:10.1016/j.radonc.2021.05.003

Radiother Oncol. 2021 Jul:160:185-191. doi: 10.1016/j.radonc.2021.05.003. Epub 2021 May 11.

Authors

Michael V Sherer¹, Diana Lin², Sharif Elguindi³, Simon Duke⁴, Li-Tee Tan⁴, Jon Cacicedo⁵, Max Dahele⁶, Erin F Gillespie⁷

Affiliations

¹ Department of Radiation Medicine and Applied Sciences, University of California San Diego, La Jolla, United States.
² Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, United States.
³ Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, United States.
⁴ Department of Oncology, Cambridge University Hospitals, United Kingdom.
⁵ Department of Radiation Oncology, Cruces University Hospital/BioCruces Health Research Institute, Osakidetza, Barakaldo, Spain.
⁶ Department of Radiation Oncology, Amsterdam University Medical Center, Amsterdam, The Netherlands.
⁷ Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, United States. Electronic address: efgillespie@ucsd.edu.

Abstract

Advances in artificial intelligence-based methods have led to the development and publication of numerous systems for auto-segmentation in radiotherapy. These systems have the potential to decrease contour variability, which has been associated with poor clinical outcomes, and to increase efficiency in the treatment planning workflow. However, there are no uniform standards for evaluating auto-segmentation platforms to assess their efficacy in meeting these goals. Here, we review the most frequently used evaluation techniques, which include geometric overlap, dosimetric parameters, time spent contouring, and clinical rating scales. These data suggest that many of the most commonly used geometric indices, such as the Dice Similarity Coefficient, are not well correlated with clinically meaningful endpoints. As such, a multi-domain evaluation, including composite geometric and/or dosimetric metrics with physician-reported assessment, is necessary to gauge the clinical readiness of auto-segmentation for radiation treatment planning.

Keywords: Auto-segmentation; Contouring; Quality assurance; Treatment planning.
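
For readers unfamiliar with the Dice Similarity Coefficient named in the abstract, it is conventionally defined for two contours A and B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical). The following is a minimal illustrative sketch, not code from the review: the function name `dice_coefficient`, the flattened binary-mask representation, and the choice to return 1.0 for two empty masks are all assumptions made for this example.

```python
def dice_coefficient(mask_a, mask_b):
    """Return 2|A ∩ B| / (|A| + |B|) for two equal-length binary masks.

    Each mask is a flat sequence of 0/1 (or False/True) voxel labels.
    This representation is an assumption made for illustration.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")

    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]

    # Voxels labelled as structure in both contours.
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)

    if total == 0:
        # Assumed convention: two empty masks are treated as identical.
        return 1.0
    return 2.0 * intersection / total


# Toy example: a hypothetical reference contour and an auto-segmented
# contour, each flattened to six voxels.
reference = [0, 1, 1, 1, 0, 0]
auto = [0, 0, 1, 1, 1, 0]
score = dice_coefficient(reference, auto)
print(round(score, 3))
```

In this toy case the two masks share two of three labelled voxels each, so the score is 2·2 / (3 + 3). The value reflects volumetric overlap only and does not indicate where the disagreement lies or whether it matters dosimetrically, which is consistent with the abstract's caution about relying on such indices alone.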

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, P.H.S.
Review

MeSH terms

Artificial Intelligence*
Benchmarking*
Humans
Organs at Risk
Radiometry
Radiotherapy Planning, Computer-Assisted
