Number of Operative Performance Ratings Needed to Reliably Assess the Difficulty of Surgical Procedures

Kenneth L Abbott; Xilin Chen; Michael Clark; Nikki L Bibler Zaidi; David B Swanson; Brian C George

doi:10.1016/j.jsurg.2019.07.008

J Surg Educ. 2019 Nov-Dec;76(6):e189-e192. doi: 10.1016/j.jsurg.2019.07.008. Epub 2019 Sep 6.

Authors

Kenneth L Abbott¹, Xilin Chen², Michael Clark³, Nikki L Bibler Zaidi¹, David B Swanson⁴, Brian C George⁵

Affiliations

¹ University of Michigan Medical School, Ann Arbor, Michigan.
² Center for Surgical Training and Research, Department of Surgery, University of Michigan, Ann Arbor, Michigan.
³ Consulting for Statistics, Computing, and Analytics Research, University of Michigan, Ann Arbor, Michigan.
⁴ American Board of Medical Specialties, Chicago, Illinois; University of Melbourne Medical School, Melbourne, Victoria.
⁵ Center for Surgical Training and Research, Department of Surgery, University of Michigan, Ann Arbor, Michigan. Electronic address: bcgeorge@med.umich.edu.

PMID: 31501065

Abstract

Objective: The profession of surgery is entering a new era of "big data," where analyses of longitudinal trainee assessment data will be used to inform ongoing efforts to improve surgical education. Given the high-stakes implications of these types of analyses, researchers must define the conditions under which estimates derived from these large datasets remain valid. With this study, we determine the number of assessments of residents' performances needed to reliably assess the difficulty of "Core" surgical procedures.

Design: Using the SIMPL smartphone application from the Procedural Learning and Safety Collaborative, 402 attending surgeons directly observed and provided workplace-based assessments for 488 categorical residents after 5259 performances of 87 Core surgical procedures performed at 14 institutions. We used these faculty ratings to construct a linear mixed model with resident performance as the outcome variable and multiple predictors including, most significantly, the operative procedure as a random effect. We interpreted the variance in performance ratings attributable to the procedure, after controlling for other variables, as the "difficulty" of performing the procedure. We conducted a generalizability analysis and decision study to estimate the number of SIMPL performance ratings needed to reliably estimate the difficulty of a typical Core procedure.

Results: Twenty-four faculty ratings of resident operative performance were necessary to reliably estimate the difficulty of a typical Core surgical procedure (mean dependability coefficient 0.80, 95% confidence interval 0.73-0.87).
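
The decision-study logic behind this number can be illustrated with a minimal sketch, assuming a simplified single-facet design in which the dependability of a procedure's mean rating is the procedure variance divided by the procedure variance plus the error variance averaged over n ratings. The function names and the variance components used in the usage note are hypothetical illustrations, not the authors' model or estimates; the study itself used a linear mixed model with additional predictors.

```python
def dependability(var_procedure, var_error, n_ratings):
    """Dependability of the mean of n_ratings ratings of one procedure.

    Assumes a simplified single-facet design (an illustration, not the
    authors' full mixed model): error variance shrinks in proportion to
    the number of ratings averaged.
    """
    if n_ratings < 1:
        raise ValueError("n_ratings must be at least 1")
    return var_procedure / (var_procedure + var_error / n_ratings)


def ratings_needed(var_procedure, var_error, target=0.80, max_n=10000):
    """Smallest number of ratings whose dependability reaches the target."""
    for n in range(1, max_n + 1):
        # A small tolerance guards against floating-point rounding at the boundary.
        if dependability(var_procedure, var_error, n) >= target - 1e-12:
            return n
    raise ValueError("target not reached within max_n ratings")
```

Under this simplified formula, a hypothetical error-to-procedure variance ratio of 6 (for example, `ratings_needed(1.0, 6.0)`) is the ratio at which 24 ratings are needed to reach a coefficient of 0.80. That ratio is back-calculated for illustration only and is not a value reported in the abstract.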

Conclusions: At least 24 operative performance ratings are required to reliably estimate the difficulty of a typical Core surgical procedure. Future research using performance ratings to establish procedure difficulty should include adequate numbers of ratings given the high-stakes implications of those results for curriculum design and policy.

Keywords: Core; Medical Knowledge; Practice-Based Learning and Improvement; Systems-Based Practice; dependability; difficulty; generalizability; performance; procedure.

MeSH terms

Adult
Big Data
Clinical Competence*
Educational Measurement
Employee Performance Appraisal*
Female
General Surgery / education*
Humans
Internship and Residency
Male
Mobile Applications
Professional Autonomy
Reproducibility of Results
Surgical Procedures, Operative / standards*