How Do Programs Measure Resident Performance? A Multi-Institutional Inventory of General Surgery Assessments

J Surg Educ. 2021 Nov-Dec;78(6):e189-e195. doi: 10.1016/j.jsurg.2021.08.024. Epub 2021 Sep 28.

Abstract

Objective: To inventory the assessment tools in use at general surgery residency programs and to evaluate their alignment with the Milestone Competencies.

Design: We conducted an inventory of all assessment tools from a sample of general surgery training programs participating in a multi-center study of resident operative development in the United States. Each instrument was categorized with a data extraction tool designed to identify criteria for effective assessment in competency-based education, and according to which Milestone Competency it evaluated. Tabulations of each category were then analyzed using descriptive statistics. Interviews with program directors and assessment coordinators were conducted to understand each instrument's intended use within its program.

Setting: Multi-institutional review of general surgery assessment programs.

Participants: We identified assessment tools used by 10 general surgery programs during the 2019-2020 academic year. Programs were selected from a cohort already participating in a separate research study of resident operative development in the United States.

Results: We identified 42 unique assessment tools in use. Each program used an average of 7.2 (range 4-13) unique assessment instruments to measure performance, of which only 5 (11.9%) were used by at least 1 other program in our sample. Of all assessments, 59.5% were used monthly or less frequently. The majority (66.7%) of instruments were retrospective global assessments rather than assessments of discrete observed performances. Only 4 (9.5%) instruments had established reliability or validity evidence. Across programs there was also significant variation in the volume of assessment used to evaluate residents: the median total number of evaluations per trainee across all Milestone Competencies was 217 (IQR 78) per year. Patient care was the most frequently evaluated Milestone Competency.

Conclusions: General surgery assessment systems predominantly employ non-standardized global assessment tools that lack reliability or validity evidence. This variability makes it challenging to interpret and compare competency standards across programs. A standardized assessment toolkit with established reliability and validity evidence would allow training programs to measure the competence of their trainees more uniformly and to understand where improvements in our training system can be made.

Keywords: Assessment; Competency measures; Milestones; Surgical education.
