Racial and Ethnic Bias in Risk Prediction Models for Colorectal Cancer Recurrence When Race and Ethnicity Are Omitted as Predictors

JAMA Netw Open. 2023 Jun 1;6(6):e2318495. doi: 10.1001/jamanetworkopen.2023.18495.

Abstract

Importance: Including race and ethnicity as a predictor in clinical risk prediction algorithms has received increased scrutiny, but few empirical studies have addressed whether simply omitting race and ethnicity from the algorithms will ultimately affect decision-making for patients of minoritized racial and ethnic groups.

Objective: To examine whether including race and ethnicity as a predictor in a colorectal cancer recurrence risk algorithm is associated with racial bias, defined as racial and ethnic differences in model accuracy that could potentially lead to unequal treatment.

Design, setting, and participants: This retrospective prognostic study was conducted using data from a large integrated health care system in Southern California for patients with colorectal cancer who received primary treatment between 2008 and 2013 and follow-up until December 31, 2018. Data were analyzed from January 2021 to June 2022.

Main outcomes and measures: Four Cox proportional hazards regression prediction models were fitted to predict time from surveillance start to cancer recurrence: (1) a race-neutral model that explicitly excluded race and ethnicity as a predictor, (2) a race-sensitive model that included race and ethnicity, (3) a model with 2-way interactions between clinical predictors and race and ethnicity, and (4) separate models by race and ethnicity. Algorithmic fairness was assessed using model calibration, discriminative ability, false-positive and false-negative rates, positive predictive value (PPV), and negative predictive value (NPV).
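The threshold-based fairness measures named above (false-positive and false-negative rates, PPV, and NPV) can be illustrated with a minimal sketch. This is not the authors' code: the function name `subgroup_error_rates`, the record layout, and the toy data are hypothetical. It also assumes each patient has already been reduced to a binary "predicted high risk" flag and a binary "recurred" outcome, which ignores the censoring that a Cox model would handle.

```python
from collections import defaultdict


def subgroup_error_rates(records):
    """Compute FNR, FPR, PPV, and NPV separately for each subgroup.

    records: iterable of (group, predicted_high_risk, recurred) tuples,
    where the last two fields are booleans (a simplifying assumption).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, observed in records:
        if predicted and observed:
            key = "tp"
        elif predicted and not observed:
            key = "fp"
        elif observed:
            key = "fn"
        else:
            key = "tn"
        counts[group][key] += 1

    def ratio(numerator, denominator):
        # Return None when a rate is undefined for a subgroup.
        return numerator / denominator if denominator else None

    results = {}
    for group, c in counts.items():
        results[group] = {
            "fnr": ratio(c["fn"], c["fn"] + c["tp"]),  # missed recurrences
            "fpr": ratio(c["fp"], c["fp"] + c["tn"]),  # false alarms
            "ppv": ratio(c["tp"], c["tp"] + c["fp"]),
            "npv": ratio(c["tn"], c["tn"] + c["fn"]),
        }
    return results


# Synthetic toy records, not study data.
toy = [
    ("A", True, True), ("A", False, True),
    ("A", False, False), ("A", True, False),
    ("B", True, True), ("B", False, False),
]
metrics = subgroup_error_rates(toy)
print(metrics["A"]["fnr"], metrics["B"]["fnr"])  # 0.5 0.0
```

Comparing these per-group values side by side is one simple way to see whether a model's errors fall unevenly across subgroups.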

Results: The study cohort included 4230 patients (mean [SD] age, 65.3 [12.5] years; 2034 [48.1%] female; 490 [11.6%] Asian, Hawaiian, or Pacific Islander; 554 [13.1%] Black or African American; 937 [22.1%] Hispanic; and 2249 [53.1%] non-Hispanic White). The race-neutral model had worse calibration, NPV, and false-negative rates among racial and ethnic minority subgroups than among non-Hispanic White individuals (eg, false-negative rate for Hispanic patients: 12.0% [95% CI, 6.0%-18.6%]; for non-Hispanic White patients: 3.1% [95% CI, 0.8%-6.2%]). Adding race and ethnicity as a predictor improved algorithmic fairness in calibration slope, discriminative ability, PPV, and false-negative rates (eg, false-negative rate for Hispanic patients: 9.2% [95% CI, 3.9%-14.9%]; for non-Hispanic White patients: 7.9% [95% CI, 4.3%-11.9%]). Inclusion of race interaction terms or using race-stratified models did not improve model fairness, likely due to small sample sizes in subgroups.
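As a worked illustration of the example given in the Results, the sketch below computes the difference in false-negative rates between Hispanic and non-Hispanic White patients under each model. It uses only the point estimates quoted above. The dictionary layout and the helper `fnr_gap` are hypothetical, and the calculation ignores the reported confidence intervals, which are wide.

```python
# Point estimates of the false-negative rate (%) quoted in the Results.
fnr = {
    "race_neutral": {"Hispanic": 12.0, "non-Hispanic White": 3.1},
    "race_sensitive": {"Hispanic": 9.2, "non-Hispanic White": 7.9},
}


def fnr_gap(model):
    """Hispanic minus non-Hispanic White FNR, in percentage points."""
    rates = fnr[model]
    return round(rates["Hispanic"] - rates["non-Hispanic White"], 1)


print(fnr_gap("race_neutral"))    # 8.9
print(fnr_gap("race_sensitive"))  # 1.3
```

On these point estimates the gap narrows from 8.9 to 1.3 percentage points, with the non-Hispanic White rate rising and the Hispanic rate falling.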

Conclusions and relevance: In this prognostic study of racial bias in a cancer recurrence risk algorithm, removing race and ethnicity as a predictor worsened algorithmic fairness on multiple measures, which could lead to inappropriate care recommendations for patients who belong to minoritized racial and ethnic groups. Clinical algorithm development should include evaluation of fairness criteria to understand the potential consequences of removing race and ethnicity for health inequities.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Aged
  • Asian American Native Hawaiian and Pacific Islander
  • Black or African American
  • Colorectal Neoplasms* / diagnosis
  • Ethnicity*
  • Female
  • Hispanic or Latino
  • Humans
  • Male
  • Middle Aged
  • Minority Groups
  • Retrospective Studies
  • White