How subgroup analyses can miss the trees for the forest plots: A simulation study

J Clin Epidemiol. 2020 Oct:126:65-70. doi: 10.1016/j.jclinepi.2020.06.020. Epub 2020 Jun 19.

Abstract

Objectives: Subgroup analyses of clinical trial data can be an important tool for understanding when treatment effects differ across populations. That said, even effect estimates from prespecified subgroups in well-conducted trials may not apply to corresponding subgroups in the source population. While this divergence may simply reflect statistical imprecision, there has been less discussion of systematic or structural sources of misleading subgroup estimates.

Study design and setting: We use directed acyclic graphs to show how selection bias caused by associations between effect measure modifiers and trial selection, whether explicit (e.g., eligibility criteria) or implicit (e.g., self-selection based on race), can result in subgroup estimates that do not correspond to subgroup effects in the source population. To demonstrate this point, we provide a hypothetical example illustrating the sorts of erroneous conclusions that can result, as well as their potential consequences. We also provide a tool for readers to explore additional cases.
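
The mechanism described above can be sketched with a small expected-value calculation. This is an illustrative sketch only, not the authors' simulation or their reader tool. Every name and number is hypothetical: G is the subgroup variable reported in the forest plot, M is a second binary effect measure modifier, and selection into the trial (S) depends on M within the G=1 subgroup only.

```python
# Hypothetical inputs chosen for illustration; none are taken from the paper.
P_M_GIVEN_G = {0: 0.5, 1: 0.5}      # P(M=1 | G) in the source population
EFFECT_BY_M = {0: 0.0, 1: -0.10}    # risk difference of treatment within levels of M
P_SELECT = {                        # P(S=1 | G, M): probability of entering the trial
    (0, 0): 0.10, (0, 1): 0.10,     # in G=0, selection is unrelated to M
    (1, 0): 0.18, (1, 1): 0.02,     # in G=1, people with M=1 rarely enroll
}


def population_subgroup_effect(g):
    """Average risk difference in subgroup g of the source population."""
    p_m1 = P_M_GIVEN_G[g]
    return (1 - p_m1) * EFFECT_BY_M[0] + p_m1 * EFFECT_BY_M[1]


def trial_subgroup_effect(g):
    """Average risk difference in subgroup g among those selected into the trial."""
    p_m1 = P_M_GIVEN_G[g]
    w0 = (1 - p_m1) * P_SELECT[(g, 0)]   # P(M=0, S=1 | G=g)
    w1 = p_m1 * P_SELECT[(g, 1)]         # P(M=1, S=1 | G=g)
    p_m1_trial = w1 / (w0 + w1)          # P(M=1 | G=g, S=1), by Bayes' rule
    return (1 - p_m1_trial) * EFFECT_BY_M[0] + p_m1_trial * EFFECT_BY_M[1]


if __name__ == "__main__":
    for g in (0, 1):
        print(
            f"G={g}: population RD = {population_subgroup_effect(g):.3f}, "
            f"trial RD = {trial_subgroup_effect(g):.3f}"
        )
```

With these made-up inputs, both subgroups have the same risk difference in the source population (-0.05), yet the trial estimate for G=1 is about -0.01, because selection has shifted the distribution of M within that subgroup. A reader of the forest plot would conclude that treatment works less well in G=1 even though the trial estimate is internally valid.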

Conclusion: Treating subgroups within a trial essentially as random samples of the corresponding subgroups in the wider population can be misleading, even when analyses are conducted rigorously and all findings are internally valid. Researchers should carefully examine associations between (and consider adjusting for) variables when attempting to identify heterogeneous treatment effects.
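
One way to read "adjusting for" such variables is direct standardization: reweighting stratum-specific trial estimates to the distribution of the modifier in the target subgroup. The sketch below uses that technique as an assumption; the abstract does not say which adjustment method the authors have in mind. It further assumes the modifier M is measured and that effects within levels of M carry over from the trial to the source population. All names and numbers are hypothetical.

```python
def standardize(stratum_effects, weights):
    """Weighted average of stratum-specific effects; weights need not sum to 1."""
    total = sum(weights.values())
    return sum(stratum_effects[m] * weights[m] / total for m in stratum_effects)


# Hypothetical values for one trial subgroup (not taken from the paper).
trial_effects_by_m = {0: 0.0, 1: -0.10}   # risk difference within levels of M
trial_mix = {0: 0.9, 1: 0.1}              # distribution of M among trial participants
population_mix = {0: 0.5, 1: 0.5}         # distribution of M in the source subgroup

# Unadjusted subgroup estimate: the trial's own mix of M.
naive = standardize(trial_effects_by_m, trial_mix)
# Standardized estimate: the source population's mix of M.
adjusted = standardize(trial_effects_by_m, population_mix)

if __name__ == "__main__":
    print(f"unadjusted RD = {naive:.3f}, standardized RD = {adjusted:.3f}")
```

Here the unadjusted estimate (about -0.01) understates the source-population subgroup effect (-0.05), and standardizing to the population distribution of M recovers it. This works only when the modifier is measured and every stratum is represented in the trial.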

Keywords: Subgroups; causal graphs; external validity; selection bias.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Biometry / methods
  • Clinical Trials as Topic
  • Computer Simulation / statistics & numerical data*
  • Data Interpretation, Statistical
  • Female
  • Humans
  • Male
  • Models, Statistical
  • Models, Theoretical
  • Myocardial Infarction / epidemiology
  • Myocardial Infarction / ethnology*
  • Myocardial Infarction / mortality
  • Reproducibility of Results
  • Research Design / statistics & numerical data*
  • Research Design / trends
  • Sample Size
  • Selection Bias