Tree-based subgroup discovery using electronic health record data: heterogeneity of treatment effects for DTG-containing therapies

Biostatistics. 2024 Apr 15;25(2):323-335. doi: 10.1093/biostatistics/kxad014.

Abstract

The rich longitudinal individual-level data available from electronic health records (EHRs) can be used to examine treatment effect heterogeneity. However, estimating treatment effects using EHR data poses several challenges, including time-varying confounding; repeated and temporally non-aligned measurements of covariates, treatment assignments, and outcomes; and loss to follow-up due to dropout. Here, we develop the subgroup discovery for longitudinal data algorithm, a tree-based algorithm for discovering subgroups with heterogeneous treatment effects using longitudinal data. It combines the generalized interaction tree algorithm, a general data-driven method for subgroup discovery, with longitudinal targeted maximum likelihood estimation. We apply the algorithm to EHR data to discover subgroups of people living with human immunodeficiency virus who are at higher risk of weight gain when receiving dolutegravir (DTG)-containing antiretroviral therapies (ARTs) than when receiving non-DTG-containing ARTs.
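
To make the tree-based idea concrete, the following is a minimal sketch of a single split search in the spirit of an interaction tree. It is not the authors' implementation. It assumes one numeric covariate, a binary treatment, and a single outcome per person, and it uses a plain difference in means as a stand-in for the per-subgroup effect estimator, where the abstract describes longitudinal targeted maximum likelihood estimation. The function names `effect_and_var` and `best_split`, the toy data, and the splitting statistic (squared difference in child effects divided by the sum of their variances) are all illustrative assumptions.

```python
from statistics import mean, variance


def effect_and_var(rows):
    """Return (effect estimate, variance) for one candidate subgroup.

    ASSUMPTION: a difference in mean outcomes (treated minus control) stands
    in for the subgroup effect estimator. The abstract describes longitudinal
    TMLE for this step, which this sketch does not implement.
    """
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    # At least two observations per arm are needed for a sample variance.
    if len(treated) < 2 or len(control) < 2:
        return None
    estimate = mean(treated) - mean(control)
    var = variance(treated) / len(treated) + variance(control) / len(control)
    return estimate, var


def best_split(rows):
    """Find the threshold on x whose two child nodes differ most in effect.

    rows: list of (x, a, y) tuples with covariate x, treatment a in {0, 1},
    and outcome y. Returns (threshold, statistic), or None if no valid split.
    """
    best = None
    for c in sorted({x for x, _, _ in rows}):
        left = effect_and_var([r for r in rows if r[0] <= c])
        right = effect_and_var([r for r in rows if r[0] > c])
        if left is None or right is None:
            continue
        denom = left[1] + right[1]
        if denom == 0:
            continue
        # ASSUMPTION: squared standardized difference between child effects.
        stat = (left[0] - right[0]) ** 2 / denom
        if best is None or stat > best[1]:
            best = (c, stat)
    return best


# Toy data (hypothetical): no treatment effect for x <= 2, an effect of 5 for x > 2.
rows = []
for x in (1, 2, 3, 4):
    shift = 5.0 if x > 2 else 0.0
    rows += [(x, 0, 10.0), (x, 0, 11.0), (x, 1, 10.0 + shift), (x, 1, 11.0 + shift)]

print(best_split(rows))
```

In this toy data the selected threshold is 2, the point where the treatment effect changes. A full tree would apply the search recursively within each child node and over several covariates, and would typically add pruning or another safeguard against overfitting.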

Keywords: Causal inference; Dolutegravir; Electronic health record; Heterogeneity of treatment effects; Longitudinal targeted maximum likelihood estimation; Machine learning; Recursive partitioning; Subgroup discovery.

MeSH terms

  • Electronic Health Records*
  • HIV Infections* / drug therapy
  • Heterocyclic Compounds, 3-Ring*
  • Humans
  • Oxazines
  • Piperazines*
  • Pyridones*
  • Treatment Effect Heterogeneity

Substances

  • dolutegravir
  • Oxazines
  • Heterocyclic Compounds, 3-Ring
  • Piperazines
  • Pyridones