Efficient designs and analysis of two-phase studies with longitudinal binary data

Biometrics. 2024 Jan 29;80(1):ujad010. doi: 10.1093/biomtc/ujad010.

Abstract

Researchers interested in understanding the relationship between a readily available longitudinal binary outcome and a novel biomarker exposure can be confronted with ascertainment costs that limit sample size. In such settings, two-phase studies can be cost-effective solutions that allow researchers to target informative individuals for exposure ascertainment and increase estimation precision for time-varying and/or time-fixed exposure coefficients. In this paper, we introduce a novel class of residual-dependent sampling (RDS) designs that select informative individuals using data available on the longitudinal outcome and inexpensive covariates. Together with the RDS designs, we propose a semiparametric analysis approach that efficiently uses all data to estimate the parameters. We describe a numerically stable and computationally efficient EM algorithm to maximize the semiparametric likelihood. We examine the finite sample operating characteristics of the proposed approaches through extensive simulation studies, and compare the efficiency of our designs and analysis approach with existing ones. We illustrate the usefulness of the proposed RDS designs and analysis method in practice by studying the association between a genetic marker and poor lung function among patients enrolled in the Lung Health Study (Connett et al., 1993).
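The abstract does not spell out how an RDS design scores individuals. As a rough illustration only, the sketch below assumes a pooled logistic working model for the binary outcome on a single inexpensive covariate, summarizes each subject by the mean raw residual across visits, and selects subjects from both tails of that summary for phase-two biomarker ascertainment. The function names (`fit_pooled_logistic`, `residual_dependent_sample`, `simulate`), the working model, the residual summary, and the equal split between tails are all hypothetical choices for exposition, not the designs proposed in the paper.

```python
import math
import random


def expit(eta):
    """Inverse logit, 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_pooled_logistic(xs, ys, iters=25):
    """Newton-Raphson fit of logit P(Y = 1) = b0 + b1 * x, pooling all visits.

    Hypothetical working model: it ignores within-subject correlation and is
    used only to form residuals for the illustrative selection rule below.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # score for the intercept
            g1 += (y - p) * x    # score for the slope
            h00 += w             # entries of the information matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X'WX)^{-1} X'(y - p)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def residual_dependent_sample(data, n_select):
    """Select subjects whose mean residual lies in either tail (illustrative).

    data: dict mapping subject id -> list of (x, y) visits, where x is an
    inexpensive covariate and y is the binary outcome (0/1).
    Returns (list of selected ids, dict of per-subject mean residuals).
    """
    if not 2 <= n_select <= len(data):
        raise ValueError("n_select must be between 2 and the number of subjects")
    xs = [x for visits in data.values() for x, _ in visits]
    ys = [y for visits in data.values() for _, y in visits]
    b0, b1 = fit_pooled_logistic(xs, ys)
    # Mean raw residual per subject: observed minus fitted outcome probability.
    scores = {
        sid: sum(y - expit(b0 + b1 * x) for x, y in visits) / len(visits)
        for sid, visits in data.items()
    }
    ranked = sorted(scores, key=scores.get)  # ascending mean residual
    k_low = n_select // 2
    k_high = n_select - k_low
    # Lowest k_low and highest k_high subjects; no overlap since n_select <= n.
    return ranked[:k_low] + ranked[-k_high:], scores


def simulate(n_subjects, n_visits, seed=1):
    """Toy data; u stands in for an unmeasured subject-level exposure."""
    rng = random.Random(seed)
    data = {}
    for sid in range(n_subjects):
        u = rng.gauss(0.0, 1.0)
        visits = []
        for _ in range(n_visits):
            x = rng.gauss(0.0, 1.0)
            y = 1 if rng.random() < expit(-0.5 + 0.8 * x + u) else 0
            visits.append((x, y))
        data[sid] = visits
    return data


if __name__ == "__main__":
    data = simulate(200, 4)
    selected, scores = residual_dependent_sample(data, 40)
    print(len(selected), "subjects selected for phase-two ascertainment")
```

Sampling from both tails mirrors the extreme-sampling idea familiar from outcome-dependent designs: subjects whose outcomes are poorly explained by the inexpensive covariates are the ones most likely to carry information about the unmeasured exposure. The paper's actual RDS designs, and the semiparametric EM analysis that combines phase-one and phase-two data, are not reproduced here.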

Keywords: EM algorithm; biased sampling; lung health study; outcome-dependent sampling; semiparametric efficiency; sieve approximation.

MeSH terms

  • Computer Simulation
  • Data Interpretation, Statistical
  • Humans
  • Longitudinal Studies
  • Models, Statistical*
  • Probability
  • Sample Size
  • Sampling Studies