Robust data integration from multiple external sources for generalized linear models with binary outcomes

Biometrics. 2024 Jan 29;80(1):ujad005. doi: 10.1093/biomtc/ujad005.

Abstract

We aim to estimate parameters in a generalized linear model (GLM) for a binary outcome when, in addition to raw data from the internal study, more than one external study provides summary information in the form of parameter estimates from GLMs fitted with varying subsets of the internal study covariates. We propose an adaptive penalization method that exploits the external summary information to gain estimation efficiency, and that is both robust and computationally efficient. Robustness comes from two sources. First, the method exploits the relationship between the parameters of a GLM and those of a GLM with omitted covariates. Second, a penalty downweights external summary information that is less compatible with the internal data. The computational burden of searching for the optimal tuning parameter of the penalty is reduced by using adaptive weights and by selecting the tuning parameter with an information criterion. Simulation studies show that the proposed estimator is robust against various types of heterogeneity in population distributions and gains efficiency compared with direct maximum likelihood estimation. The method is applied to improve a logistic regression model that predicts high-grade prostate cancer, making use of parameter estimates from two external models.
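The downweighting idea can be illustrated with a deliberately simplified toy, which is not the authors' estimator. The toy makes several assumptions. There is a single scalar parameter θ. The internal log-likelihood is replaced by a quadratic (Wald) approximation around the internal estimate. Each external estimate is assumed to target the same θ up to a study-specific discrepancy δ_k, so the toy ignores the relationship between full-model and reduced-model parameters that the method above exploits. Each δ_k gets an adaptive-lasso-style penalty with weight w_k = 1/|θ̂_k − θ̂_int|, and λ is picked by a BIC-type criterion. All function names and numbers are hypothetical. The intuition is that a study that disagrees strongly with the internal data receives a small weight. Its δ_k then escapes shrinkage, which effectively sets that study aside. A compatible study has its δ_k shrunk to exactly zero and is pooled.

```python
import math


def soft_threshold(z, t):
    """Return argmin_d 0.5 * (z - d)**2 + t * |d| (the lasso shrinkage operator)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_toy(theta_int, v_int, ext, lam, weights, iters=500):
    """Coordinate descent for the hypothetical toy objective
        (theta - theta_int)^2 / (2 v_int)                 # Wald approx. of internal likelihood
      + sum_k (theta_k - theta - delta_k)^2 / (2 v_k)     # external estimates, shifted by delta_k
      + lam * sum_k w_k * |delta_k|                       # adaptive L1 penalty on discrepancies
    `ext` is a list of (theta_k, v_k) pairs: external estimate and its variance."""
    theta = theta_int
    delta = [0.0] * len(ext)
    for _ in range(iters):
        # Given theta, each delta_k has a closed-form soft-thresholding update.
        for k, (tk, vk) in enumerate(ext):
            delta[k] = soft_threshold(tk - theta, lam * weights[k] * vk)
        # Given the deltas, theta is a precision-weighted average.
        num = theta_int / v_int + sum((tk - d) / vk for (tk, vk), d in zip(ext, delta))
        den = 1.0 / v_int + sum(1.0 / vk for _, vk in ext)
        theta = num / den
    return theta, delta


def gic(theta_int, v_int, ext, theta, delta, n):
    """BIC-type criterion: lack of fit plus log(n) per free parameter (1 + nonzero deltas)."""
    lack_of_fit = (theta - theta_int) ** 2 / v_int + sum(
        (tk - theta - d) ** 2 / vk for (tk, vk), d in zip(ext, delta)
    )
    df = 1 + sum(1 for d in delta if d != 0.0)
    return lack_of_fit + math.log(n) * df


def select_lambda(theta_int, v_int, ext, n, grid):
    """Fit once per grid value and keep the fit with the smallest criterion."""
    # Adaptive weights: a large internal/external disagreement gives a small weight,
    # so that study's delta_k escapes shrinkage and its estimate is discounted.
    weights = [1.0 / max(abs(tk - theta_int), 1e-8) for tk, _ in ext]
    best = None
    for lam in grid:
        theta, delta = fit_toy(theta_int, v_int, ext, lam, weights)
        score = gic(theta_int, v_int, ext, theta, delta, n)
        if best is None or score < best[0]:
            best = (score, lam, theta, list(delta))
    return best


if __name__ == "__main__":
    # Made-up illustrative inputs: internal estimate 0.50 (variance 0.01, n = 400),
    # one compatible external estimate (0.52) and one incompatible one (1.20).
    external = [(0.52, 0.0025), (1.20, 0.0025)]
    grid = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
    score, lam, theta, delta = select_lambda(0.50, 0.01, external, 400, grid)
    print(f"lambda={lam}, theta={theta:.4f}, delta={[round(d, 4) for d in delta]}")
```

With these made-up inputs, the selected fit sets the first discrepancy exactly to zero, so the compatible study is pooled with the internal estimate. The second discrepancy stays large, so the incompatible study barely moves θ. Adaptive weights let a single λ grid separate the two cases, and the information criterion replaces a more expensive tuning search such as cross-validation.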

Keywords: adaptive weights; generalized information criterion; penalization; ratio of parameters; robustness.

MeSH terms

  • Computer Simulation
  • Humans
  • Likelihood Functions
  • Linear Models
  • Logistic Models
  • Male
  • Models, Statistical*
  • Regression Analysis