Improving estimation efficiency for regression with MNAR covariates

Biometrics. 2020 Mar;76(1):270-280. doi: 10.1111/biom.13131. Epub 2019 Nov 7.

Abstract

For regression with covariates missing not at random where the missingness depends on the missing covariate values, complete-case (CC) analysis leads to consistent estimation when the missingness is independent of the response given all covariates, but it may not have the desired level of efficiency. We propose a general empirical likelihood framework to improve estimation efficiency over the CC analysis. We expand on the methods of Bartlett et al. (2014, Biostatistics 15, 719-730) and Xie and Zhang (2017, Int J Biostat 13, 1-20), which improve efficiency by modeling the missingness probability conditional on the response and the fully observed covariates; our framework additionally allows other quantities related to the data distribution to be modeled. We also give guidelines on what quantities to model and demonstrate that our proposal has the potential to yield smaller biases than existing methods when the missingness probability model is incorrect. Simulation studies are presented, as well as an application to data collected from the US National Health and Nutrition Examination Survey.
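
The consistency of CC analysis under this kind of missingness can be illustrated with a small simulation. The sketch below is not the empirical likelihood method proposed in the paper; it only checks the CC claim under assumed conditions. The linear model, the coefficient values (1, 2, -1), the logistic missingness mechanism, and the function name simulate_cc are all hypothetical choices made for illustration. Here the covariate x is missing with a probability that depends on x itself and on a fully observed covariate z, but not on the response y given the covariates.

```python
import numpy as np


def simulate_cc(n=200_000, seed=0):
    """Fit OLS on complete cases when x is MNAR but missingness ignores y.

    All distributions and coefficients are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    x = rng.normal(size=n)  # covariate subject to missingness
    z = rng.normal(size=n)  # fully observed covariate
    y = 1.0 + 2.0 * x - 1.0 * z + rng.normal(size=n)  # assumed true model

    # MNAR mechanism: the chance of observing x depends on x itself
    # (and on z), but not on y once the covariates are given.
    p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x - 0.5 * z)))
    observed = rng.random(n) < p_obs

    # Complete-case analysis: drop every row where x is missing.
    design = np.column_stack([np.ones(n), x, z])
    beta_cc, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    return beta_cc, observed.mean()


if __name__ == "__main__":
    beta_cc, frac_observed = simulate_cc()
    print("fraction of complete cases:", round(float(frac_observed), 3))
    print("CC estimates (intercept, x, z):", np.round(beta_cc, 3))
```

With a large sample, the CC estimates should land close to the assumed true coefficients even though a sizeable share of rows is discarded. The discarded rows are the source of the efficiency loss that the abstract refers to.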

Keywords: complete-case analysis; empirical likelihood; estimating equations; missing covariates; missing not at random.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Analysis of Variance
  • Bias
  • Biometry / methods*
  • Computer Simulation
  • Data Interpretation, Statistical
  • Humans
  • Likelihood Functions
  • Models, Statistical
  • Nutrition Surveys / statistics & numerical data
  • Probability
  • Regression Analysis*
  • United States