Deductible imputation in administrative medical claims datasets

Health Serv Res. 2024 Apr;59(2):e14278. doi: 10.1111/1475-6773.14278. Epub 2024 Jan 17.

Abstract

Objective: To validate imputation methods used to infer plan-level deductibles and determine which enrollees are in high-deductible health plans (HDHPs) in administrative claims datasets.

Data sources and study setting: 2017 medical and pharmaceutical claims from OptumLabs Data Warehouse for US individuals younger than 65 years who were continuously enrolled in an employer-sponsored plan. Data include enrollee and plan characteristics, deductible spending, plan spending, and actual plan-level deductibles.

Study design: We impute plan deductibles using four methods: (1) parametric prediction using individual-level spending; (2) parametric prediction with imputation and plan characteristics; (3) highest plan-specific mode of individual annual deductible spending; and (4) deductible spending at the 80th percentile among individuals meeting their deductible. We then compare imputed with actual deductibles, both by dollar level and by deductible category.
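To make method (3) concrete, the following is a minimal sketch of one possible reading of "highest plan-specific mode": within each plan, find the most common value of individual annual deductible spending and, when several values tie, keep the largest. The function name `impute_mode`, the record layout, the rounding to whole dollars, and the exclusion of enrollees with zero deductible spending are all assumptions made for illustration; the abstract does not specify these details.

```python
from collections import Counter, defaultdict


def impute_mode(records):
    """Impute one deductible per plan from enrollee-level spending.

    records: iterable of (plan_id, annual_deductible_spending) pairs.
    Returns a dict mapping plan_id to the imputed deductible.
    """
    spending_by_plan = defaultdict(list)
    for plan_id, spend in records:
        # Assumption: enrollees with no deductible spending carry no
        # information about the deductible level, so they are skipped.
        if spend > 0:
            # Assumption: round to whole dollars so equal amounts match.
            spending_by_plan[plan_id].append(round(spend))

    imputed = {}
    for plan_id, spends in spending_by_plan.items():
        counts = Counter(spends)
        top_count = max(counts.values())
        # Several values can share the top frequency; keep the highest.
        imputed[plan_id] = max(v for v, c in counts.items() if c == top_count)
    return imputed


# Toy data, invented for illustration only.
toy_records = [
    ("A", 1500), ("A", 1500), ("A", 320), ("A", 0),
    ("B", 500), ("B", 500), ("B", 3000), ("B", 3000),
]
toy_imputed = impute_mode(toy_records)
```

The intuition is that enrollees who exhaust their deductible all stop at the same dollar amount, so that amount tends to be the most frequent spending value within a plan.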

Data collection/extraction methods: Not applicable.

Principal findings: For classifying plans as high- versus low-deductible, all methods had a positive predictive value (PPV) of ≥87%; negative predictive values (NPVs) were lower. The method imputing plan-specific deductible spending modes was the most accurate and the least computationally intensive (PPV: 95%; NPV: 91%). This method also correlated best with actual deductible levels; 69% of its imputed deductibles were within $250 of the true deductible.
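The accuracy measures reported above can be computed along the following lines. This is a sketch under stated assumptions, not the authors' evaluation code: the function name `evaluate`, the plan-level unit of analysis, and the default `threshold` of $1,300 separating high- from low-deductible plans are illustrative choices that the abstract does not state. The `tolerance` of $250 mirrors the "within $250" comparison.

```python
def evaluate(imputed, actual, threshold=1300, tolerance=250):
    """Compare imputed with actual plan deductibles.

    imputed, actual: dicts mapping plan_id to a deductible in dollars.
    threshold: hypothetical cutoff at or above which a plan counts as
        high-deductible (an assumption, not taken from the abstract).
    """
    tp = fp = tn = fn = within = 0
    for plan_id, true_ded in actual.items():
        pred = imputed[plan_id]
        pred_high = pred >= threshold
        true_high = true_ded >= threshold
        if pred_high and true_high:
            tp += 1
        elif pred_high:
            fp += 1
        elif true_high:
            fn += 1
        else:
            tn += 1
        if abs(pred - true_ded) <= tolerance:
            within += 1

    return {
        # PPV: share of plans imputed as high-deductible that truly are.
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        # NPV: share of plans imputed as low-deductible that truly are.
        "npv": tn / (tn + fn) if (tn + fn) else None,
        "share_within_tolerance": within / len(actual) if actual else None,
    }


# Toy values, invented for illustration only.
toy_imputed = {"A": 1500, "B": 3000, "C": 500, "D": 1400, "E": 1000}
toy_actual = {"A": 1500, "B": 2000, "C": 500, "D": 1000, "E": 1500}
toy_result = evaluate(toy_imputed, toy_actual)
```

A lower NPV than PPV, as reported, would correspond to relatively more plans being wrongly imputed as low-deductible (false negatives) than wrongly imputed as high-deductible (false positives).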

Conclusions: In the absence of plan structure data, imputing plan-specific modes of individual annual deductible spending best correlates with true deductibles and best predicts enrollees in HDHPs.

Keywords: data analysis; health insurance; high-deductible health plans; research methods.

MeSH terms

  • Deductibles and Coinsurance*
  • Health Planning*
  • Humans