Automated linkage of patient records from disparate sources

Stat Methods Med Res. 2018 Jan;27(1):172-184. doi: 10.1177/0962280215626180. Epub 2016 Jul 20.

Abstract

We introduce an automated method of record linkage that has two key features: automated selection of the match field interactions to include in the model for estimation, and automated determination of the threshold for classifying record pairs as matches or non-matches. We applied our method to two real-world examples. The first example demonstrated results consistent with our earlier work: when data quality is adequate and the discriminating power of the match fields is high, matching algorithms exhibit similar performance. The second example demonstrated that, when data quality is low, our method yields a lower false positive rate and a higher positive predictive value than the Fellegi-Sunter model. Simulation studies suggest that, compared to the Fellegi-Sunter model, our method exhibits better overall performance, as indicated by a higher area under the curve, and gives less biased estimates of both the match prevalence rate and the m- and u-probabilities over a range of data scenarios, especially when the match prevalence is extreme. Computationally, our method is as efficient as the Fellegi-Sunter model. We recommend this method in situations where an unsupervised linking algorithm is needed.
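For readers unfamiliar with the baseline, the following is a minimal sketch of how the Fellegi-Sunter comparison model scores a record pair. It is not the method introduced in this paper: it assumes conditional independence between match fields (no interaction terms), and it takes the m-probabilities, u-probabilities, and match prevalence as given rather than estimating them. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
import math

def fs_weight(agreement, m, u):
    """Fellegi-Sunter match weight for one record pair.

    agreement: sequence of 0/1 flags, one per match field (1 = fields agree).
    m: P(field agrees | pair is a true match), one value per field.
    u: P(field agrees | pair is a true non-match), one value per field.
    Assumes conditional independence across fields (an assumption of this
    sketch, not of the paper's method).
    """
    total = 0.0
    for a, mk, uk in zip(agreement, m, u):
        if a:
            total += math.log2(mk / uk)              # agreement weight
        else:
            total += math.log2((1 - mk) / (1 - uk))  # disagreement weight
    return total

def posterior_match_prob(agreement, m, u, prevalence):
    """P(match | agreement pattern) by Bayes' rule, given the match prevalence."""
    p_match = prevalence
    p_nonmatch = 1 - prevalence
    for a, mk, uk in zip(agreement, m, u):
        p_match *= mk if a else (1 - mk)
        p_nonmatch *= uk if a else (1 - uk)
    return p_match / (p_match + p_nonmatch)

# Hypothetical two-field example (e.g. surname and date of birth).
m_probs = [0.9, 0.8]
u_probs = [0.1, 0.2]
w_agree = fs_weight([1, 1], m_probs, u_probs)
w_disagree = fs_weight([0, 0], m_probs, u_probs)
p_agree = posterior_match_prob([1, 1], m_probs, u_probs, prevalence=0.5)
```

In this setup a pair is classified as a match when its weight exceeds a chosen threshold. Choosing that threshold, and deciding which field interactions to model, are the two steps the abstract describes as automated.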

Keywords: Diagnostic tests; Fellegi-Sunter model; latent class model; log-linear model; patient matching; record linkage.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Automation*
  • Diagnostic Tests, Routine
  • Medical Record Linkage / methods*