The effect of number of healthcare visits on study sample selection in electronic health record data

Laura J Rasmussen-Torvik; Al'ona Furmanchuk; Alexander J Stoddard; Kristen I Osinski; John R Meurer; Nicholas Smith; Elizabeth Chrischilles; Bernard S Black; Abel Kho

doi:10.23889/ijpds.v5i1.1156

Int J Popul Data Sci. 2020;5(1):1156. doi: 10.23889/ijpds.v5i1.1156. Epub 2020 Apr 2.

Authors

Laura J Rasmussen-Torvik¹, Al'ona Furmanchuk², Alexander J Stoddard³, Kristen I Osinski³, John R Meurer³, Nicholas Smith⁴, Elizabeth Chrischilles⁴, Bernard S Black⁵, Abel Kho²

Affiliations

¹ Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
² Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
³ Clinical and Translational Science Institute/Institute for Health & Equity, Medical College of Wisconsin, Milwaukee, WI 53226.
⁴ Department of Epidemiology, College of Public Health, University of Iowa, Iowa City, IA 52242.
⁵ Pritzker School of Law and Kellogg School of Management, Northwestern University, Chicago, IL 60611.

Abstract

Introduction: Few studies have addressed how to select a study sample when using electronic health record (EHR) data.

Objective: To examine how changing the criterion for the number of visits in EHR data required for inclusion in a study sample affects one basic epidemiologic measure: estimates of disease period prevalence.

Methods: Year 2016 EHR data from three Midwestern health systems (Northwestern Medicine in Illinois, University of Iowa Health Care, and Froedtert & the Medical College of Wisconsin, all regional tertiary health care systems including hospitals and clinics) were used to examine how alternate definitions of the study sample, based on number of healthcare visits in one year, affected measures of disease period prevalence. In 2016, each of these health systems saw between 160,000 and 420,000 unique patients. Curated collections of ICD-9, ICD-10, and SNOMED codes (from CMS-approved electronic clinical quality measures) were used to define three diseases: acute myocardial infarction, asthma, and diabetic nephropathy.
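
The sample-definition comparison described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function name `period_prevalence_by_min_visits`, the patient identifiers, and the toy counts are all hypothetical, and the sketch assumes each patient has already been assigned a one-year visit count and a case status from the diagnosis code sets. For each minimum-visit threshold k, crude period prevalence is taken as the number of cases with at least k visits divided by the number of patients with at least k visits.

```python
from typing import Dict, List, Set


def period_prevalence_by_min_visits(
    visit_counts: Dict[str, int],
    cases: Set[str],
    thresholds: List[int],
) -> Dict[int, float]:
    """Crude period prevalence among patients with at least k visits, for each k.

    visit_counts: patient id -> number of visits in the study year (hypothetical input)
    cases: ids of patients meeting the disease definition (hypothetical input)
    thresholds: minimum-visit criteria to compare
    """
    result: Dict[int, float] = {}
    for k in thresholds:
        # Study sample under this definition: patients with >= k visits.
        sample = [pid for pid, n in visit_counts.items() if n >= k]
        if not sample:
            continue  # no eligible patients at this threshold; skip it
        n_cases = sum(1 for pid in sample if pid in cases)
        result[k] = n_cases / len(sample)
    return result


# Made-up toy data, for illustration only (not from the study).
visit_counts = {"p1": 1, "p2": 1, "p3": 2, "p4": 3, "p5": 5, "p6": 1}
cases = {"p4", "p5"}

prevalence = period_prevalence_by_min_visits(visit_counts, cases, [1, 2, 3])
for k, p in prevalence.items():
    print(f"minimum visits >= {k}: crude period prevalence = {p:.3f}")
```

In this toy data the estimate rises as the threshold rises only because the made-up cases happen to have more visits; the code itself does not guarantee that pattern for real data.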

Results: Across all health systems, increasing the minimum number of visits required for inclusion in the study sample monotonically increased crude period prevalence estimates. The rate at which prevalence estimates increased with number of visits varied across sites and across diseases.

Conclusions: In addition to providing thorough descriptions of case definitions, authors using EHR data must carefully describe how a study sample is identified and report data for a range of sample definitions, including minimum number of visits, so that others can assess the sensitivity of reported results to sample definition in EHR data.

Keywords: Electronic Health Records; Methods; Prevalence; Sampling Studies.
