A scalable method for supporting multiple patient cohort discovery projects using i2b2

J Biomed Inform. 2018 Aug:84:179-183. doi: 10.1016/j.jbi.2018.07.010. Epub 2018 Jul 19.

Abstract

Although i2b2, a popular platform for patient cohort discovery using electronic health record (EHR) data, can support multiple projects specific to individual disease areas or research interests, the standard approach for doing so duplicates data across projects. The duplication requires additional disk space and processing time, which limits scalability. To address this deficiency, we developed a novel approach that stored data in a single i2b2 fact table and used structured query language (SQL) views to access data for specific projects. Compared to the standard approach, the view-based approach reduced required disk space by 59% and extract-transform-load (ETL) time by 46%, without substantially impacting query performance. The view-based approach has enabled multiple i2b2 projects to scale and has generalized to another data model at our institution. Other institutions may benefit from this approach, the code for which is available on GitHub (https://github.com/wcmc-research-informatics/super-i2b2).
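The view-based idea can be illustrated with a minimal sketch, which is not the code from the GitHub repository. It uses Python's built-in sqlite3 module, a two-column stand-in for the i2b2 fact table (observation_fact, with patient_num and concept_cd), and a hypothetical project_patient mapping table. The project names, the mapping table, and the view naming scheme are assumptions made for illustration only.

```python
import sqlite3


def build_demo():
    """Create one shared fact table and one view per project (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Simplified stand-in for the single shared fact table.
        CREATE TABLE observation_fact (
            patient_num INTEGER NOT NULL,
            concept_cd  TEXT    NOT NULL
        );
        -- Hypothetical table assigning patients to projects.
        CREATE TABLE project_patient (
            project_id  TEXT    NOT NULL,
            patient_num INTEGER NOT NULL
        );
        INSERT INTO observation_fact VALUES
            (1, 'ICD10:E11.9'), (1, 'LOINC:4548-4'),
            (2, 'ICD10:I10'),   (3, 'ICD10:E11.9');
        INSERT INTO project_patient VALUES
            ('diabetes', 1), ('diabetes', 3), ('cardio', 2);
        """
    )
    # Each project gets a view over the shared table instead of its own copy.
    # The project names are fixed literals here, so string formatting is safe.
    for project in ("diabetes", "cardio"):
        conn.execute(
            f"CREATE VIEW observation_fact_{project} AS "
            f"SELECT f.patient_num, f.concept_cd "
            f"FROM observation_fact f "
            f"JOIN project_patient p ON p.patient_num = f.patient_num "
            f"WHERE p.project_id = '{project}'"
        )
    return conn


def count_rows(conn, relation):
    """Return the number of rows visible through a table or view."""
    return conn.execute(f"SELECT COUNT(*) FROM {relation}").fetchone()[0]


conn = build_demo()
print(count_rows(conn, "observation_fact"))           # 4 rows stored once
print(count_rows(conn, "observation_fact_diabetes"))  # 3 rows visible to the project
print(count_rows(conn, "observation_fact_cardio"))    # 1 row visible to the project
```

In this sketch each fact row is stored once and every project reads it through its own view, so adding a project adds a view definition rather than another copy of the data.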

Keywords: Clinical data warehouse; Cohort discovery; Electronic health record; i2b2; Scalability; Secondary use.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Academic Medical Centers
  • Algorithms
  • Cohort Studies
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval
  • Language
  • Medical Informatics / methods*
  • Medical Informatics / organization & administration*
  • New York
  • Reproducibility of Results
  • Software
  • Translational Research, Biomedical / organization & administration