Temporal properties of diagnosis code time series in aggregate

IEEE J Biomed Health Inform. 2013 Mar;17(2):477-83. doi: 10.1109/JBHI.2013.2244610.

Abstract

Time series are essential to health data research and data mining. We aim to study the properties of one of the more commonly available but historically unreliable types of data: administrative diagnoses in the form of International Classification of Diseases, Ninth Revision (ICD9) codes. We use the differential entropy of ICD9 code time series as a surrogate measure for disease time course, and we also explore Gaussian kernel smoothing to characterize the time course of diseases at a finer grain. Compared with a gold standard created by a panel of clinicians, the first model classified diseases into acute and chronic groups with an area under the receiver operating characteristic curve (AUC) of 0.83. In the second model, several characteristic temporal profiles were observed, including permanent, chronic, and acute. Condition dynamics were also observed, such as the refractory period after childbirth before another birth occurs. These models demonstrate that ICD9 codes, despite well-documented concerns, contain valid and potentially valuable temporal information.
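The abstract does not specify how the entropy or the smoothed profiles were computed, so the following is only a minimal sketch of one plausible reading, under assumptions that are ours and not the authors': each code's occurrences are aligned to each patient's first occurrence of that code, differential entropy is estimated with a fixed-width histogram, the Gaussian smoother is a plain kernel density estimate, and the AUC uses the Mann-Whitney pairwise formulation. The function names, the 30-day bin width, the 5-day bandwidth, and the toy events are all hypothetical. Under this reading, a code whose repeats cluster near the first occurrence (acute) scores lower entropy than one that recurs over years (chronic).

```python
import math
from collections import defaultdict

# Hypothetical toy events: (patient_id, code, day). Not data from the paper.
EVENTS = [
    ("p1", "ACUTE_X", 0), ("p1", "ACUTE_X", 2),
    ("p2", "ACUTE_X", 100), ("p2", "ACUTE_X", 101),
    ("p1", "CHRONIC_Y", 0), ("p1", "CHRONIC_Y", 400), ("p1", "CHRONIC_Y", 800),
    ("p2", "CHRONIC_Y", 50), ("p2", "CHRONIC_Y", 700),
]


def offsets_since_first(events):
    """Per code, pool day offsets of every occurrence relative to that
    patient's first occurrence of the same code (an assumed alignment)."""
    first = {}
    for pid, code, day in events:
        key = (pid, code)
        first[key] = min(day, first.get(key, day))
    offsets = defaultdict(list)
    for pid, code, day in events:
        offsets[code].append(day - first[(pid, code)])
    return dict(offsets)


def histogram_entropy(samples, bin_width):
    """Plug-in differential entropy estimate in nats from a fixed-width
    histogram: H ~ -sum_i p_i * log(p_i / bin_width)."""
    counts = defaultdict(int)
    for x in samples:
        counts[math.floor(x / bin_width)] += 1
    n = len(samples)
    return -sum((c / n) * math.log((c / n) / bin_width) for c in counts.values())


def gaussian_smooth(samples, grid, bandwidth):
    """Gaussian kernel density estimate of the offsets, evaluated at each grid point."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((t - x) / bandwidth) ** 2) for x in samples)
        for t in grid
    ]


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney form); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    offsets = offsets_since_first(EVENTS)
    # Hypothetical 30-day bins; a real analysis would need to tune this choice.
    scores = {code: histogram_entropy(xs, bin_width=30) for code, xs in offsets.items()}
    print(scores)
    # Treat "chronic" as the positive class, ranked by entropy.
    labels = {"ACUTE_X": 0, "CHRONIC_Y": 1}
    codes = sorted(scores)
    print(roc_auc([scores[c] for c in codes], [labels[c] for c in codes]))
    # Smoothed aggregate time profile of the acute code on a coarse day grid.
    print(gaussian_smooth(offsets["ACUTE_X"], list(range(0, 31, 5)), bandwidth=5.0))
```

The histogram estimator is chosen here only for transparency; its value shifts with the bin width, and spacing-based estimators such as those behind SciPy's `scipy.stats.differential_entropy` may be less sensitive to that choice.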

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Data Mining / methods*
  • Electronic Health Records
  • Entropy
  • Female
  • Humans
  • International Classification of Diseases*
  • Male
  • Medical Informatics Applications*
  • ROC Curve
  • Time Factors