Automatic data source identification for clinical trial eligibility criteria resolution

AMIA Annu Symp Proc. 2017 Feb 10;2016:1149-1158. eCollection 2016.

Abstract

Clinical trial coordinators refer to both structured and unstructured sources of data when evaluating a subject for eligibility. While some eligibility criteria can be resolved using structured data, others require manual review of clinical notes. An important step in automating the trial screening process is identifying the right data source for resolving each criterion. In this work, we discuss the creation of an eligibility criteria dataset for clinical trials for patients with two disparate diseases, annotated with the preferred data source for each criterion (i.e., structured or unstructured) by annotators with medical training. The dataset includes 50 heart-failure trials with a total of 766 eligibility criteria and 50 trials for chronic lymphocytic leukemia (CLL) with 677 criteria. Further, we developed machine learning models to predict the preferred data source: kernel methods outperform simpler learning models when used with a combination of lexical, syntactic, semantic, and surface features. Evaluation of these models shows that performance is consistent across data from both diagnoses, which suggests that our method generalizes. Our findings are an important step in ongoing efforts to automate clinical trial screening.
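The classification task described above can be illustrated with a minimal Python sketch. It is not the models evaluated in this work: it swaps in a simple kernel perceptron with a cosine kernel as a stand-in for a kernel method, and it uses only lexical and surface features (no syntactic or semantic ones). The toy criteria, the labels, and the names `featurize`, `cosine_kernel`, `train`, and `predict` are all assumptions made for illustration and are not drawn from the annotated dataset.

```python
import math
import re
from collections import Counter

# Hypothetical toy criteria written for this sketch; NOT taken from the dataset.
TOY_CRITERIA = [
    "Age >= 18 years",
    "Left ventricular ejection fraction < 40%",
    "Serum creatinine > 2.5 mg/dL",
    "Patient is able to provide informed consent",
    "History of noncompliance with medical regimens",
    "Any condition that in the opinion of the investigator would interfere with the study",
]
TOY_LABELS = [1, 1, 1, -1, -1, -1]  # +1 = structured, -1 = unstructured


def featurize(criterion):
    """Lexical features (lowercased word counts) plus two surface flags."""
    feats = Counter(re.findall(r"[a-z]+", criterion.lower()))
    if re.search(r"\d", criterion):
        feats["__has_number__"] = 1
    if re.search(r"[<>=]", criterion):
        feats["__has_comparator__"] = 1
    return feats


def cosine_kernel(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(feats, labels, alphas, x):
    """Kernel perceptron decision value for feature vector x."""
    return sum(a * y * cosine_kernel(f, x)
               for a, y, f in zip(alphas, labels, feats) if a)


def train(texts, labels, epochs=10):
    """Kernel perceptron: count a mistake on every misclassified example."""
    feats = [featurize(t) for t in texts]
    alphas = [0] * len(texts)
    for _ in range(epochs):
        for i, x in enumerate(feats):
            if labels[i] * score(feats, labels, alphas, x) <= 0:
                alphas[i] += 1
    return feats, list(labels), alphas


def predict(model, text):
    """Return the predicted preferred data source for one criterion."""
    feats, labels, alphas = model
    value = score(feats, labels, alphas, featurize(text))
    return "structured" if value > 0 else "unstructured"


model = train(TOY_CRITERIA, TOY_LABELS)
print(predict(model, "Hemoglobin < 9 g/dL"))
print(predict(model, "Unwilling to sign informed consent"))
```

In this sketch the surface flags (presence of a number or a comparator) carry most of the signal for the structured class, while word overlap drives the unstructured class. A realistic system would need the richer feature set and the annotated data described in the abstract.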

MeSH terms

  • Clinical Trials as Topic*
  • Electronic Health Records*
  • Eligibility Determination / methods
  • Heart Failure
  • Humans
  • Information Storage and Retrieval
  • Leukemia, Lymphocytic, Chronic, B-Cell
  • Machine Learning
  • Natural Language Processing*
  • Patient Selection*