Feasibility of feature-based indexing, clustering, and search of clinical trials. A case study of breast cancer trials from ClinicalTrials.gov

Methods Inf Med. 2013;52(5):382-94. doi: 10.3414/ME12-01-0092. Epub 2013 May 13.

Abstract

Background: When standard therapies fail, clinical trials provide experimental treatment opportunities for patients with drug-resistant illnesses or terminal diseases. Clinical trials can also provide free treatment and education for individuals who otherwise may not have access to such care. To find relevant clinical trials, patients often search online; however, they frequently encounter a significant barrier due to the large number of trials and ineffective indexing methods for reducing the trial search space.

Objectives: This study explores the feasibility of feature-based indexing, clustering, and search of clinical trials and informs designs to automate these processes.

Methods: We decomposed 80 randomly selected stage III breast cancer clinical trials into a vector of eligibility features, which were organized into a hierarchy. We clustered trials based on their eligibility feature similarities. In a simulated search process, manually selected features were used to generate specific eligibility questions to filter trials iteratively.
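As a rough illustration of the steps described in the Methods, the sketch below represents each trial as a set of eligibility features, scores pairwise similarity, and filters trials by the answer to one eligibility question. It is a minimal sketch under stated assumptions, not the study's implementation: the trial names and features are hypothetical toy data, Jaccard similarity is an assumed measure (the abstract does not name one), and every feature is treated as an inclusion requirement.

```python
from itertools import combinations

# Hypothetical toy data: each trial is a set of eligibility features.
trials = {
    "trial_A": {"female", "age>=18", "stage III", "HER2 positive"},
    "trial_B": {"female", "age>=18", "stage III", "no prior chemotherapy"},
    "trial_C": {"female", "postmenopausal", "ER positive"},
}


def jaccard(a, b):
    """Jaccard similarity of two feature sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(trial_features):
    """Similarity for every unordered pair of trials, keyed by sorted names."""
    names = sorted(trial_features)
    return {
        (x, y): jaccard(trial_features[x], trial_features[y])
        for x, y in combinations(names, 2)
    }


def filter_trials(trial_features, feature, patient_has):
    """Apply one eligibility question.

    Simplifying assumption: a feature is an inclusion requirement, so a
    "no" answer removes every trial that lists it, and a "yes" answer
    removes nothing.
    """
    if patient_has:
        return dict(trial_features)
    return {n: f for n, f in trial_features.items() if feature not in f}


sims = pairwise_similarity(trials)
remaining = filter_trials(trials, "HER2 positive", patient_has=False)
```

In this toy example, trial_A and trial_B share three of five distinct features and would be grouped together by any similarity-based clustering, while answering "no" to the HER2 question leaves trial_B and trial_C in the search space.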

Results: We extracted 1,437 distinct eligibility features and achieved an inter-rater agreement of 0.73 for feature extraction on the 37 frequent features that occurred in more than 20 trials. Using all 1,437 features, we stratified the 80 trials into clusters of trials recruiting similar patients: six clusters by patient-characteristic features, five clusters by disease-characteristic features, and two clusters by mixed features. Most of the features were mapped to one or more Unified Medical Language System (UMLS) concepts, demonstrating the utility of named entity recognition prior to mapping with the UMLS for automatic feature extraction.

Conclusions: It is feasible to develop feature-based indexing and clustering methods for clinical trials to identify trials with similar target populations and to improve trial search efficiency.

Keywords: Medical informatics; clinical trials; eligibility determination; knowledge representation; search engine.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Abstracting and Indexing / methods*
  • Adult
  • Aged
  • Breast Neoplasms* / pathology
  • Clinical Trials as Topic*
  • Cluster Analysis*
  • Data Mining / methods*
  • Feasibility Studies
  • Female
  • Humans
  • Internet*
  • Medical Informatics
  • Middle Aged
  • Patient Education as Topic
  • Search Engine
  • User-Computer Interface
  • Young Adult