Generating disease-pertinent treatment vocabularies from MEDLINE citations

J Biomed Inform. 2017 Jan:65:46-57. doi: 10.1016/j.jbi.2016.11.004. Epub 2016 Nov 16.

Abstract

Objective: Healthcare communities have identified a significant need for disease-specific information. Disease-specific ontologies help retrieve disease-relevant information from various sources. However, building these ontologies is labor intensive. Our goal is to develop a system that automatically generates disease-pertinent concepts from a popular knowledge resource to support the building of disease-specific ontologies.

Methods: A pipeline system was developed with an initial focus on generating disease-specific treatment vocabularies. It comprised five components: disease-specific citation retrieval, predication extraction, treatment predication extraction, treatment concept extraction, and relevance ranking. A semantic schema was developed to support the extraction of treatment predications and concepts. Four ranking approaches (occurrence, interest, degree centrality, and weighted degree centrality) were proposed to measure the relevance of treatment concepts to the disease of interest. We measured the performance of the four rankings in terms of the mean precision of the top 100 concepts for five diseases, as well as precision-recall curves against two reference vocabularies. The performance of the system was also compared with that of two baseline approaches.
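The abstract names the four ranking approaches but does not define them. The following Python sketch shows one plausible reading under stated assumptions: predications are SemMedDB-style (subject, predicate, object) triples already extracted from the disease-specific citations; only TREATS predications count as treatment predications; "interest" is taken to be the lift measure P(c, d) / (P(c) P(d)) over treatment pairs; and degree centrality is the (unnormalized) degree of a concept in an undirected graph of treatment pairs. All function names are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter, defaultdict

# A predication is a (subject, predicate, object) triple, in the style of SemMedDB.
# Assumption: all triples come from citations already retrieved for the disease.
Predication = tuple[str, str, str]


def _sorted(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort concepts by descending score, breaking ties alphabetically."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def treatment_pairs(preds: list[Predication]) -> list[tuple[str, str]]:
    """Keep only TREATS predications as (treatment, condition) pairs (assumed schema)."""
    return [(s, o) for s, p, o in preds if p == "TREATS"]


def rank_occurrence(preds: list[Predication], disease: str) -> list[tuple[str, float]]:
    """Score = number of TREATS predications linking the concept to the disease."""
    counts = Counter(s for s, o in treatment_pairs(preds) if o == disease)
    return _sorted({c: float(n) for c, n in counts.items()})


def rank_interest(preds: list[Predication], disease: str) -> list[tuple[str, float]]:
    """Score = lift P(c, d) / (P(c) * P(d)) over treatment pairs (assumed definition)."""
    pairs = treatment_pairs(preds)
    n = len(pairs)
    subj = Counter(s for s, _ in pairs)
    obj = Counter(o for _, o in pairs)
    joint = Counter(s for s, o in pairs if o == disease)
    # (k/n) / ((subj/n) * (obj/n)) simplifies to k * n / (subj * obj).
    return _sorted({c: k * n / (subj[c] * obj[disease]) for c, k in joint.items()})


def rank_degree(preds: list[Predication], disease: str,
                weighted: bool = False) -> list[tuple[str, float]]:
    """Score = (weighted) degree in an undirected graph of treatment pairs."""
    edges: Counter = Counter()
    for s, o in treatment_pairs(preds):
        if s != o:
            edges[frozenset((s, o))] += 1  # edge weight = number of predications
    degree: defaultdict = defaultdict(float)
    for edge, weight in edges.items():
        for node in edge:
            degree[node] += weight if weighted else 1
    # Only concepts asserted to treat the disease are ranked.
    candidates = {s for s, o in treatment_pairs(preds) if o == disease}
    return _sorted({c: degree[c] for c in candidates})
```

On a toy set where metformin treats diabetes twice and obesity once, and insulin treats diabetes once, occurrence and degree both rank metformin first, while interest ranks insulin first because insulin appears only with diabetes. This illustrates how a lift-style measure favors disease-specific concepts over broadly used ones.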

Results: The pipeline system achieved a mean precision of 0.80 for the top 100 concepts when ranked by interest. There was no significant difference among the four rankings (p=0.53). However, the pipeline system performed significantly better than the two baselines.
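The evaluation metrics behind these results can be sketched in Python as follows. This is a minimal sketch, assuming each disease's ranked concept list is paired with a set of concepts judged relevant (standing in for the manual judgments or a reference vocabulary); the function names are hypothetical. Precision at k divides by k, so a list shorter than k is penalized for its missing slots.

```python
from statistics import mean


def precision_at_k(ranked: list[str], relevant: set[str], k: int = 100) -> float:
    """Fraction of the top-k positions filled by relevant concepts."""
    hits = sum(1 for concept in ranked[:k] if concept in relevant)
    return hits / k  # missing slots in a short list count as non-relevant


def mean_precision_at_k(runs: dict[str, tuple[list[str], set[str]]], k: int = 100) -> float:
    """Average precision@k over diseases; runs maps disease -> (ranked list, relevant set)."""
    return mean(precision_at_k(ranked, relevant, k) for ranked, relevant in runs.values())


def precision_recall_curve(ranked: list[str], reference: set[str]) -> list[tuple[float, float]]:
    """(recall, precision) after each cutoff, measured against a reference vocabulary."""
    if not reference:
        return []
    points, hits = [], 0
    for i, concept in enumerate(ranked, start=1):
        if concept in reference:
            hits += 1
        points.append((hits / len(reference), hits / i))
    return points
```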

Conclusions: The pipeline system can be useful for automatically generating disease-relevant treatment concepts from the biomedical literature.

Keywords: Data mining; Information extraction; MEDLINE citations; Ontology; SemMedDB; Treatment.

MeSH terms

  • Automation
  • Humans
  • Information Storage and Retrieval
  • MEDLINE*
  • Semantics*
  • Vocabulary
  • Vocabulary, Controlled*