A novel tool that allows interactive screening of PubMed citations showed promise for the semi-automation of identification of Biomedical Literature

Gaelen P Adam; Dimitris Pappas; Haris Papageorgiou; Evangelos Evangelou; Thomas A Trikalinos

doi:10.1016/j.jclinepi.2022.06.007

J Clin Epidemiol. 2022 Oct:150:63-71. doi: 10.1016/j.jclinepi.2022.06.007. Epub 2022 Jun 20.

Authors

Gaelen P Adam¹, Dimitris Pappas², Haris Papageorgiou³, Evangelos Evangelou⁴, Thomas A Trikalinos⁵

Affiliations

¹ Center for Evidence Synthesis in Health, Brown University School of Public Health, Providence, RI, USA. Electronic address: gaelen_adam@brown.edu.
² Institute for Language and Speech Processing, 'Athena' Research and Innovation Center, Marousi, Greece; Department of Informatics, Athens University of Economics and Business, Athens, Greece.
³ Institute for Language and Speech Processing, 'Athena' Research and Innovation Center, Marousi, Greece.
⁴ Department of Hygiene and Epidemiology, University of Ioannina Medical School, Ioannina, Greece; Institute of Biosciences, University Research Center of Ioannina (U.R.C.I), Ioannina, Greece; Department of Epidemiology and Biostatistics, Imperial College London, London, UK.
⁵ Center for Evidence Synthesis in Health, Brown University School of Public Health, Providence, RI, USA.

PMID: 35738306
DOI: 10.1016/j.jclinepi.2022.06.007

Abstract

Background and objectives: Systematic reviews form the basis of evidence-based medicine, but are expensive and time-consuming to produce. To address this burden, we have developed a literature identification system (Pythia) that combines the query formulation and citation screening steps.

Methods: Pythia incorporates a set of natural-language questions with machine-learning algorithms to rank all PubMed citations based on relevance, returning the 100 top-ranked citations for human screening. The tagged citations are iteratively exploited by Pythia to refine the search and re-rank the citations.
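The rank, screen, and re-rank loop described in the Methods can be illustrated with a minimal sketch. The abstract does not specify Pythia's models or interfaces, so everything here is an assumption made for illustration: the function names, the token-overlap scoring (a toy stand-in for the machine-learning ranker), and the feedback rule that adds terms from citations tagged as relevant. Only the batch size of 100 comes from the text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(citation_text, term_weights):
    """Toy relevance score: sum of the weights of query terms present in the citation."""
    tokens = set(tokenize(citation_text))
    return sum(weight for term, weight in term_weights.items() if term in tokens)


def screen_iteratively(citations, questions, label_fn, batch_size=100, max_rounds=5):
    """Rank unscreened citations, have a human tag the top batch, then re-rank.

    citations: dict mapping a citation id to its title/abstract text
    questions: natural-language questions that seed the ranking
    label_fn:  hypothetical callback standing in for the human screener;
               returns True if the citation is relevant
    """
    # Seed the term weights from the natural-language questions.
    term_weights = Counter()
    for question in questions:
        term_weights.update(tokenize(question))

    labels = {}
    for _ in range(max_rounds):
        unscreened = [cid for cid in citations if cid not in labels]
        if not unscreened:
            break
        # Rank everything not yet screened and take the top batch.
        ranked = sorted(
            unscreened,
            key=lambda cid: score(citations[cid], term_weights),
            reverse=True,
        )
        for cid in ranked[:batch_size]:
            labels[cid] = label_fn(cid)
            # Feedback step (assumed): relevant citations enrich the query terms,
            # which changes the ranking in the next round.
            if labels[cid]:
                term_weights.update(tokenize(citations[cid]))
    return labels
```

In this sketch the feedback only affects the next round's ranking, which mirrors the batch-wise screening described above; a real system would likely retrain a classifier on the tags rather than reweight terms.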

Results: Across seven systematic reviews, the ability of Pythia to identify the relevant citations (sensitivity) ranged from 0.09 to 0.58. The number of abstracts reviewed per relevant abstract (number needed to read, NNR) was lower than in the manually screened project in four reviews, higher in two, and had mixed results in one. The reviews that had greater overall sensitivity retrieved more relevant citations in early batches, but retrieval was generally unaffected by other aspects, such as study design, study size, and specific key question.
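The two metrics reported in the Results can be computed as in the short sketch below, which follows the definitions given in the text: sensitivity as the fraction of relevant citations identified, and NNR as abstracts reviewed per relevant abstract found. The function names are assumptions, and the counts in the usage lines are made-up illustrations, not data from the study.

```python
def sensitivity(relevant_found, total_relevant):
    """Fraction of all relevant citations that the screening process identified."""
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    return relevant_found / total_relevant


def number_needed_to_read(abstracts_screened, relevant_found):
    """Abstracts reviewed per relevant abstract found (lower means less screening effort)."""
    if relevant_found == 0:
        return float("inf")  # nothing relevant found, so NNR is unbounded
    return abstracts_screened / relevant_found


# Illustrative counts only (not taken from the paper).
print(sensitivity(relevant_found=12, total_relevant=40))
print(number_needed_to_read(abstracts_screened=500, relevant_found=25))
```

Read together, the two numbers describe a trade-off: a tool can have a favorable NNR while still missing most relevant citations, which is the pattern the Conclusion points to.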

Conclusion: Due to its low sensitivity, Pythia is not ready for widespread use. Future research should explore ways to encode domain knowledge in query formulation to better enrich the questions used in the search.

Keywords: Abstract screening; Evidence synthesis; Literature identification; Machine learning; Systematic review methods; Text mining.

Publication types

Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms*
Automation
Humans
Machine Learning*
PubMed
Research Design

Grants and funding

R03 HS027247/HS/AHRQ HHS/United States