Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records

Guergana K Savova; Ioana Danciu; Folami Alamudun; Timothy Miller; Chen Lin; Danielle S Bitterman; Georgia Tourassi; Jeremy L Warner

doi:10.1158/0008-5472.CAN-19-0579

Cancer Res. 2019 Nov 1;79(21):5463-5470. doi: 10.1158/0008-5472.CAN-19-0579. Epub 2019 Aug 8.

Authors

Guergana K Savova¹,², Ioana Danciu³, Folami Alamudun³, Timothy Miller⁴,², Chen Lin⁴, Danielle S Bitterman²,⁵, Georgia Tourassi³, Jeremy L Warner⁶

Affiliations

¹ Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts. Guergana.Savova@childrens.harvard.edu.
² Harvard Medical School, Boston, Massachusetts.
³ Oak Ridge National Lab, Knoxville, Tennessee.
⁴ Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts.
⁵ Dana Farber Cancer Institute, Boston, Massachusetts.
⁶ Vanderbilt University Medical Center, Nashville, Tennessee.

Abstract

Current models for correlating electronic medical records with -omics data largely ignore clinical text, which is an important source of phenotype information for patients with cancer. This data convergence has the potential to reveal new insights about cancer initiation, progression, metastasis, and response to treatment. Insights from this real-world data will catalyze clinical care, research, and regulatory activities. Natural language processing (NLP) methods are needed to extract these rich cancer phenotypes from clinical text. Here, we review the advances of NLP and information extraction methods relevant to oncology based on publications from PubMed as well as NLP and machine learning conference proceedings in the last 3 years. Given the interdisciplinary nature of the fields of oncology and information extraction, this analysis serves as a critical trail marker on the path to higher fidelity oncology phenotypes from real-world data.

Publication types

Research Support, N.I.H., Extramural
Review

MeSH terms

Data Mining / methods*
Electronic Health Records
Humans
Machine Learning
Medical Oncology / methods*
Natural Language Processing
Phenotype
