Determining Onset for Familial Breast and Colorectal Cancer from Family History Comments in the Electronic Health Record

AMIA Jt Summits Transl Sci Proc. 2019 May 6:2019:173-181. eCollection 2019.

Abstract

Background. Family health history (FHH) can be used to identify individuals at elevated risk for familial cancers. Risk criteria for common cancers rely on age of onset, which is documented inconsistently as structured and unstructured data in electronic health records (EHRs). Objective. To investigate a natural language processing (NLP) approach to extract age of onset and age of death from free-text EHR fields. Methods. Using 474,651 FHH entries from 89,814 patients, we investigated two methods - frequent patterns (baseline) and NLP classifier. Results. For age of onset, the NLP classifier outperformed the baseline in precision (96% vs. 83%; 95% CI [94, 97] and [80, 86]) with equivalent recall (both 93%; 95% CI [91, 95]). When applied to the full dataset, the NLP approach increased the percentage of FHH entries for which cancer risk criteria could be applied from 10% to 15%. Conclusion. NLP combined with structured data may improve the computation of familial cancer risk criteria for various use cases.