Disease Concept Normalization for Clinical Documents Using Deep Learning and Knowledge Graphs
Author: Jingqi Wang, MS (2022)
Primary advisor: Cui Tao, PhD
Committee members: Kirk Roberts, PhD; Degui Zhi, PhD
PhD thesis, The University of Texas School of Biomedical Informatics at Houston.
Medical concept normalization, also known as entity linking, is a crucial component of clinical information extraction. It maps entity mentions in free text to a controlled vocabulary, such as the Unified Medical Language System (UMLS), to improve interoperability between healthcare institutions and enable a wide range of applications in biomedical research and clinical practice. Data-driven methods of concept normalization, such as deep learning (DL)-based algorithms, are flexible enough to handle many kinds of text variation. However, they require large-scale data annotation and are easily confused by ambiguous cases, such as medical abbreviations, acronyms, and polysemous terms. Biomedical knowledge graphs (KGs), on the other hand, can provide precise normalizations with clear explanations, but they have low coverage. Using disease normalization as an example, this study first evaluates state-of-the-art DL methods for concept normalization. Next, a novel KG architecture is designed to organize medical domain knowledge, including concept hierarchies, concept abbreviations, and post-coordination patterns, in support of concept normalization. DL and KG are then combined to provide an explainable and extensible framework for biomedical concept normalization. The proposed ensemble approach is thoroughly evaluated on two clinical datasets (SemEval-2015 and n2c2 2019) and shows promising results.
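The idea of combining a precise but low-coverage KG with a more flexible data-driven model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy graph, the placeholder concept IDs (D001, D002, which are not real UMLS identifiers), the function names, the KG-first ordering, and the use of plain string similarity as a stand-in for a trained DL ranker.

```python
from difflib import SequenceMatcher

# Hypothetical toy knowledge graph. The IDs are placeholders, not UMLS CUIs.
KG_SYNONYMS = {
    "myocardial infarction": "D001",
    "heart attack": "D001",
    "hypertension": "D002",
    "high blood pressure": "D002",
}
KG_ABBREVIATIONS = {"mi": "myocardial infarction", "htn": "hypertension"}


def kg_normalize(mention):
    """Exact lookup in the toy KG; returns (concept_id, explanation) or None."""
    text = mention.lower().strip()
    if text in KG_ABBREVIATIONS:
        expanded = KG_ABBREVIATIONS[text]
        return KG_SYNONYMS[expanded], f"abbreviation '{text}' -> '{expanded}'"
    if text in KG_SYNONYMS:
        return KG_SYNONYMS[text], f"exact synonym match '{text}'"
    return None


def similarity_fallback(mention, threshold=0.6):
    """Stand-in for a trained DL ranker: character-level string similarity."""
    text = mention.lower().strip()
    best_id, best_score = None, 0.0
    for name, concept_id in KG_SYNONYMS.items():
        score = SequenceMatcher(None, text, name).ratio()
        if score > best_score:
            best_id, best_score = concept_id, score
    if best_score >= threshold:
        return best_id, f"closest name by string similarity ({best_score:.2f})"
    return None


def normalize(mention):
    """Try the KG first (precise, explainable), then the fallback ranker."""
    result = kg_normalize(mention)
    if result is not None:
        return result[0], "KG", result[1]
    result = similarity_fallback(mention)
    if result is not None:
        return result[0], "DL-stand-in", result[1]
    return None, "none", "no candidate above threshold"


if __name__ == "__main__":
    for mention in ["HTN", "heart attack", "hypertensive", "xyz"]:
        print(mention, "->", normalize(mention))
```

In this sketch the KG branch returns a human-readable reason for each match, which mirrors the explainability argument made above, while the fallback covers surface variants the graph does not list. A real system would replace the similarity function with a trained model and the dictionaries with a curated graph.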