Question Answering from Electronic Health Records

Author: Sarvesh Soni, B.Tech., MS (2023)

Primary advisor: Kirk Roberts, PhD

Committee members: Hua Xu, PhD; Peter Killoran, MD

PhD thesis, The University of Texas School of Biomedical Informatics at Houston.


Electronic health records (EHRs) contain a wealth of useful patient information and are frequently used by clinicians to provide care. However, due to the many usability issues associated with EHRs, accessing the required information is often complicated and time-consuming. Further, most existing work on accessing EHR information uses information retrieval techniques that typically employ keyword-based searches and generate multiple results, requiring users to filter and identify relevant information manually. A more natural way of interacting with EHRs is to pose natural language questions and get exact answers back from the records, in other words, question answering (QA) from EHRs. This dissertation presents research investigating the different aspects that enable EHR QA, with a focus on the range of EHR data and patient-specific questions. The broad aims are to explore QA techniques for (1) structured EHR data (tabular data such as lab values) and (2) unstructured EHR data (text data such as clinical notes), and (3) to investigate the impact of paraphrasing on the performance of EHR QA. Specifically, this dissertation (1) proposes three novel representative datasets for EHR QA (one for each of the broad aims), (2) builds an end-to-end framework that goes from questions all the way to their exact answers in structured EHR data, (3) implements methods to automatically generate paraphrases of clinical questions and improve EHR QA, and (4) designs systems to automatically retrieve EHR text documents and the underlying exact answer spans for a given information need. Collectively, these investigations push the boundaries of existing efforts toward improving information access from EHRs. Moreover, this dissertation sheds light on future directions in this field.