Integrating Domain Knowledge to Improve Signal Detection from Electronic Health Records for Pharmacovigilance 

Author: Ning Shang, MS (2014)

Primary Advisor: Trevor Cohen, MBChB, PhD

Committee Members: Jorge Herskovic, MD, PhD; Hua Xu, PhD; Elmer V. Bernstam, MD, MSE, MS; Peng Wei, PhD

PhD Thesis, The University of Texas School of Biomedical Informatics at Houston.


This dissertation aims to contribute to the field of pharmacovigilance. Pharmacovigilance, also known as post-marketing drug surveillance, is the process of continued monitoring for adverse drug reactions (ADRs) after drugs are released into the market. An ADR is a harmful or unpleasant reaction related to the use of a medical product. ADRs were reported to be between the fourth and sixth leading cause of death in the United States in 1994, accounting for 3-7% of medical hospital admissions. Vioxx (rofecoxib) and Avandia (rosiglitazone) are examples of high-profile drugs that were suspended from the American or European market as a result of pharmacovigilance.

To prevent such harm to human health, pre-marketing clinical trials are designed to test drug safety and efficacy. Although clinical trials are extensive and last multiple years, rare ADRs may go undetected, and others may occur because of idiosyncratic characteristics of individuals who were excluded from the evaluated sample.

To aid the pharmacovigilance process, automated methods have been developed to identify strongly correlated drug/ADR pairs from data sources such as adverse event reporting systems or Electronic Health Records (EHRs). These methods, however, are generally statistical in nature and do not draw upon the large volumes of knowledge embedded in the biomedical literature.

In this dissertation, I investigate the ability of scalable Literature-Based Discovery (LBD) methods to identify side effects of pharmaceutical agents automatically. LBD methods can provide evidence from the literature to support the plausibility of a drug/ADR association, thereby assisting the human review that validates a signal, which is an essential component of pharmacovigilance. The hypothesis underlying this work is that combining signals mined from EHR data with biomedical domain knowledge may improve the accuracy of side-effect detection. This approach also addresses the lack of causality assessment in the statistical methods currently used in pharmacovigilance practice.

My theoretical contribution is that, by conducting automated abductive reasoning and estimating the strength of the generated explanatory hypotheses, the plausibility of a drug/ADR signal can be assessed. I adapt and extend the original abductive reasoning process, as defined by Peirce in the 19th century, by treating the strength of the explanations found for an observation as a measure of its plausibility, rather than taking the observation as given. Practical contributions to pharmacovigilance and informatics include the development of methods to leverage knowledge from the biomedical literature, the detection of signals from EHR data and their subsequent large-scale, automated evaluation using supporting evidence from the literature, and the development of an improved drug/ADR reference set. My contributions are not restricted to pharmacovigilance and as such constitute a contribution to the field of informatics in general.

I demonstrate that my work extends the state of the art in EHR-based pharmacovigilance and contributes new ideas that pave the way for further studies with the potential to enhance the field of pharmacovigilance and drug safety.