Skip to Content
SBMI logo

Understanding Patient Safety Reports via Multi-label Text Classification and Semantic Representation

Author: Chen Liang, MS (2017)

Primary advisor: Yang Gong, MD, PhD

Committee members: Trevor Cohen, MBChB, PhD, Cui Tao, PhD, Hua Xu, PhD

PhD thesis, The University of Texas School of Health Information Sciences at Houston.


Medical errors are the results of problems in health care delivery. One of the key steps to eliminate errors and improve patient safety is through patient safety event reporting. A patient safety report may record a number of critical factors that are involved in the health care when incidents, near misses, and unsafe conditions occur. Therefore, clinicians and risk management can generate actionable knowledge by harnessing useful information from reports. To date, efforts have been made to establish a nationwide reporting and error analysis mechanism. The increasing volume of reports has been driving improvement in quantity measures of patient safety. For example, statistical distributions of errors across types of error and health care settings have been well documented. Nevertheless, a shift to quality measure is highly demanded. In a health care system, errors are likely to occur if one or more components (e.g., procedures, equipment, etc.) that are intrinsically associated go wrong. However, our understanding of what and how these components are connected is limited for at least two reasons. Firstly, the patient safety reports present difficulties in aggregate analysis since they are large in volume and complicated in semantic representation. Secondly, an efficient and clinically valuable mechanism to identify and categorize these components is absent.

I strive to make my contribution by investigating the multi-labeled nature of patient safety reports. To facilitate clinical implementation, I propose that machine learning and semantic information of reports, e.g., semantic similarity between terms, can be used to jointly perform automated multi-label classification. My work is divided into three specific aims. In the first aim, I developed a patient safety ontology to enhance semantic representation of patient safety reports. The ontology supports a number of applications including automated text classification. In the second aim, I evaluated multilabel text classification algorithms on patient safety reports. The results demonstrated a list of productive algorithms with balanced predictive power and efficiency. In the third aim, to improve the performance of text classification, I developed a framework for incorporating semantic similarity and kernel-based multi-label text classification. Semantic similarity values produced by different semantic representation models are evaluated in the classification tasks. Both ontology-based and distributional semantic similarity exerted positive influence on classification performance but the latter one shown significant efficiency in terms of the measure of semantic similarity.

Our work provides insights into the nature of patient safety reports, that is a report can be labeled by multiple components (e.g., different procedures, settings, error types, and contributing factors) it contains. Multi-labeled reports hold promise to disclose system vulnerabilities since they provide the insight of the intrinsically correlated components of health care systems. I demonstrated the effectiveness and efficiency of the use of automated multi-label text classification embedded with semantic similarity information on patient safety reports. The proposed solution holds potential to incorporate with existing reporting systems, significantly reducing the workload of aggregate report analysis.