Utilizing Temporal Information in The EHR for Developing a Novel Continuous Prediction Model

Author: Kang Lin Hsieh, MS (2019)

Primary advisor: Susan H. Fenton, PhD, RHIA, FAHIMA

Committee members: Robert E. Murphy, MD; Kirk Roberts, PhD, MS; Cui Tao, PhD

PhD thesis, The University of Texas School of Health Information Sciences at Houston.

Type 2 diabetes mellitus (T2DM) is a prevalent chronic condition nationwide and incurs both direct and indirect healthcare costs. Previous clinical research, however, has shown that T2DM is preventable. Many prediction models are based on risk factors identified in clinical trials. Because nationwide screening is not cost-effective, one of the main tasks of a T2DM prediction model is to estimate risk so that patients can be selected for further testing with HbA1c or fasting plasma glucose, which determines whether the patient has T2DM.

Those models had substantial limitations related to data quality, such as missing values. In this dissertation, I tested conventional models based on the most widely used risk factors to predict the possibility of developing T2DM. The AUC averaged 0.5, which is no better than chance and implies that the conventional models cannot be used to screen for T2DM risk. Based on this result, I implemented three types of temporal representation for building the T2DM prediction model: non-temporal, interval-temporal, and continuous-temporal. Of the three, the continuous-temporal representation, which was based on deep learning methods, performed best. This result suggests that deep learning methods can overcome the data quality issues and achieve better performance.
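
To illustrate why an AUC of 0.5 rules a model out as a screening tool, the sketch below computes AUC directly from its pairwise definition: the fraction of (positive, negative) patient pairs in which the positive patient receives the higher score, with ties counted as half. The function name `pairwise_auc` and the toy labels and scores are assumptions made for illustration; they are not data or code from the dissertation.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for patients who developed the condition, 0 otherwise.
    scores: model risk scores, higher meaning higher predicted risk.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: a model that separates the classes perfectly,
# and a model that gives every patient the same score.
perfect = pairwise_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
uninformative = pairwise_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5])
```

A model that assigns every patient the same score cannot rank anyone, so its AUC is exactly 0.5. That is the sense in which an average AUC of 0.5 means the risk scores carry no usable ranking information.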

This dissertation also contributes a continuous risk output model based on the seq2seq model. For a given patient, this model generates a monotonically increasing function that predicts the future probability of developing T2DM. The model is workable but still has many limitations to overcome.
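
The abstract does not say how the model enforces monotonicity, so the following is only a sketch of one common way to guarantee it, not the dissertation's method. It assumes a sequence model emits a per-interval hazard (the probability of onset in that interval, given no onset so far) and turns those hazards into a cumulative risk curve. The name `cumulative_risk` and the example hazards are hypothetical.

```python
def cumulative_risk(hazards):
    """Convert per-interval hazards into a cumulative risk curve.

    hazards: probabilities in [0, 1], one per future time interval,
             e.g. values a sequence decoder might emit (assumed here).
    Returns a list where entry t is the probability of onset by the end
    of interval t. The curve can never decrease, because the survival
    probability is multiplied by a factor in [0, 1] at every step.
    """
    survival = 1.0
    curve = []
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must lie in [0, 1]")
        survival *= 1.0 - h
        curve.append(1.0 - survival)
    return curve


# Hypothetical hazards for three consecutive intervals.
curve = cumulative_risk([0.1, 0.0, 0.5])
```

Because each step can only shrink the survival probability, the resulting curve is non-decreasing by construction, whatever values the upstream model produces. It is strictly increasing only where the hazard is above zero.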

Finally, this dissertation identifies some risk factors that are underestimated and merit further research toward revising the current T2DM screening guideline. These results are still preliminary, and I need to collaborate with epidemiologists and experts in other fields to verify the findings. In the future, the methods used to build the T2DM prediction model can also be applied to prediction models for other chronic conditions.