Author: Jingcheng Du, BS (2019)
Primary advisor: Cui Tao, PhD
Committee members: Hua Xu, PhD; Yong Chen, PhD; Sahiti Myneni, PhD; Trevor Cohen, PhD
PhD thesis, The University of Texas School of Health Information Sciences at Houston.
Vaccination is considered one of the greatest public health achievements of the 20th century. A high vaccination rate is required to reduce the prevalence and incidence of vaccine-preventable diseases. However, in the last two decades, a significant and increasing number of people have refused or delayed vaccination for themselves or have prevented their children from being vaccinated. Importantly, under-vaccination is associated with infectious disease outbreaks. A good understanding of public perceptions of vaccination is important if we are to develop effective vaccination promotion strategies. Traditional research methods, such as surveys, suffer from limitations that impede our understanding of public perceptions, including resource costs and delays in data collection and analysis, especially with large samples. The popularity of social media (e.g., Twitter), combined with advances in artificial intelligence algorithms (e.g., natural language processing, deep learning), opens up new avenues for accessing large-scale data on public perceptions of vaccination. This dissertation reports on an original and systematic effort to develop artificial intelligence algorithms that increase our ability to use Twitter discussions to understand vaccine-related perceptions and intentions. The research is framed within the perspectives offered by grounded behavior change theories. Tweets concerning the human papillomavirus (HPV) vaccine were used to accomplish three major aims: 1) Develop a deep learning-based system to better understand public perceptions of the HPV vaccine, using Twitter data and behavior change theories; 2) Develop a deep learning-based system to infer Twitter users’ demographic characteristics (e.g., gender and home location) and investigate demographic differences in public perceptions of the HPV vaccine; 3) Develop a web-based interactive visualization system to monitor real-time Twitter discussions of the HPV vaccine.
For Aim 1, a bidirectional long short-term memory (LSTM) network with an attention mechanism outperformed traditional machine learning and competitive deep learning algorithms in mapping Twitter discussions to the theoretical constructs of behavior change theories. Domain-specific embeddings trained on an HPV vaccine-related Twitter corpus with the fastText algorithm further improved performance on some tasks. Time series analyses revealed evolving trends in public perceptions of the HPV vaccine. For Aim 2, a character-based convolutional neural network model achieved performance that compared favorably with the state of the art in Twitter gender inference on a Public Author Profiling challenge. The trained models were then applied to the Twitter corpus and identified gender differences in public perceptions of the HPV vaccine. The findings on gender differences were largely consistent with previous survey-based studies. To infer Twitter users’ home locations, geo-tagging was framed as a text classification task and addressed with a character-based recurrent neural network model. The model outperformed machine learning and deep learning baselines on home location tagging. Interstate variations in public perceptions of the HPV vaccine were also identified. For Aim 3, a prototype web-based interactive dashboard, VaxInsight, was built to synthesize HPV vaccine-related Twitter discussions in a comprehensible format. A usability test showed that VaxInsight is highly usable.
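As an illustration of the attention mechanism mentioned for Aim 1, the following is a minimal sketch in plain Python of attention pooling over a sequence of hidden states, such as those produced by a bidirectional LSTM. The abstract does not specify the form of attention used, so the dot-product scoring, the function names (`softmax`, `attention_pool`), and the `query` vector are assumptions made for illustration only, not the dissertation's implementation.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Pool a sequence of hidden states into one vector.

    hidden_states: list of T vectors, one per token (hypothetical stand-in
        for concatenated forward/backward LSTM states).
    query: a context vector of the same dimension (assumed to be learned
        during training; here it is supplied by the caller).
    Returns the weighted sum of the hidden states and the attention weights.
    """
    # Assumption: dot-product scoring between each hidden state and the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights


# Toy usage: two tokens with 2-dimensional hidden states.
states = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_pool(states, query=[10.0, 0.0])
```

In this toy usage, the query aligns with the first hidden state, so nearly all of the attention weight falls on the first token. In a classifier, the pooled vector would then be passed to an output layer that predicts the theoretical construct for the tweet.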
Notably, this may be the first study to use deep learning algorithms to understand Twitter discussions of the HPV vaccine within the perspective of grounded behavior change theories. VaxInsight is also the first system that allows users to explore public health beliefs about vaccine-related topics on Twitter. Thus, the present research makes original and systematic contributions to medical informatics by combining cutting-edge artificial intelligence algorithms and grounded behavior change theories. This work also builds a foundation for the next generation of real-time public health surveillance and research.