
Deep Learning Frameworks for Disease Detection and Outcome Prediction using Sequential Biomedical Data

Author: Xiaotian Ma, MS (2025)

Primary advisor: Xiaoqian Jiang, PhD

Committee members: Shayan Shams, PhD and Yejin Kim, PhD

PhD thesis, McWilliams School of Biomedical Informatics at UTHealth Houston.


ABSTRACT

Biomedical sequential data, including longitudinal clinical trial data, 3D imaging, and surgical videos, are essential for disease detection and outcome prediction. Since manual review of such data is time-consuming and requires significant domain expertise, deep learning approaches are widely adopted to facilitate clinical decision-making. However, it remains challenging to interpret model predictions by identifying informative, localized features within large volumes of sequences, such as small blood clots in a large 3D CT scan or specific disease-relevant frames in a long surgical video.

This dissertation presents a comprehensive investigation into using deep learning for multiple modalities of biomedical sequential data. It introduces effective models for disease detection and outcome prediction, along with an automated pipeline to select task-relevant subsets of sequences.

First, a recurrent neural network with attention mechanisms is built to predict rapid progression (RP) of Alzheimer’s Disease (AD) using longitudinal data from pooled clinical trials. This study defines RP by changes in four neurocognitive and functional health measures, computes importance scores to identify predictive features, and visualizes the temporal trajectories of the selected features.
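The attention-based importance scoring described above can be illustrated with a minimal sketch. The weight vector `w` and the visit-level hidden states `H` below are hypothetical stand-ins, not the dissertation's actual architecture: a learned scoring function assigns a relevance score to each time step's hidden state, the scores are normalized with a softmax, and the resulting weights both pool the sequence into a single representation and serve as per-visit importance scores.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pooling(hidden_states, w):
    """Score each time step's hidden state, normalize the scores into
    attention weights, and pool the sequence into one representation."""
    scores = hidden_states @ w            # (T,) raw relevance per time step
    alpha = softmax(scores)               # attention weights sum to 1
    context = alpha @ hidden_states       # weighted sum over time steps
    return context, alpha

# hypothetical example: 4 clinical visits, hidden size 3
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
w = rng.normal(size=3)
context, alpha = attention_pooling(H, w)
```

The weights `alpha` can then be aggregated across patients to rank features or visits by importance, as the study does when visualizing temporal trajectories of the selected features.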

Second, a two-phase multitask learning framework is proposed to detect and characterize pulmonary embolism (PE) using 3D computed tomographic pulmonary angiography (CTPA). The framework provides interpretation through attention-weight heatmaps that identify informative 2D image slices and gradient-weighted class activation mapping (Grad-CAM) that highlights salient local regions.
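Grad-CAM, referenced above, has a compact standard formulation. In the sketch below, the feature maps and gradients are assumed to come from a trained CNN (here they are random placeholders): each feature channel is weighted by the spatial average of the target score's gradient with respect to that channel, the weighted maps are summed, and a ReLU keeps only regions with positive evidence for the prediction.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM: weight each channel's feature map by the spatially
    averaged gradient of the target score, sum the weighted maps,
    and apply ReLU to keep positively contributing regions."""
    weights = gradients.mean(axis=(1, 2))           # (K,) channel importances
    cam = np.tensordot(weights, feature_maps, 1)    # (H, W) weighted sum
    cam = np.maximum(cam, 0)                        # keep positive evidence
    if cam.max() > 0:
        cam /= cam.max()                            # normalize to [0, 1]
    return cam

# hypothetical inputs: 4 feature channels over an 8x8 slice
rng = np.random.default_rng(0)
fmaps = rng.normal(size=(4, 8, 8))
grads = rng.normal(size=(4, 8, 8))
cam = grad_cam(fmaps, grads)
```

Upsampling the resulting map to the input slice's resolution produces the salient-region overlays used for interpretation.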

Third, a deep learning framework consisting of contrastive pre-training and a location-aware transformer is designed to predict treatment outcomes of ovarian cancer using sets of still frames from laparoscopic surgical videos. By integrating anatomical information, the model assigns attention scores to anatomical locations and uses Grad-CAM to highlight disease areas in each frame.
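Contrastive pre-training is commonly formulated with an InfoNCE-style loss; the sketch below shows that common formulation rather than the dissertation's exact objective. Matched frame embeddings from two views form positive pairs, every other pairing in the batch serves as a negative, and the loss is the cross-entropy of matching each embedding to its partner.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE-style contrastive loss: the i-th rows of z1 and z2 are a
    positive pair; all other cross-pairings in the batch are negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # unit-normalize
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / tau                            # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # match i-th to i-th
```

A lower loss for correctly matched pairs than for shuffled ones is what drives the encoder to produce discriminative frame embeddings before the location-aware transformer is trained.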

Lastly, a training-free, large language model (LLM)-driven pipeline is developed to automatically select meaningful frames from laparoscopic surgical videos. The method uses LLMs with deep research and web search capabilities to generate frame-selection criteria and corresponding scores, which then guide a search for important frames by iteratively updating the frame-selection distribution.

In summary, this dissertation investigates how to tailor deep learning solutions to learn fine-grained details from biomedical sequential data across multiple modalities for both detection and prediction tasks. By automatically selecting clinically relevant features from complex biomedical sequences, this work reduces the burden of manual data review and may support more efficient diagnosis and treatment planning. Future directions include integrating the models and pipelines into a unified feature selection system to support broader clinical applications using multimodal sequential data.