
From Latent Patterns to Interpretable Subtypes: Machine Learning Driven Subtyping in Alzheimer’s Clinical Trials

Author: Dulin Wang, MS (2025)

Primary advisor: Xiaoqian Jiang, PhD

Committee members: Yejin Kim, PhD and Yu-Chun Hsu, PhD

PhD thesis, McWilliams School of Biomedical Informatics at UTHealth Houston.


ABSTRACT

Randomized controlled trials (RCTs) in neurodegenerative diseases often fail to show treatment benefits due to underlying patient variability. To overcome this challenge and improve clinical trial design, it is essential to identify responsive patient subgroups and understand distinct patterns of disease progression. This dissertation addresses these challenges by applying and developing advanced AI methodologies to Alzheimer’s Disease (AD) clinical trials.

The first study investigates heterogeneous treatment effects (HTEs) across ten AD RCTs to identify responsive and non-responsive patient subgroups within trials previously considered negative or neutral. Applying causal forests to these trials reveals HTEs and key moderators of AD drug response, pointing to the potential of personalized treatment tailored to patient-specific characteristics.
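To make the idea of a heterogeneous treatment effect concrete, the sketch below computes a treated-minus-control difference in mean outcome within each level of a single baseline moderator. This is a deliberately simple subgroup contrast, not a causal forest and not the dissertation's implementation; the moderator name `biomarker_positive`, the record layout, and all values are hypothetical and made up for illustration.

```python
from statistics import mean


def subgroup_effects(rows, moderator):
    """Treated-minus-control difference in mean outcome per moderator level.

    In a randomized trial this difference is a simple estimate of the
    conditional average treatment effect within each subgroup.
    """
    effects = {}
    for level in sorted({r[moderator] for r in rows}):
        treated = [r["outcome"] for r in rows
                   if r[moderator] == level and r["treated"] == 1]
        control = [r["outcome"] for r in rows
                   if r[moderator] == level and r["treated"] == 0]
        if treated and control:  # skip levels missing one arm
            effects[level] = mean(treated) - mean(control)
    return effects


# Made-up toy records: outcome is a change score (higher = less decline).
rows = [
    {"biomarker_positive": 1, "treated": 1, "outcome": -1.0},
    {"biomarker_positive": 1, "treated": 1, "outcome": -2.0},
    {"biomarker_positive": 1, "treated": 0, "outcome": -4.0},
    {"biomarker_positive": 1, "treated": 0, "outcome": -5.0},
    {"biomarker_positive": 0, "treated": 1, "outcome": -3.0},
    {"biomarker_positive": 0, "treated": 1, "outcome": -2.0},
    {"biomarker_positive": 0, "treated": 0, "outcome": -3.0},
    {"biomarker_positive": 0, "treated": 0, "outcome": -2.0},
]

effects = subgroup_effects(rows, "biomarker_positive")
print(effects)
```

In this toy data the pooled effect is diluted by a subgroup with no benefit, which is the situation the paragraph above describes for trials judged negative or neutral overall. A causal forest generalizes this contrast by learning the subgroups from many covariates instead of fixing one moderator in advance.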

While HTEs capture baseline moderators of drug response, disease heterogeneity also manifests over time through distinct progression patterns. Therefore, the second aim characterizes longitudinal patterns of AD progression using trial data to uncover data-driven disease subtypes. We propose a novel clinical Outcome-Guided Deep Temporal Clustering (OG-DTC) method that generates data representations informed by both clustering objectives and clinical outcomes. The learned representations are then grouped using a Gaussian mixture model to identify distinct subtypes. We identified three distinct subtypes with unique patterns associated with differentiated clinical declines across multiple measures. The resulting clusters were extensively validated for their reproducibility, stability, and statistical significance.

Although the discovered clusters are statistically robust, they lack the explicit clinical definitions necessary for practical application. Traditional rule-based models can generate explicit rules to explain these clusters, but they frequently suffer from redundancy and capture limited context in high-dimensional settings. To address this, we leverage large language model (LLM) reasoning capabilities within a Self-Refine loop, where candidate rule sets are refined using structured feedback from a rule-based classifier. A case study translates Alzheimer’s disease progression clusters into simple and clinically applicable rules. This approach demonstrates the feasibility of combining classical rule-based methods with LLM reasoning to balance interpretability and predictive performance, thereby bridging the gap between statistical clustering and clinical utility.
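The refine-with-feedback loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the dissertation's code: the rule format `(feature, threshold, label)`, the feedback fields, the acceptance criterion, and the toy data are all hypothetical, and `stub_refiner` is a stand-in for the LLM call that simply drops rules the classifier reports as never firing.

```python
def feedback(rules, data, default):
    """Apply an ordered rule list and return structured feedback.

    Each rule is (feature, threshold, label): the first rule whose
    feature value is >= threshold decides the predicted cluster.
    """
    fired = [0] * len(rules)
    correct = 0
    for row, cluster in data:
        pred = default
        for i, (feature, threshold, label) in enumerate(rules):
            if row[feature] >= threshold:
                fired[i] += 1
                pred = label
                break
        correct += pred == cluster
    return {
        "accuracy": correct / len(data),
        "unused_rules": [i for i, n in enumerate(fired) if n == 0],
    }


def stub_refiner(rules, fb):
    """Hypothetical stand-in for the LLM: drop rules flagged as unused."""
    return [r for i, r in enumerate(rules) if i not in fb["unused_rules"]]


def self_refine(rules, data, default, refiner, max_rounds=5):
    """Revise the rule set until the refiner stops changing it."""
    fb = feedback(rules, data, default)
    for _ in range(max_rounds):
        candidate = refiner(rules, fb)
        cand_fb = feedback(candidate, data, default)
        # Stop if nothing changed or the revision hurts accuracy.
        if candidate == rules or cand_fb["accuracy"] < fb["accuracy"]:
            break
        rules, fb = candidate, cand_fb
    return rules, fb


# Made-up toy data: (features, cluster label).
data = [
    ({"decline": 7, "age": 70}, "fast"),
    ({"decline": 9, "age": 80}, "fast"),
    ({"decline": 2, "age": 75}, "slow"),
    ({"decline": 1, "age": 68}, "slow"),
]
initial = [("decline", 5, "fast"), ("decline", 8, "fast"), ("age", 200, "slow")]

final_rules, final_fb = self_refine(initial, data, "slow", stub_refiner)
print(final_rules, final_fb)
```

The point of the design is that the refiner only sees structured feedback (accuracy and unused rules), so a real LLM call could replace `stub_refiner` without changing the loop, and a revision is kept only when it does not reduce accuracy.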

In summary, this dissertation develops AI-driven frameworks to quantify and interpret both individual- and group-level variability in Alzheimer’s disease. By quantifying heterogeneous treatment effects, learning outcome-aware embeddings from longitudinal trial data, and converting latent patterns into clear, explainable phenotypes, the work turns raw clinical sequences into actionable insights for personalized therapy.