Author: Sheikh Muhammad Saiful Islam, MS (2025)
Primary advisor: Degui Zhi, PhD
Committee members: Arif Harmanci, PhD and Ziqian Xie, PhD
PhD thesis: McWilliams School of Biomedical Informatics at UTHealth Houston.
ABSTRACT
Deep learning has unlocked significant potential for advancing the discovery of genetic associations in imaging genetics, particularly in brain imaging using T1-weighted Magnetic Resonance Imaging (MRI). Traditional methods for this task often rely on hand-crafted or semi-automated feature extraction, followed by genetic association studies on the extracted features. While effective, these approaches lack data-driven exploration and tend to produce features of limited informativeness.
Unsupervised deep learning, in particular, has emerged as a powerful alternative, addressing some of these limitations by enabling automated, data-driven feature discovery. Recent research in this domain has largely adopted standard deep learning practice, such as carrying the model checkpoint from the epoch with the best learning performance into genetic association studies. However, this practice cannot explore the learning dynamics that unfold across the training process. Furthermore, existing studies predominantly emphasize global image features, neglecting the spatial specificity and regional focus necessary for more precise genetic association exploration. To overcome these challenges, this dissertation first develops a novel framework for analyzing the dynamics of genetic pattern learning in deep learning, enabling a deeper exploration of genetic associations in imaging genetics. This aim builds an unsupervised deep learning framework around a 3D convolutional autoencoder that extracts latent representations from T1-weighted brain MRI data for genome-wide association studies (GWAS). The model reveals a two-phase pattern of genetic discovery: an early phase detecting broad signals and a later phase refining genetic associations. To enhance discovery, we introduce an epoch ensembling strategy, which outperforms traditional single-checkpoint methods by identifying more significant loci. This work demonstrates the value of considering intermediate model states in deep learning-based GWAS, offering a novel approach that improves the sensitivity and interpretability of neuroimaging genetics.
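The epoch ensembling idea can be illustrated with a minimal sketch: rather than keeping only the SNPs found significant at a single best-epoch checkpoint, significant SNPs discovered at several checkpoints are pooled. The function names, threshold handling, and toy p-values below are illustrative assumptions, not the dissertation's actual pipeline.

```python
GWS_THRESHOLD = 5e-8  # conventional genome-wide significance level

def significant_snps(gwas_results, threshold=GWS_THRESHOLD):
    """Return the set of SNP IDs whose p-value passes the threshold."""
    return {snp for snp, p in gwas_results.items() if p < threshold}

def epoch_ensemble(per_epoch_results, threshold=GWS_THRESHOLD):
    """Union of significant SNPs across all selected training epochs."""
    ensemble = set()
    for results in per_epoch_results.values():
        ensemble |= significant_snps(results, threshold)
    return ensemble

# Toy example: an early epoch surfaces a broad signal, a later epoch
# refines it; the ensemble keeps loci found at either checkpoint.
per_epoch = {
    5:  {"rs111": 1e-9, "rs222": 3e-8, "rs333": 1e-3},
    40: {"rs111": 2e-10, "rs444": 4e-9, "rs333": 2e-7},
}
print(sorted(epoch_ensemble(per_epoch)))
```

In this toy case the ensemble recovers a locus (rs222) that the later checkpoint alone would have missed, which is the intuition behind the reported gain in discovered loci.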
Second, this dissertation introduces an unsupervised deep learning framework to uncover genetic variations associated with brain regions by focusing on tissue-specific genetic signals in white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). By utilizing T1-weighted MRI data from the UK Biobank, we trained separate autoencoder models for each tissue type, with dynamic emphasis placed on the target tissue during model training. The models, which include both context-based and region-focused architectures, were able to capture robust tissue-specific genetic associations across training epochs. Results show that genetic signals emerge early in training, even when reconstruction accuracy is suboptimal, suggesting that the learned representations can capture relevant genetic information before full convergence of the model. Epoch ensembling, which combines significant SNPs from key epochs, further strengthened the discovery of tissue-specific loci, with the highest number of significant loci identified for GM and WM. The study highlights the power of tissue-specific models in uncovering genomic patterns that may be missed in global models, underscoring the potential for advancing neuroimaging genetics by focusing on brain tissue distinctions.
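One plausible reading of "dynamic emphasis placed on the target tissue" is a reconstruction loss that up-weights voxels belonging to the target tissue (WM, GM, or CSF), with the emphasis possibly ramped over training. The weighting scheme, schedule, and names below are assumptions for illustration, not the dissertation's exact loss.

```python
import numpy as np

def tissue_weighted_mse(recon, target, tissue_mask, emphasis=5.0):
    """Mean squared error with extra weight on target-tissue voxels."""
    weights = np.where(tissue_mask, emphasis, 1.0)
    return float(np.mean(weights * (recon - target) ** 2))

def emphasis_schedule(epoch, start=1.0, end=5.0, ramp_epochs=20):
    """Linearly ramp tissue emphasis over the first `ramp_epochs` epochs."""
    t = min(epoch / ramp_epochs, 1.0)
    return start + t * (end - start)

# Toy volumes: uniform reconstruction error, half the voxels in-tissue.
recon = np.zeros(8)
target = np.ones(8)
mask = np.array([True] * 4 + [False] * 4)
print(tissue_weighted_mse(recon, target, mask))  # in-tissue errors dominate
```

Under such a loss, gradients concentrate on the emphasized tissue, which is one way separate WM, GM, and CSF models could come to encode tissue-specific structure.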
Finally, this dissertation introduces a self-supervised deep learning framework for learning voxel-level representations of the brain from T1-weighted MRI scans, enabling genetic analysis of brain regions. Using voxel-level contrastive learning, the model captures fine anatomical details and identifies genetic variations associated with specific subcortical regions. Applied to UK Biobank data, our method discovered 60 significant loci, outperforming baseline models. High-resolution embeddings revealed both shared and region-specific genetic signals, particularly in areas like the thalamus. These findings demonstrate the power of voxel-level contrastive learning for enhancing genetic discovery in neuroimaging, offering new insights into brain structure and its genetic underpinnings.
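The voxel-level contrastive objective can be sketched with a standard InfoNCE-style loss: embeddings of the same voxel under two augmented views form a positive pair, while embeddings of other voxels serve as negatives. This is a generic illustration of the technique; the dissertation's actual architecture, augmentations, and loss details are not shown here.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def voxel_infonce(view_a, view_b, temperature=0.1):
    """InfoNCE loss for matched voxel embeddings.

    view_a, view_b: (n_voxels, dim) arrays; row i of each view embeds the
    same voxel location under a different augmentation.
    """
    a = l2_normalize(view_a)
    b = l2_normalize(view_b)
    logits = a @ b.T / temperature               # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal: voxel i in view A matches i in view B.
    return -float(np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
emb = rng.normal(size=(64, 16))
noise = 0.05 * rng.normal(size=emb.shape)
aligned_loss = voxel_infonce(emb, emb + noise)
shuffled_loss = voxel_infonce(emb, rng.permutation(emb + noise))
print(aligned_loss < shuffled_loss)
```

When the two views are correctly matched voxel-for-voxel, the loss is far lower than when the correspondence is shuffled, which is what pushes the model toward spatially specific, fine-grained representations.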
In summary, this dissertation makes significant advancements in neuroimaging genetics by developing a series of deep learning frameworks that enhance the discovery of genetic associations linked to brain structure. Starting with global models to capture broad genetic signals, the research progressively narrows its focus to regional and finally voxel-level analyses, enabling increasingly precise identification of genetic variants associated with specific brain regions. By leveraging unsupervised deep learning methods, including contrastive learning and epoch ensembling, the study uncovers both shared and tissue-specific genetic signals, offering new insights into the genetic underpinnings of brain morphology. The results demonstrate that this hierarchical approach—from global to regional to voxel-level models—greatly improves the sensitivity and interpretability of genetic studies in neuroimaging, paving the way for a more detailed understanding of the genetic basis of neurological and psychiatric conditions.