Utilizing Molecular Pathways to Provide Generalizable, Explainable, and Translatable Models for Cancer Drug Response Prediction

Author: Yi-Ching Tang, MS (2022)

Primary advisor: Assaf Gottlieb, PhD

Committee members: Degui Zhi, PhD; Xiaoqian Jiang, PhD; Jeffery Chang, PhD

PhD thesis, The University of Texas School of Biomedical Informatics at Houston.


There is a crucial need to develop computational methods that identify the best-fitting drug for each cancer patient. Cancer cell lines remain the most popular models for drug screening. They are supported by massive amounts of genomic profiling data and by drug sensitivity profiles measured for single agents and combination therapies. However, the challenge remains in finding genomically matched drugs and evaluating their efficacy in the clinic. Three factors contribute to this challenge. First, the high dimensionality of genomic data requires large datasets to distinguish predictive features and generalize to new samples. Second, the differences between cancer cell lines and patient tumors pose a challenge for the clinical translatability of computational models. Third, the lack of mechanistic interpretation of prediction outcomes could limit the development of predictive models for clinical use. To address these challenges, we developed a pathway-based deep learning framework for drug response prediction. The framework assumes that pathway information can reveal the molecular interactions that respond to drug actions. We mapped drug and cell line features to pathways to form pan-cancer, pan-drug models. We also developed a novel transfer learning approach that pre-trains a model on large-scale cancer cell line data and applies it to tumor data. We evaluated our pathway-based framework in different scenarios, including single-drug response prediction, drug synergy prediction, and clinical drug sensitivity prediction. Prediction performance was validated using repeated cross-validation and by evaluating the model on an independent dataset. Using feature importance, we identified the pathways that contributed the most to the prediction outcomes. Our framework addresses the aforementioned challenges. First, our approach greatly reduced the input dimensions from whole-genome gene profiles (e.g., gene expression, mutations) to thousands of curated pathway gene sets.
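The pathway mapping described above can be illustrated with a minimal Python sketch. For illustration only, it assumes that each pathway's feature is the mean expression of its measured member genes. The thesis may use a different scoring scheme. The function `pathway_scores` and the toy gene sets are hypothetical. The sketch shows only that a gene-level profile of arbitrary length collapses into one value per curated gene set.

```python
from statistics import mean


def pathway_scores(expression, pathways, min_genes=1):
    """Collapse a gene-level profile into one score per pathway (hypothetical helper).

    expression: dict mapping gene symbol -> expression value (e.g., a z-score)
    pathways:   dict mapping pathway name -> iterable of member gene symbols
    Returns a dict mapping pathway name -> mean expression of the member genes
    that were actually measured. Pathways with fewer than `min_genes` measured
    members are skipped.
    """
    scores = {}
    for name, genes in pathways.items():
        # Keep only member genes present in this sample's profile.
        values = [expression[g] for g in genes if g in expression]
        if len(values) >= min_genes:
            scores[name] = mean(values)
    return scores


# Toy example with made-up values and two illustrative gene sets.
expression = {"EGFR": 2.0, "KRAS": 1.0, "BRAF": -1.0, "TP53": 0.5, "MDM2": 1.5}
pathways = {
    "toy_MAPK": ["EGFR", "KRAS", "BRAF", "MAP2K1"],  # MAP2K1 is not measured
    "toy_p53": ["TP53", "MDM2", "CDKN1A"],           # CDKN1A is not measured
}
print(pathway_scores(expression, pathways))
```

In this example, five gene-level values become two pathway-level features. With whole-genome profiles and thousands of gene sets, the same kind of step yields the reduction in input dimensions described above.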
The results of internal and external validation showed that pathway-level features can generate robust predictions and generalize to an independent dataset. Second, analysis of the top contributing pathways identified drug-gene association networks and pathway signals enriched from gene expression that explain drug responses. The major proteins within those pathways are supported by existing evidence. Moreover, statistical analysis showed a relationship between the topological distance of the top contributing pathways and drug synergism: synergistic combinations tend to have shorter distances within the protein-protein interaction network. Third, when the models were transferred from cancer cell lines to tumors (including patient-derived xenografts, PDX), prediction accuracy improved significantly. This held both for samples that had and had not been seen by the pre-trained model. In conclusion, our deep learning models that integrate pathway-specific features show high predictive performance and are explainable and generalizable. They can also be transferred from cancer cell lines to tumors to accurately predict clinical drug sensitivity, which demonstrates the potential for computational modeling to be translated into the clinic.
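The relationship between pathway distance and synergy depends on how the distance between two gene sets is measured in a protein-protein interaction (PPI) network. The sketch below assumes one possible choice, a "closest" distance. For every gene in either set, it takes the shortest-path length to the nearest gene in the other set, then averages these values. The network, gene sets, and function names (`build_graph`, `bfs_distances`, `closest_distance`) are hypothetical. The thesis may define topological distance differently.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) interaction pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closest_distance(graph, set_a, set_b):
    """Symmetric 'closest' distance between two gene sets (assumed metric).

    Averages, over genes in both sets, the shortest-path length to the nearest
    gene in the other set. Genes that cannot reach the other set are ignored;
    returns inf if no gene can reach the other set at all.
    """
    def one_side(src, dst):
        per_gene = []
        for g in src:
            d = bfs_distances(graph, g)
            reachable = [d[t] for t in dst if t in d]
            if reachable:
                per_gene.append(min(reachable))
        return per_gene

    values = one_side(set_a, set_b) + one_side(set_b, set_a)
    return sum(values) / len(values) if values else float("inf")


# Toy chain network A-B-C-D-E (not real PPI data).
ppi = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])
print(closest_distance(ppi, ["A", "B"], ["D", "E"]))
print(closest_distance(ppi, ["A", "B"], ["C"]))
```

Under this assumption, a smaller value means that the genes of two sets of top contributing pathways sit closer together in the network. The abstract associates this direction with synergistic combinations.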