W. Jim Zheng, PhD, MS, joined the McWilliams School of Biomedical Informatics in 2013. He earned his M.S. in Computer Science from UT Dallas and his Ph.D. from UT Southwestern, where he identified a novel enzyme mechanism [1]. Early in his career, Dr. Zheng developed commercial genomic databases and bioinformatics software platforms in industry.

Dr. Zheng’s research focuses on developing novel machine learning and data science methodologies to analyze, integrate, and model biomedical data across the full spectrum of biomedicine, a vision he articulated in a JAMA perspective [2]. His work spans multiple biological scales—from small-molecule therapeutics [3], genes and proteins [4,5], and cells [6], to tissues and organs [7,8], as well as patient- and population-level clinical studies [9,10].

In AI methodology development, Dr. Zheng’s group introduced Ontology Fingerprints, the first distributed representation of genes derived from literature mining [11,12]. His team was also an early adopter of deep learning for predicting effective drug combinations [3]. Dr. Zheng and collaborators have advanced large language model–based approaches for biomedical NLP, co-leading the UTHealth team to achieve 2nd place in both the 2021 DrugProt BioCreative VII Large-Scale Track [13] and the 2022 NIH/NCATS LitCoin Challenge [14,15]. More recently, his group has applied AlphaFold to systematically analyze the structural impact of alternative splicing, contributing to AI-enabled structural genomics resources [5], and developed a pathway-informed transformer that aligns cell type labels with gene expression patterns using contrastive learning [6].

Dr. Zheng has led several large-scale data science initiatives. His group developed Genome3D, the first platform for integrated 3D genome modeling and visualization [16]. His team identified extensive murine viral sequences in patient-derived xenografts, highlighting the importance of quality control in cancer drug development [4]. Clinically, his group has leveraged electronic health record data from over 70 million patients to model risks of brain metastasis in lung cancer [9] and liver cancer in non-alcoholic fatty liver disease [10].

Dr. Zheng also works in high-performance computing [17] and has built a state-of-the-art computing infrastructure geared toward data science and AI. He founded and directs the Data Science and Informatics Core for Cancer Research (DSICCR), which has contributed to over 140 publications. He also directs the Bioinformatics and High-Performance Computing Service Center within the NIH-funded CTSA program, a joint project between UTHealth Houston and MD Anderson Cancer Center.

His current research focuses on AI-driven integrative modeling and data mining to advance therapeutic discovery and treatment response prediction in cancer, Alzheimer’s disease, and other chronic conditions. Dr. Zheng serves on the editorial boards of three bioinformatics journals and is supported by the NIH, DoD, and the Cancer Prevention and Research Institute of Texas.


Education


  • MS, Computer Science, 2000, University of Texas at Dallas
  • PhD, Biochemistry & Molecular Biology, 1997, University of Texas Southwestern Medical Center at Dallas

Areas of Expertise


  1. AI and Machine Learning Methodology for Biomedicine
  2. Biomedical Natural Language Processing and Knowledge Representation
  3. Computational Genomics and Functional Genomics
  4. Translational Data Science and Clinical Informatics
  5. Large-Scale Biomedical Data Infrastructure and High-Performance Computing

Staff Support


Felicia Davis | 713-500-3667


References

  1. Wenjin Zheng, Stephen A. Johnston, Leemor Joshua-Tor: The unique active site of Gal6/bleomycin hydrolase can act as a carboxypeptidase, aminopeptidase and peptide ligase. Cell, 93:103-109, 1998, PMID: 9546396.
  2. Lisha Zhu, W. Jim Zheng: Informatics, Data Science and Artificial Intelligence, JAMA, 320(11):1103-1104, 2018, PMID: 30326503
  3. Guocai Chen, Lam C. Tsoi, Hua Xu, W. Jim Zheng: Predict Effective Drug Combination by Deep Belief Network and Ontology Fingerprints, Journal of Biomedical Informatics, 85:149-154, 2018, PMID: 30081101.
  4. Zihao Yuan, Xuejun Fan, Jay-Jiguang Zhu, Tong-Ming Fu, Jiaqian Wu, Hua Xu, Ningyan Zhang, Zhiqiang An, W. Jim Zheng: Presence of complete murine viral genome sequences in patient-derived xenografts. Nature Communications, 12(1):2031, 2021, PMID: 33795676.
  5. Yuntao Yang, Himansu Kumar, Yuhan Xie, Zhao Li, Rongbin Li, Wenbo Chen, Chiamaka S. Diala, Meer A. Ali, Yi Xu, Albon Wu, Sayed-Rzgar Hosseini, Erfei Bi, Hongyu Zhao, Pora Kim, W. Jim Zheng: ASpdb: an integrative knowledgebase of human protein isoforms from experimental and AI-predicted structures, Nucleic Acids Research, 53(D1):D331-D339, 2025, PMID: 39530217.
  6. Zhao Li, Zaiyi Zheng, Rongbin Li, Wenbo Chen, Yuntao Yang, Meer A Ali, Jundong Li, W. Jim Zheng: CeLLTra: Aligning Cell Names with Gene Expression via a Pathway-Informed Transformer, Bioinformatics, 42(2):btaf655, DOI:10.1093/bioinformatics/btaf655, 2026, PMID: 41652996.
  7. Rachel R. Tindall, Yuntao Yang, Amy Qin, Jiajing Li, Yinjie Zhang, Thomas H. Gomez, Mamoun Younes, Qiang Shen, Jennifer M. Bailey-Lundberg, Zhongming Zhao, Daniel Kraushaar, Patricia Castro, Yanna Cao, W. Jim Zheng, and Tien C. Ko: Aging- and Alcohol-Associated Spatial Transcriptomic Signature in Mouse Acute Pancreatitis Reveals Heterogeneity of Inflammation and Potential Pathogenic Factors, Journal of Molecular Medicine, 102(8):1051-1061, 2024, PMID: 38940937.
  8. Zhao Li, Rongbin Li, Kendall J. Kiser, Luca Giancardo, W. Jim Zheng: Segmenting Thoracic Cavities with Neoplastic Lesions: A Head-to-head Benchmark with Fully Convolutional Neural Networks, ACM-BCB-2021, 2021:33, doi:10.1145/3459930.3469564, 2021, PMID: 35330920.
  9. Zhao Li, Rongbin Li, Yujia Zhou, Laila Rasmy, Degui Zhi, Ping Zhu, Antonio Dono, Xiaoqian Jiang, Hua Xu, Yoshua Esquenazi, W. Jim Zheng: Prediction of Brain Metastases Development in Lung Cancer Patients by Explainable AI from Electronic Health Records, Journal of Clinical Oncology Clinical Cancer Informatics, 7:e2200141, doi:10.1200/CCI.22.00141, 2023, PMID: 37018650.
  10. Zhao Li, Lan Lan, Yujia Zhou, Ruoxing Li, Kenneth D. Chavin, Hua Xu, Liang Li, David J. H. Shih, W. Jim Zheng: Developing deep learning-based strategies to predict the risk of hepatocellular carcinoma among patients with nonalcoholic fatty liver disease from electronic health records, Journal of Biomedical Informatics, 152:104626, 2024, PMID: 38014193.
  11. Lam C. Tsoi, Michael Boehnke, Richard Klein, W. Jim Zheng: Evaluation of Genome-wide Association Study Results through Development of Ontology Fingerprint. Bioinformatics, 25(10):1314-20, 2009, PMID: 19349285.
  12. Tingting Qin, Nabil Matmati, Lam C. Tsoi, Bidyut K. Mohanty, Nan Gao, Jijun Tang, Andrew B. Lawson, Yusuf A. Hannun, W. Jim Zheng: Finding pathway-modulating genes from a novel Ontology Fingerprint-derived gene network. Nucleic Acids Research, 42(18):e138, 2014, PMID: 25063300.
  13. Team UTHealth-CCB: DrugProt BioCreative VII Track - Large Scale Track. (2021).
  14. Team UTHealth SBMI: NIH/NCATS LitCoin Natural Language Processing Challenge. (2022).
  15. Zhao Li, Qiang Wei, Liang-Chin Huang, Jianfu Li, Yan Hu, Yao-Shun Chuang, Jianping He, Avisha Das, Vipina Kuttichi Keloth, Yuntao Yang, Chiamaka S Diala, Kirk E Roberts, Cui Tao, Xiaoqian Jiang, W. Jim Zheng, Hua Xu: Ensemble pretrained language models to extract biomedical knowledge from literature, Journal of the American Medical Informatics Association, 31(9):1904-1911, 2024, PMID: 38520725.
  16. Thomas M. Asbury, Matt Mitman, Jijun Tang, W. Jim Zheng: Genome3D: A view-model framework for integrating and visualizing multi-scale epigenomic information within a three-dimensional genome. BMC Bioinformatics, 11:444, 2010 (Highly Accessed), PMCID: PMC2941692.
  17. Shikun Wang, Zhao Li, Lan Lan, Jieyi Zhao, W. Jim Zheng, Liang Li: GPU accelerated estimation of a shared random effect joint model for dynamic prediction, Computational Statistics & Data Analysis, 174:107528, doi:10.1016/j.csda.2022.107528, 2022, PMID: 39257897.