
Interactive Machine Learning Package


Statistical natural language processing (NLP) methods often require large annotated clinical corpora. However, creating such corpora in the medical domain is time-consuming and costly. Methods that efficiently integrate domain knowledge into the machine learning process, so that high-quality statistical models can be built quickly with minimal annotation cost, would therefore be highly desirable for clinical text processing.

In this study (Interactive machine learning methods for clinical natural language processing, Grant No. NLM 2R01LM010681-05), we propose to investigate interactive machine learning (IML) methods to address these challenges in clinical NLP, which were identified by us in a prior grant and by others in the field. We will develop an IML framework for three NLP-related tasks: word sense disambiguation, named entity recognition, and clinical … The major goals of the project are to develop IML methods and systems for clinical word sense disambiguation (WSD) and then to extend these approaches to named entity recognition and clinical …



Datasets generated in this project will be listed and made available for download once the pertinent papers have been published.



We developed Active LEARNER, an active learning (AL)-enabled annotation system for named entity recognition (NER), which is integrated into CLAMP.
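Active learning lowers annotation cost by letting the current model choose which unlabeled samples a human annotates next. This page does not describe the query strategy Active LEARNER uses. The sketch below assumes least-confidence uncertainty sampling, a common generic strategy, and shows only the selection step. The pool, the item ids, and the functions `least_confidence` and `select_queries` are hypothetical. Each item is also simplified to a single predicted label distribution, whereas real NER models score whole label sequences.

```python
from typing import Dict, List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely label.

    Hypothetical helper; assumes `probs` is the model's predicted
    distribution over labels for one unlabeled item.
    """
    return 1.0 - max(probs)


def select_queries(pool: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k pool items the model is least confident about.

    Hypothetical helper illustrating generic least-confidence sampling,
    not necessarily the strategy used by Active LEARNER.
    """
    # Rank items from most to least uncertain; sorted() is stable, so ties
    # keep the pool's insertion order.
    ranked = sorted(pool, key=lambda item_id: least_confidence(pool[item_id]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy pool of unlabeled sentences with made-up model predictions.
    pool = {
        "sent_1": [0.95, 0.05],
        "sent_2": [0.55, 0.45],
        "sent_3": [0.70, 0.30],
    }
    print(select_queries(pool, 2))
```

In a full active learning loop, the selected sentences would be sent to the annotator, the model retrained on the enlarged labeled set, and the remaining pool re-scored before the next query.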



2016, AMIA: Applying Active Learning to Clinical Abbreviation Disambiguation in Real Time

2015, AMIA: Real time active learning study for clinical named entity recognition

2014, AMIA: Applying Active Learning Word Sense Disambiguation in a Real-Time Setting



  1. Moon S, Cohen T, Xu H. Semantic Relatedness and Similarity between Biomedical Concepts. American Medical Informatics Association Annual Symposium, 2016. Accepted.
  2. Wei Q, Chen C, Moon S, Cohen T, Xu H. A Study of Active Learning for Named Entity Recognition of Clinical Text at the Document Level. American Medical Informatics Association Annual Symposium, 2016. Accepted.
  3. Wu Y, Xu J, Zhang Y, Xu H. What Can Neural Networks Learn from Unlabeled Clinical Narratives. American Medical Informatics Association Annual Symposium, 2016. Accepted.



  1. 2015, ACL-IJCNLP: Clinical Abbreviation Disambiguation Using Neural Word Embeddings.


  1. Xu J, Zhang Y, Wang J, Wu Y, Jiang M, Soysal E, Xu H. UTH-CCB: The Participation of the SemEval 2015 Challenge – Task 14. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 311–314, Denver, Colorado, June 4-5, 2015. [NIHMSID: NIHMS947088; PMCID Pending]
  2. Jiang M, Huang Y, Fan JW, Tang B, Denny J, Xu H. Parsing clinical text: how good are the state-of-the-art parsers? BMC Med Inform Decis Mak. 2015;15 Suppl 1(Suppl 1):S2. doi: 10.1186/1472-6947-15-S1-S2. Epub 2015 May 20. PMID: 26045009; PMCID: PMC4460747.
  3. Wu Y, Denny JC, Rosenbloom ST, Miller RA, Giuse DA, Song M, Xu H. A Preliminary Study of Clinical Abbreviation Disambiguation in Real Time. Appl Clin Inform. 2015 Jun 3;6(2):364-74. doi: 10.4338/ACI-2014-10-RA-0088. PMID: 26171081; PMCID: PMC4493336.
  4. Wu Y, Xu J, Zhang Y, Xu H. Clinical Abbreviation Disambiguation Using Neural Word Embeddings. Association for Computational Linguistics. BioNLP workshop, 2015. [PMCID Pending]
  5. Wu Y, Xu J, Jiang M, Zhang Y, Xu H. A Study of Neural Word Embeddings for Named Entity Recognition in Clinical Text. AMIA Annu Symp Proc. 2015 Nov 5;2015:1326-33. PMID: 26958273; PMCID: PMC4765694.
  6. Xu J, Wu Y, Zhang Y, Wang J, Lee HJ, Xu H. CD-REST: a system for extracting chemical-induced disease relation in literature. Database (Oxford). 2016 Mar 25;2016:baw036. doi: 10.1093/database/baw036. PMID: 27016700; PMCID: PMC4808251.
  7. Xu J, Wu Y, Zhang Y, Wang J, Liu R, Wei Q, Xu H. UTH-CCB@BioCreative V CDR Task: Identifying Chemical-induced Disease Relations in Biomedical Text. Proceedings of the Fifth BioCreative Challenge Evaluation Workshop. 2015. [NIHMSID: NIHMS800810; PMCID Pending]
  8. Chen Y, Lasko TA, Mei Q, Denny JC, Xu H. A study of active learning methods for named entity recognition in clinical text. J Biomed Inform. 2015 Dec;58:11-18. doi: 10.1016/j.jbi.2015.09.010. Epub 2015 Sep 15. PMID: 26385377; PMCID: PMC4934373.
  9. Chen Y, Sun J, Huang LC, Xu H, Zhao Z. Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations. Biomed Res Int. 2015;2015:491502. doi: 10.1155/2015/491502. Epub 2015 Oct 11. PMID: 26539502; PMCID: PMC4619847.
  10. Wu Y, Denny JC, Rosenbloom ST, Miller RA, Giuse DA, Wang L, Blanquicett C, Soysal E, Xu J, Xu H. A long journey to short abbreviations - developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD). JAMIA. 2017 Apr 1;24(e1):e79-e86. PubMed PMID: 27539197. [PMCID Journal – In Progress]
  11. Zhang Y, Jiang M, Wang J, Xu H. Semantic Role Labeling of Clinical Text: Comparing Syntactic Parsers and Features. AMIA Annu Symp Proc. 2017 Feb 10;2016:1283-1292. PMID: 28269926; PMCID: PMC5333340.
  12. Lee HJ, Zhang Y, Xu J, Moon S, Wang J, Wu Y, Xu H. UTHealth at SemEval-2016 Task 12: an end-to-end system for temporal information extraction from clinical notes. 2016. Presented at: 10th International Workshop on Semantic Evaluation (SemEval-2016); June 16-17, 2016; San Diego, CA, USA. [NO PMCID]
  13. Lee HJ, Zhang Y, Roberts K, Xu H. Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation. AMIA Annu Symp Proc. 2018 Apr 16;2017:1070-1079. PMID: 29854175; PMCID: PMC5977650.
  14. Zhang Y, Xu J, Chen H, Wang J, Wu Y, Prakasam M, Xu H. Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning. Database (Oxford). 2016 Apr 17;2016:baw049. doi: 10.1093/database/baw049. PMID: 27087307; PMCID: PMC4834204.
  15. Chen Y, Lasko TA, Mei Q, Chen Q, Moon S, Wang J, Nguyen K, Dawodu T, Cohen T, Denny JC, Xu H. An active learning-enabled annotation system for clinical named entity recognition. BMC Med Inform Decis Mak. 2017 Jul 5;17(Suppl 2):82. doi: 10.1186/s12911-017-0466-9. PMID: 28699546; PMCID: PMC5506567.
  16. Wu Y, Jiang M, Xu J, Zhi D, Xu H. Clinical Named Entity Recognition Using Deep Learning Models. AMIA Annu Symp Proc. 2018 Apr 16;2017:1812-1819. PMID: 29854252; PMCID: PMC5977567.
  17. Ji Z, Zhang Y, Xu J, Chen X, Wu Y, Xu H. Comparing Cancer Information Needs for Consumers in the US and China. Stud Health Technol Inform. 2017;245:126-130. PMID: 29295066; PMCID: PMC5805146.
  18. Wang Y, Zheng K, Xu H, Mei Q. Interactive medical word sense disambiguation through informed learning. J Am Med Inform Assoc. 2018 Jul 1;25(7):800-808. doi: 10.1093/jamia/ocy013. PMID: 29584896; PMCID: PMC6658868.
  19. Du J, Zhang Y, Luo J, Jia Y, Wei Q, Tao C, Xu H. Extracting psychiatric stressors for suicide from social media using deep learning. BMC Med Inform Decis Mak. 2018 Jul 23;18(Suppl 2):43. doi: 10.1186/s12911-018-0632-8. PMID: 30066665; PMCID: PMC6069295.
  20. Lee HJ, Zhang Y, Jiang M, Xu J, Tao C, Xu H. Identifying direct temporal relations between time and events from clinical notes. BMC Med Inform Decis Mak. 2018 Jul 23;18(Suppl 2):49. doi: 10.1186/s12911-018-0627-5. PMID: 30066643; PMCID: PMC6069692.
  21. Li J, Zheng K, Xu H, Mei Q, Wang Y. The Strength of the Weakest Supervision: Topic Classification Using Class Labels. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop. 2019 Jun. p. 22-28. [PMCID Pending]
  22. Ji Z, Wei Q, Franklin A, Cohen T, Xu H. Cost-sensitive Active Learning for Phenotyping of Electronic Health Records. AMIA Jt Summits Transl Sci Proc. 2019 May 6;2019:829-838. eCollection 2019. PMID: 31259040; PMCID: PMC6568101.
  23. Wei Q, Chen Y, Salimi M, Denny JC, Mei Q, Lasko TA, Chen Q, Wu S, Franklin A, Cohen T, Xu H. Cost-aware active learning for named entity recognition in clinical text. J Am Med Inform Assoc. 2019 Jul 11. pii: ocz102. doi: 10.1093/jamia/ocz102. PMID: 31294792; PMCID: PMC6798575.
  24. Xu J, Li Z, Wei Q, Wu Y, Xiang Y, Lee HJ, Zhang Y, Wu S, Xu H. Applying a deep learning-based sequence labeling approach to detect attributes of medical concepts in clinical text. BMC Med Inform Decis Mak. 2019;19(Suppl 5):236. PMID: 31801529; PMCID: PMC6894107.
  25. Wei Q, Ji Z, Li Z, Du J, Wang J, Xu J, Xiang Y, Tiryaki F, Wu S, Zhang Y, Tao C, Xu H. A study of deep learning approaches for medication and adverse drug event extraction from clinical text. J Am Med Inform Assoc. 2019 May 28. pii: ocz063. doi: 10.1093/jamia/ocz063. PMID: 31135882; PMCID: PMC6913210.
  26. Wu Y, Warner JL, Wang L, Jiang M, Xu J, Chen Q, Nian H, Dai Q, Du X, Yang P, Denny JC, Liu H, Xu H. Discovery of Noncancer Drug Effects on Survival in Electronic Health Records of Patients With Cancer: A New Paradigm for Drug Repurposing. JCO Clin Cancer Inform. 2019 May;3:1-9. doi: 10.1200/CCI.19.00001. PMID: 31141421; PMCID: PMC6693869.