Center for Computational Biomedicine

Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning

This page provides the executable JAR file of a chemical named entity recognition system for patent text.

This system was trained on the merged training and development datasets released by the track on Chemical and Drug Named Entity Recognition from patent text (CHEMDNER-patents), organized as part of the 2015 BioCreative V challenge (http://www.biocreative.org/tasks/biocreative-v/track-2-chemdner/). Our participating systems achieved top performance in the CHEMDNER-patents challenge. The system is based on conditional random fields (CRFs). In addition to the common linguistic features used for named entity recognition, it employs features derived from domain knowledge and from unsupervised learning (word embeddings and Brown clustering).
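To illustrate how unsupervised features can sit alongside ordinary linguistic features in a CRF tagger, here is a minimal sketch. It is not the released system's code: the Brown cluster table, the prefix lengths, and all function names below are hypothetical and chosen only for illustration. Each token becomes a dictionary of string features (lowercased word, word shape, suffix, neighbouring words), and the token's Brown cluster bit string contributes prefix features, so that words sharing a branch of the cluster hierarchy (for example, two drug names) share features even when one of them never appears in the labeled data.

```python
from typing import Dict, List

# Hypothetical Brown cluster table: word -> bit-string path in the cluster tree.
# In a real system this would be learned from a large unlabeled corpus.
BROWN_CLUSTERS: Dict[str, str] = {
    "aspirin": "1101100",
    "ibuprofen": "1101101",
    "tablet": "0110",
}

# Illustrative prefix lengths (an assumption, not the original configuration).
PREFIX_LENGTHS = (4, 6, 10)


def word_shape(token: str) -> str:
    """Map characters to classes: X = uppercase, x = lowercase, d = digit."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # keep punctuation such as '-' or '(' as-is
    return "".join(shape)


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build a string-valued feature dict for tokens[i] for a CRF toolkit."""
    tok = tokens[i]
    feats = {
        "word.lower": tok.lower(),
        "word.shape": word_shape(tok),
        "suffix3": tok[-3:].lower(),
    }
    # Brown cluster features: shorter prefixes give coarser word groupings.
    path = BROWN_CLUSTERS.get(tok.lower())
    if path is not None:
        for n in PREFIX_LENGTHS:
            feats[f"brown.p{n}"] = path[:n]  # whole path if shorter than n
    # Context features from the neighbouring tokens.
    feats["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next.lower"] = tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>"
    return feats


if __name__ == "__main__":
    sentence = ["Take", "aspirin", "daily"]
    for idx in range(len(sentence)):
        print(sentence[idx], token_features(sentence, idx))
```

Word-embedding features could be added in the same way, for instance as the ID of the nearest cluster centroid of a token's vector; that variant is likewise an assumption here and is not shown.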

The readme.txt file in the same folder as the JAR file provides detailed usage information.

Reference:

Zhang Y, Xu J, Wang J, Wu Y, Parkasam M, Xu H. UTH-CCB@BioCreative V Track 2: Recognizing Chemical Entities in Patents. In Proceedings of the Fifth BioCreative Challenge Evaluation Workshop, 2015.