Center for Computational Biomedicine

Deep Learning

You can download the package here.


We implemented deep neural networks (DNNs) for sequence labeling tasks in NLP, following the methodology of Collobert et al.:

Collobert, Ronan, et al. "Natural language processing (almost) from scratch." The Journal of Machine Learning Research 12 (2011): 2493-2537.

Two deep neural network classifiers were implemented in Java: a Soft-Max DNN classifier (corresponding to the window approach in the paper) and a Soft-Max-HMM DNN classifier (corresponding to the sentence approach). The stand-alone JAR file and the executable training and test scripts are available from the download link. To use this package with existing word embeddings, you can download the word table “words.lst” and the embedding file “embeddings.txt” from:

Both the training data file and the test data file should be formatted as columns separated by tabs ('\t'). The first column must be the token and the second column the assigned tag. Any remaining columns are external features used in addition to the word embedding (e.g., the capital feature and the POS feature).
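As a rough illustration of this column layout, the following Java sketch splits one line into its token, tag, and external features. The class name ColumnLine and its fields are hypothetical and are not part of the package; the package's own reader may handle edge cases differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of the package): one parsed line of the data file. */
final class ColumnLine {
    final String token;          // column 1: the token
    final String tag;            // column 2: the assigned tag
    final List<String> features; // columns 3 and up: external features, possibly empty

    private ColumnLine(String token, String tag, List<String> features) {
        this.token = token;
        this.tag = tag;
        this.features = features;
    }

    /** Splits a line on tabs and checks that the two mandatory columns are present. */
    static ColumnLine parse(String line) {
        // A limit of -1 keeps trailing empty columns instead of silently dropping them.
        String[] cols = line.split("\t", -1);
        if (cols.length < 2) {
            throw new IllegalArgumentException(
                "Expected at least two tab-separated columns: " + line);
        }
        List<String> extra = new ArrayList<>(Arrays.asList(cols).subList(2, cols.length));
        return new ColumnLine(cols[0], cols[1], Collections.unmodifiableList(extra));
    }
}
```

Calling ColumnLine.parse on a line holding the four tab-separated values rejects, O, CAP$ALL_LOWER, and VBZ would give the token "rejects", the tag "O", and two external features.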

The Soft-Max DNN classifier achieved a word-level precision of 96.80% for POS tagging on the WSJ corpus using only the word embedding and the capital feature, which is comparable to the performance reported in the paper.
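When a tagger assigns exactly one tag to every token, word-level precision is commonly computed as the fraction of tokens whose predicted tag matches the gold tag. The sketch below assumes that reading of the metric; the class and method names are hypothetical, and the package's own evaluation script may differ.

```java
import java.util.List;

/** Hypothetical scorer (not part of the package): per-token tag agreement. */
final class WordLevelScore {
    /** Returns the fraction of positions where the predicted tag equals the gold tag. */
    static double accuracy(List<String> gold, List<String> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("Gold and predicted tag counts differ");
        }
        if (gold.isEmpty()) {
            throw new IllegalArgumentException("No tokens to score");
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / gold.size();
    }
}
```

Multiplying the returned value by 100 gives a percentage in the same form as the figure reported above.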


rejects O       CAP$ALL_LOWER   VBZ


call    O       CAP$ALL_LOWER   NN

to      O       CAP$ALL_LOWER   TO

boycott O       CAP$ALL_LOWER   VB


lamb    O       CAP$ALL_LOWER   NN