Center for Computational Biomedicine

Deep Learning

You can download the package here.

Description

We implemented deep neural networks (DNNs) for sequence labeling tasks in NLP, following the methodology of Collobert et al.:

Collobert, Ronan, et al. "Natural language processing (almost) from scratch." The Journal of Machine Learning Research 12 (2011): 2493-2537.

Two DNN classifiers were implemented in Java: a Soft-Max DNN classifier (corresponding to the window approach in the paper) and a Soft-Max-HMM DNN classifier (corresponding to the sentence approach). The stand-alone JAR file and the executable training and test scripts are available from the download link. To use this package with existing word embeddings, download the word list “words.lst” and the embedding file “embeddings.txt” from: http://ronan.collobert.com/senna/
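For readers who want to inspect the downloaded embeddings outside the package, the following is a minimal sketch of pairing the two files. It assumes that line i of “embeddings.txt” holds the whitespace-separated vector for the word on line i of “words.lst”; the class name EmbeddingTable and its methods are hypothetical and are not part of the package.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the package.
class EmbeddingTable {

    /**
     * Pairs each word with the vector on the same line number.
     * Assumption: one word per line, one whitespace-separated vector per line.
     */
    static Map<String, float[]> build(List<String> words, List<String> rows) {
        if (words.size() != rows.size()) {
            throw new IllegalArgumentException(
                "Word count " + words.size() + " != vector count " + rows.size());
        }
        Map<String, float[]> table = new HashMap<>();
        for (int i = 0; i < words.size(); i++) {
            String[] parts = rows.get(i).trim().split("\\s+");
            float[] vector = new float[parts.length];
            for (int j = 0; j < parts.length; j++) {
                vector[j] = Float.parseFloat(parts[j]);
            }
            table.put(words.get(i).trim(), vector);
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: EmbeddingTable words.lst embeddings.txt");
            return;
        }
        // The file encoding is an assumption; adjust it if decoding fails.
        List<String> words = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> rows = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        Map<String, float[]> table = build(words, rows);
        System.out.println("Loaded " + table.size() + " word vectors");
    }
}
```

Keeping the pairing logic in a method that takes plain lists makes it easy to check on a few hand-written lines before running it on the full files.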

Both the training data file and the test data file must be tab-separated ('\t'), with one token per line. The first column is the token and the second column is its assigned tag. Any remaining columns hold external features used in addition to the word embedding (e.g., the capitalization feature and the POS feature).
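The column layout can be illustrated with a small parser. This is only a sketch of the format described above, not the reader used inside the package; the class LabeledToken and its field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical container for one line of a training or test file.
class LabeledToken {
    final String token;          // column 1
    final String tag;            // column 2
    final List<String> features; // columns 3 and beyond, possibly empty

    LabeledToken(String token, String tag, List<String> features) {
        this.token = token;
        this.tag = tag;
        this.features = features;
    }

    /** Splits one tab-separated line into token, tag, and external features. */
    static LabeledToken parse(String line) {
        String[] cols = line.split("\t");
        if (cols.length < 2) {
            throw new IllegalArgumentException("Expected at least a token and a tag: " + line);
        }
        List<String> features = new ArrayList<>(Arrays.asList(cols).subList(2, cols.length));
        return new LabeledToken(cols[0], cols[1], features);
    }

    public static void main(String[] args) {
        LabeledToken t = parse("EU\tI-ORG\tCAP$ALL_UPPER\tNNP");
        System.out.println(t.token + " / " + t.tag + " / " + t.features);
    }
}
```

A line that contains spaces instead of tabs is rejected by this sketch, which is a quick way to catch files whose tabs were converted by an editor.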

The Soft-Max DNN classifier achieved a word-level precision of 96.80% for POS tagging on the WSJ corpus using only the word embedding and the capitalization feature, which is comparable to the performance reported in the paper.
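As a rough illustration of the metric, the sketch below assumes that word-level precision means the fraction of tokens whose predicted tag equals the gold tag. That reading is an assumption, and the class TokenAccuracy is hypothetical rather than the evaluation code shipped with the package.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical scorer; assumes one predicted tag per token.
class TokenAccuracy {

    /** Returns the fraction of positions where the predicted tag equals the gold tag. */
    static double of(List<String> gold, List<String> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("Gold and predicted tag counts differ");
        }
        if (gold.isEmpty()) {
            throw new IllegalArgumentException("No tokens to score");
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / gold.size();
    }

    public static void main(String[] args) {
        List<String> gold = Arrays.asList("NNP", "VBZ", "JJ", "NN");
        List<String> predicted = Arrays.asList("NNP", "VBZ", "NN", "NN");
        // Three of the four tags match.
        System.out.println(of(gold, predicted));
    }
}
```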

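Example of the expected file format (columns: token, tag, capitalization feature, POS feature; the columns are tab-separated in the actual file):

EU      I-ORG   CAP$ALL_UPPER   NNP
rejects O       CAP$ALL_LOWER   VBZ
German  I-MISC  CAP$FIRST_UPPER JJ
call    O       CAP$ALL_LOWER   NN
to      O       CAP$ALL_LOWER   TO
boycott O       CAP$ALL_LOWER   VB
British I-MISC  CAP$FIRST_UPPER JJ
lamb    O       CAP$ALL_LOWER   NN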
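The capitalization column in the example could be produced by a helper along the following lines. Only the three values shown above (CAP$ALL_UPPER, CAP$ALL_LOWER, CAP$FIRST_UPPER) come from the example; the fallback value CAP$OTHER, the handling of tokens without letters, and the class name CapFeature are assumptions made for illustration.

```java
// Hypothetical feature extractor, not part of the package.
class CapFeature {

    /** Maps a token to a capitalization feature string. */
    static String of(String token) {
        boolean hasLetter = false;
        boolean allUpper = true;
        boolean allLower = true;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetter(c)) {
                continue; // digits and punctuation do not affect the decision
            }
            hasLetter = true;
            if (!Character.isUpperCase(c)) {
                allUpper = false;
            }
            if (!Character.isLowerCase(c)) {
                allLower = false;
            }
        }
        if (!hasLetter) {
            return "CAP$OTHER"; // assumed fallback, not shown in the example
        }
        if (allLower) {
            return "CAP$ALL_LOWER";
        }
        if (allUpper) {
            return "CAP$ALL_UPPER";
        }
        if (Character.isUpperCase(token.charAt(0))) {
            return "CAP$FIRST_UPPER";
        }
        return "CAP$OTHER"; // mixed case such as "iPhone"; assumed fallback
    }

    public static void main(String[] args) {
        for (String token : new String[] {"EU", "rejects", "German"}) {
            System.out.println(token + "\t" + of(token));
        }
    }
}
```

Under these rules a single capital letter such as "A" is labeled CAP$ALL_UPPER rather than CAP$FIRST_UPPER; the package may resolve that case differently.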