Authors: Bjoern-Toby Berster, BSc
Masters thesis, The University of Texas School of Biomedical Informatics at Houston.
Abstract:
Coping with the ambiguous meanings of words has long been a hurdle for information retrieval and natural language processing systems. This paper present a new word sense disambiguation approach based on high dimensional binary vectors, which encode meanings of words based on the different contexts in which they occur. In our approach, a randomly constructed vector is assigned to each ambiguous term, and another to each sense of this term. In the context of a sense-annotated training set, a reversible vector transformation is used to combine these vectors, such that both the term and the sense assigned to a context in which the term occurs are encoded into vectors representing the surrounding terms in this context. When a new context is encountered, the information required to disambiguate this term is extracted from the trained semantic vectors for the terms in this context by reversing the vector transformation to recover the correct sense of the term. On repeated experiments using ten-fold cross-validation and a standard test set, we obtained results comparable to the best obtained in previous studies. These results demonstrate the potential of our methodology, and suggest directions for future research.