With an increasing number of languages making their way to our desktops every day via the internet, researchers have come to realise how few language knowledge resources exist for scarcely represented languages. In this paper, we present an unsupervised method for automatic multilingual word sense tagging that uses parallel corpora as a means of bootstrapping some of the needed knowledge resources. We describe the method in detail and discuss a preliminary evaluation of its effectiveness for the word sense tagging task. The method is evaluated on the English Brown corpus and its translations into three languages: French, Spanish and German. This preliminary evaluation yielded an accuracy of up to 79%.
For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Philip Resnik (resnik@umiacs.umd.edu).