UMIACS Computational Linguistics Colloquium, November 5, 1998

A Statistical Translational Model for Comparable Corpora


Mona Diab


University of Maryland


November 5, 1998, 4pm, AVW Room 2120


Researchers have been investigating the use of comparable corpora for creating online thesauri as well as for information retrieval. Comparable corpora have the attractive property of being more widely available than parallel corpora. In this talk I will describe a statistical approach that creates translation maps from comparable corpora. The approach yields very favorable results (>93% accuracy), and this evaluation serves as a baseline. I will also highlight further research questions that need to be addressed and ways of improving the technique.


For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Mari Broman Olsen (molsen@umiacs.umd.edu) or Philip Resnik (resnik@umiacs.umd.edu).