UMIACS Computational Linguistics Colloquium, March 27, 2000

Short practice talk for RIAO'2000:
A Statistical Model of Word-Level Mapping for Comparable Corpora


Mona Diab


University of Maryland


March 27, 2000, 1:30pm, AVW Room 2120


This will be a 15-minute practice talk followed by feedback from the audience. It is intended primarily for faculty and students within the CLIP and LAMP laboratories, but others are welcome to attend.

We present a statistical model of word-level mapping for comparable corpora. The approach is based on the assumption that if two terms have close distributional profiles, the distributional profiles of their corresponding translations should also be close in a comparable corpus. We describe the proposed model and lay out a preliminary investigation on intralanguage comparable corpora. Preliminary results show accuracy above 92%, suggesting that the model is feasible. The model still needs improvement and should be tested cross-linguistically before its significance can be assessed.
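
To make the underlying assumption concrete, the following is a minimal illustrative sketch, not the model presented in the talk. It assumes that a term's distributional profile is a bag of co-occurrence counts within a fixed window and that closeness is measured by cosine similarity; the abstract specifies neither, and the names profile, cosine, and best_match, as well as the toy corpora, are hypothetical.

```python
from collections import Counter
from math import sqrt


def profile(tokens, term, window=2):
    """Count words co-occurring with `term` within +/- `window` tokens.

    Assumption: a windowed co-occurrence count stands in for the
    'distributional profile'; the abstract does not define it.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(p, q):
    """Cosine similarity between two count profiles (0.0 if either is empty)."""
    dot = sum(p[w] * q[w] for w in p if w in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def best_match(term, tokens_a, tokens_b, window=2):
    """Map `term` from corpus A to the corpus B word with the closest profile."""
    target = profile(tokens_a, term, window)
    candidates = sorted(set(tokens_b))  # sorted for deterministic iteration
    return max(candidates, key=lambda c: cosine(target, profile(tokens_b, c, window)))


# Toy intralanguage "comparable corpora" (hypothetical data).
corpus_a = "the bank raised interest rates while the river flooded the valley".split()
corpus_b = "the bank cut interest rates after the river flooded the town".split()

print(best_match("bank", corpus_a, corpus_b))
```

In the intralanguage setting, both corpora share a vocabulary, so profiles can be compared directly. A cross-linguistic version would additionally need some way to relate context words across languages, which this sketch does not attempt.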


For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Philip Resnik (resnik@umiacs.umd.edu).