Eliciting a corpus of word-aligned phrases for MT
Lori Levin, Alon Lavie, Erik Peterson
Language Technologies Institute
We describe a process for constructing an initial MT system for languages that do not have sufficient resources for traditional data-driven approaches. The process starts with an elicitation phase in which a bilingual informant translates phrases and supplies word alignments. The elicitation is supported by an elicitation tool and an elicitation corpus. This talk will include a demo of the elicitation tool and a discussion of the contents of the elicitation corpus. We will also describe how the elicited data is used for the automatic learning of syntactic transfer rules.
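
As a rough illustration of what one elicited item might look like, the sketch below stores a phrase, its translation, and the informant's word alignments as index pairs. The class name, field names, and the English/Spanish example are assumptions made for illustration only; they are not taken from the elicitation tool or the elicitation corpus.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ElicitedPhrase:
    """Hypothetical record for one elicited, word-aligned phrase pair."""

    source: tuple[str, ...]                   # words of the elicitation-language phrase
    target: tuple[str, ...]                   # words of the informant's translation
    alignments: tuple[tuple[int, int], ...]   # (source index, target index) pairs

    def __post_init__(self) -> None:
        # Reject alignment links that point outside either phrase.
        for s, t in self.alignments:
            if not (0 <= s < len(self.source) and 0 <= t < len(self.target)):
                raise ValueError(f"alignment ({s}, {t}) is out of range")

    def aligned_pairs(self) -> list[tuple[str, str]]:
        """Return the aligned words themselves, in the order the links were given."""
        return [(self.source[s], self.target[t]) for s, t in self.alignments]


# Illustrative example (not from the actual corpus): adjective and noun swap order.
example = ElicitedPhrase(
    source=("the", "big", "dog"),
    target=("el", "perro", "grande"),
    alignments=((0, 0), (1, 2), (2, 1)),
)
print(example.aligned_pairs())
```

Index pairs, rather than word pairs, keep the alignment unambiguous when a word occurs more than once in a phrase. Crossing links such as (1, 2) and (2, 1) record word-order differences of the kind a transfer rule would need to capture.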
Alon Lavie is an Associate Research Professor at the Language Technologies Institute. He is currently a co-PI of the DARPA/TIDES MilliRADD Project and the NSF/ITR-funded AVENUE project, which are exploring new machine-learning-based approaches to MT for languages with limited amounts of online resources.
For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Doug Oard (oard@umiacs.umd.edu).