The CLIP Colloquium Series presents...


Paraphrasing and Translation

Chris Callison-Burch
(Johns Hopkins University)
November 28, 2007, 11:00am, AVW 2120

In this talk I present a novel method for generating paraphrases that uses bilingual parallel corpora. Many previous methods for automatically generating paraphrases have used multiple translations (for instance, multiple translations of classic French novels into English) as their data source. While the variations across multiple translations contain paraphrase relationships, such data is uncommon. I show how bilingual parallel corpora, which are much more common, can be used to extract a greater number of paraphrases for a wider range of language usage. I present experimental results on how various factors affect paraphrase quality, including the amount of available training data, word sense, and word alignment quality.
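As a rough illustration of how a bilingual corpus can yield paraphrases, the sketch below uses the pivoting idea: two English phrases that align to the same foreign phrase are treated as candidate paraphrases, scored as p(e2 | e1) = sum over f of p(f | e1) * p(e2 | f). The abstract does not spell out this formula, so treat it as an assumption about the method; the function name, the toy phrases, and all probabilities are made up for the example.

```python
from collections import defaultdict


def paraphrase_probs(phrase, e2f, f2e):
    """Score candidate paraphrases of `phrase` by pivoting through foreign phrases.

    e2f[e][f] holds p(f | e) and f2e[f][e] holds p(e | f); both tables are
    assumed to come from a word-aligned bilingual parallel corpus.
    """
    scores = defaultdict(float)
    for foreign, p_foreign in e2f.get(phrase, {}).items():
        for candidate, p_candidate in f2e.get(foreign, {}).items():
            if candidate != phrase:  # a phrase is not its own paraphrase
                scores[candidate] += p_foreign * p_candidate
    return dict(scores)


# Toy translation tables (hypothetical values, not real corpus statistics).
E2F = {"under control": {"unter kontrolle": 0.5, "im griff": 0.5}}
F2E = {
    "unter kontrolle": {"under control": 0.75, "in check": 0.25},
    "im griff": {"under control": 0.5, "in check": 0.5},
}

print(paraphrase_probs("under control", E2F, F2E))
```

With these toy tables, "in check" is the only candidate and receives 0.5 * 0.25 + 0.5 * 0.5 = 0.375.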

In the second half of the talk I show how paraphrases trained from bilingual parallel corpora can, in turn, be used to improve the quality of statistical machine translation. Specifically, I show how paraphrases can be used to alleviate problems associated with out-of-vocabulary words and phrases. Statistical translation systems currently perform poorly when they encounter a word that was unseen in the training corpus. Since they have not learned a translation of it, they either reproduce the foreign word untranslated or delete it. I propose replacing the unknown source phrase with a paraphrase whose translation the model has learned, and then translating the paraphrase. I show experimental results that indicate that coverage can be increased dramatically, with most of the newly covered items translating accurately.
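The substitution step described above can be sketched as a simple lookup with a fallback. This is a minimal illustration under stated assumptions, not the system presented in the talk: the phrase table and paraphrase table are plain dictionaries, the function name and all entries are hypothetical, and a real decoder would score paraphrase-derived translations alongside its other model features rather than take the first match.

```python
def translate_with_paraphrases(phrase, phrase_table, paraphrases):
    """Translate `phrase`, backing off to a paraphrase when it is out of vocabulary.

    phrase_table maps source phrases to translations; paraphrases maps a source
    phrase to {paraphrase: score}. Both structures are hypothetical stand-ins.
    """
    if phrase in phrase_table:
        return phrase_table[phrase]

    # Try paraphrases from highest to lowest score.
    ranked = sorted(
        paraphrases.get(phrase, {}).items(),
        key=lambda item: item[1],
        reverse=True,
    )
    for candidate, _score in ranked:
        if candidate in phrase_table:
            return phrase_table[candidate]

    # No usable paraphrase: pass the source phrase through untranslated.
    return phrase


# Toy data (made up for the example).
PHRASE_TABLE = {"in check": "bajo control"}
PARAPHRASES = {"under control": {"in check": 0.375, "well in hand": 0.125}}

print(translate_with_paraphrases("under control", PHRASE_TABLE, PARAPHRASES))
```

Here "under control" is missing from the toy phrase table, so the sketch translates its best-scoring known paraphrase, "in check", instead of leaving the phrase untranslated.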

About the Speaker

Chris Callison-Burch is an assistant research professor at Johns Hopkins University. He completed his PhD at the University of Edinburgh in 2007. While he was a graduate student, he co-founded Linear B Ltd. to commercialize research in statistical machine translation.


This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.