The CLIP Colloquium Series presents...


Exploiting Ambiguous Input in Statistical Machine Translation

Richard Zens (Aachen University)
September 5, 2006, 11:00am, AVW TBA

Machine translation (MT) is often not a stand-alone application but part of a larger system. Thus, the input to the MT system is the output of another, usually imperfect, NLP tool. A typical example of this scenario is spoken language translation, where the MT system has to translate the output of an automatic speech recognition (ASR) system. The classic approach is simply to translate the single best sentence hypothesis produced by the ASR system. Its disadvantage is that the MT system cannot recover from recognition errors. We will present an approach that takes alternative ASR hypotheses into account during translation. These alternatives are represented as confusion networks, which has two advantages: a large number of alternatives can be considered, and decoding remains efficient. We will present experimental results for the Chinese-English IWSLT/BTEC task and the Spanish-English EPPS task. In both cases, we achieve significant and consistent improvements over the baseline system.
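Why a confusion network can hold many alternatives cheaply can be seen in a small sketch. It assumes the network is stored as a list of slots, each holding alternative words with posterior probabilities, and uses a hypothetical `*EPS*` marker for an empty word. The words and probabilities are made-up illustration data, and the code only inspects the network. It is not the speaker's translation decoder.

```python
from math import prod

# Hypothetical representation: one slot per position, each slot a list of
# (word, posterior) alternatives. "*EPS*" marks an empty (skipped) word.
ConfusionNetwork = list[list[tuple[str, float]]]
EPS = "*EPS*"

def num_paths(cn: ConfusionNetwork) -> int:
    """Number of distinct sentence hypotheses encoded: product of slot sizes."""
    return prod(len(slot) for slot in cn)

def size(cn: ConfusionNetwork) -> int:
    """Number of stored word alternatives: only the sum of slot sizes."""
    return sum(len(slot) for slot in cn)

def best_path(cn: ConfusionNetwork) -> list[str]:
    """Highest-posterior word in each slot (slots are independent), skipping EPS."""
    words = []
    for slot in cn:
        word, _ = max(slot, key=lambda arc: arc[1])
        if word != EPS:
            words.append(word)
    return words

# Made-up example network with two alternatives per slot.
cn: ConfusionNetwork = [
    [("I", 0.9), ("a", 0.1)],
    [("want", 0.6), ("won't", 0.4)],
    [("to", 0.7), (EPS, 0.3)],
    [("go", 0.8), ("goal", 0.2)],
]

print(num_paths(cn))  # 16
print(size(cn))       # 8
print(best_path(cn))  # ['I', 'want', 'to', 'go']
```

Here 8 stored alternatives encode 16 sentence hypotheses. The gap grows exponentially with sentence length, so a decoder that works slot by slot never has to enumerate the hypotheses one at a time.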

About the Speaker

Richard Zens is a Ph.D. candidate at the Chair of Computer Science 6 at RWTH Aachen University. He holds a degree in computer science (Diplom-Informatiker). His research focuses on phrase-based statistical machine translation.


This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.