In this talk, I discuss work I did over two months at IBM Research (Watson) on improving the output of an English-to-Spanish machine translation system. Spanish inflectional morphology encodes certain distinctions that English does not, such as gender and number on articles and various types of verb forms. This made it harder for the MT system we were using to choose the correct Spanish form.
I present a technique I developed with Young-Suk Lee at IBM Research to correct the inflectional morphology in English-to-Spanish MT output. We used Viterbi decoding and a dictionary of Spanish morphological alternatives to replace incorrect forms with correct ones based on a word trigram model of Spanish. In addition, we experimented with augmentations of this technique that involved part-of-speech tagging and source word information. I discuss the preliminary results of a few weeks' worth of work on this.
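To make the general idea concrete, here is a minimal Python sketch of trigram-based Viterbi selection among morphological alternatives. It is not the system built at IBM Research: the toy training sentences, the add-one smoothing, the set-based format of the alternatives dictionary, and all function names (`train_trigrams`, `alternatives`, `correct`) are assumptions made purely for illustration.

```python
import math
from collections import Counter

def train_trigrams(corpus):
    """Count trigrams and their two-word contexts (toy stand-in for a real LM)."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for i in range(2, len(tokens)):
            tri[tuple(tokens[i - 2:i + 1])] += 1
            ctx[tuple(tokens[i - 2:i])] += 1
    return tri, ctx, len(vocab)

def logprob(w, u, v, model):
    """Add-one smoothed log P(w | u, v); the smoothing choice is an assumption."""
    tri, ctx, vsize = model
    return math.log((tri[(u, v, w)] + 1) / (ctx[(u, v)] + vsize))

def alternatives(word, alt_sets):
    """Return all forms listed alongside `word`, or just the word itself."""
    for forms in alt_sets:
        if word in forms:
            return sorted(forms)
    return [word]

def correct(sentence, alt_sets, model):
    """Viterbi search: the state is the last two chosen words."""
    best = {("<s>", "<s>"): (0.0, [])}
    for word in sentence.split():
        new_best = {}
        for (u, v), (score, path) in best.items():
            for cand in alternatives(word, alt_sets):
                s = score + logprob(cand, u, v, model)
                key = (v, cand)
                # Keep only the best-scoring path into each state.
                if key not in new_best or s > new_best[key][0]:
                    new_best[key] = (s, path + [cand])
        best = new_best
    # Account for the end-of-sentence transition when picking the winner.
    (_, _), (_, path) = max(
        best.items(),
        key=lambda kv: kv[1][0] + logprob("</s>", kv[0][0], kv[0][1], model),
    )
    return " ".join(path)

# Hypothetical toy data, for illustration only.
corpus = ["la casa blanca", "el libro blanco",
          "las casas blancas", "la casa es blanca"]
alt_sets = [{"el", "la", "los", "las"},
            {"blanco", "blanca", "blancos", "blancas"}]
model = train_trigrams(corpus)
print(correct("el casa blanco", alt_sets, model))
```

Because each state records only the last two chosen words, the search stays linear in sentence length even though the number of possible form combinations grows exponentially. A realistic setup would replace the toy counts with a trigram model trained on a large Spanish corpus.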
Asad Sayeed is working on his PhD in computer science at the University of Maryland, studying computational linguistics and theoretical syntax with Prof. Amy Weinberg. He has a master's degree in the same from the University of Ottawa, Canada, and a bachelor's degree from Carleton University, Canada.
This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.