The importance and effectiveness of syntax in machine translation have recently become a point of great debate in the research community. Should we bother with syntactic structures? Should these structures be informed by linguistic annotations? For years the answer from the research community was a strong affirmative. More recently, however, effective corpus-based methods involving either no linguistic information whatsoever or only weak constituency notions have challenged these assumptions.
The first part of the talk investigates the impact of linguistic annotations on MT quality. Several experiments suggest that explicit linguistic annotations can in fact be beneficial in machine translation. Yet we note that the error rate of the linguistic components involved may be a limiting factor. The second part of the talk describes a method of efficiently searching over multiple linguistic analyses in a syntax-directed translation engine.
Chris Quirk is a Researcher in the Natural Language Processing group at Microsoft Research. After studying mathematics, computer science, and a bit of linguistics at Carnegie Mellon University, he began working at Microsoft in 2000. Since joining the NLP group, his work has focused primarily on machine translation.
This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.