We begin by describing a set of pruning constraints used in the literature to restrict the search space of synchronous PCFGs intersected with target language model contexts. We apply these constraints to non-binarized grammars with a large number of non-terminals and demonstrate effective parsing within the framework of Wu (1997).
We then present a novel parsing approach that avoids language model context intersection during parsing in favor of language-model-driven n-best list extraction. The parsing step produces a sentence-spanning parse forest, which the n-best extraction method explores in left-to-right target order.
This method avoids lossy pruning during parsing and searches a much larger effective parse space than is practical in the full intersection scenario. It also has the important benefit of allowing a high-order language model to be integrated within the n-best search process, rather than only in parse re-scoring.
We demonstrate the impact of this parsing method using an SPCFG approach similar to that of Galley et al. (2004) and compare its performance against full intersection.
This is joint work with Andreas Zollmann.
Ashish Venugopal is a Ph.D. candidate at the Language Technologies Institute at Carnegie Mellon University, and holds B.S. and M.S. degrees from the same institution. He is a Siebel Scholar and has received the annual Graduate Student Teaching Award at Carnegie Mellon. His research focuses on syntax-augmented machine translation.
This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.