Symbolic-to-statistical hybridization: extending generation-heavy machine translation

Title: Symbolic-to-statistical hybridization: extending generation-heavy machine translation
Publication Type: Journal Articles
Year of Publication: 2009
Authors: Habash N, Dorr BJ, Monz C
Journal: Machine Translation
Pagination: 23-63
Date Published: 2009
ISSN: 0922-6567

The last few years have witnessed an increasing interest in hybridizing surface-based statistical approaches and rule-based symbolic approaches to machine translation (MT). Much of that work focuses on extending statistical MT systems with symbolic knowledge and components. In the brand of hybridization discussed here, we go in the opposite direction: adding statistical bilingual components to a symbolic system. Our base system is Generation-heavy machine translation (GHMT), a primarily symbolic asymmetrical approach that addresses the issue of Interlingual MT resource poverty in source-poor/target-rich language pairs by exploiting symbolic and statistical target-language resources. GHMT's statistical components are limited to target-language models, which arguably makes it a simple form of hybrid system. We extend the hybrid nature of GHMT by adding statistical bilingual components. We also describe the details of retargeting it to Arabic–English MT. The morphological richness of Arabic brings several challenges to the hybridization task. We conduct an extensive evaluation of multiple system variants. Our evaluation shows that this new variant of GHMT (a primarily symbolic system extended with monolingual and bilingual statistical components) has a higher degree of grammaticality than a phrase-based statistical MT system, where grammaticality is measured in terms of correct verb-argument realization and long-distance dependency translation.