TY - JOUR
T1 - Applying automatically generated semantic knowledge: A case study in machine translation
JF - NSF Symposium on Semantic Knowledge Discovery, Organization and Use
Y1 - 2008
A1 - Madnani, N.
A1 - Resnik, Philip
A1 - Dorr, Bonnie J
A1 - Schwartz, R.
AB - In this paper, we discuss how we apply automatically generated semantic knowledge to benefit statistical machine translation (SMT). Currently, almost all statistical machine translation systems rely heavily on memorizing translations of phrases. Some systems attempt to go further and generalize these learned phrase translations into templates using empirically derived information about word alignments and, at most, a small amount of syntactic information. Several issues in an SMT pipeline could be addressed by applying semantic knowledge, if such knowledge were easily available. One important such issue is reference sparsity. The fundamental problem that translation systems face is that there is no such thing as the correct translation for any sentence. In fact, any given source sentence can often be translated into the target language in many valid ways. Since there can be many “correct answers,” almost all models employed by SMT systems require, in addition to a large bitext, a held-out development set composed of multiple high-quality, human-authored reference translations in the target language in order to tune their parameters relative to a translation quality metric. There are several reasons why this requirement is not easy to satisfy. First, with a few exceptions, notably NIST’s annual MT evaluations, most new MT research data sets are provided with only a single reference translation. Second, obtaining multiple reference translations in rapid-development, low-density source language scenarios (e.g., Oard, 2003) is likely to be severely limited, or made entirely impractical, by constraints of time, cost, and the availability of qualified translators.
ER -