%0 Conference Paper
%B Proceedings of the Fourth Workshop on Statistical Machine Translation
%D 2009
%T Fluency, adequacy, or HTER? Exploring different human judgments with a tunable MT metric
%A Snover, Matthew
%A Madnani, Nitin
%A Dorr, Bonnie J
%A Schwartz, Richard
%X Automatic Machine Translation (MT) evaluation metrics have traditionally been evaluated by the correlation of the scores they assign to MT output with human judgments of translation performance. Different types of human judgments, such as Fluency, Adequacy, and HTER, measure varying aspects of MT performance that can be captured by automatic MT metrics. We explore these differences through the use of a new tunable MT metric: TER-Plus, which extends the Translation Edit Rate evaluation metric with tunable parameters and the incorporation of morphology, synonymy and paraphrases. TER-Plus was shown to be one of the top metrics in NIST's Metrics MATR 2008 Challenge, having the highest average rank in terms of Pearson and Spearman correlation. Optimizing TER-Plus to different types of human judgments yields significantly improved correlations and meaningful changes in the weight of different types of edits, demonstrating significant differences between the types of human judgments.
%S StatMT '09
%I Association for Computational Linguistics
%C Stroudsburg, PA, USA
%P 259 - 268
%8 2009///
%G eng
%U http://dl.acm.org/citation.cfm?id=1626431.1626480