The CLIP Colloquium Series presents...


Computing Word-Pair Antonymy

Saif Mohammad, CLIP Lab
October 15, 2008, 11 a.m., AVW 2120


Joint work with Bonnie Dorr and Graeme Hirst
Practice Talk for AMTA

Knowing the degree of antonymy between words has widespread applications in natural language processing. Manually created lexicons have limited coverage and do not include most semantically contrasting word pairs. We present a new empirical measure of antonymy that combines corpus statistics with the structure of a published thesaurus. The approach is evaluated on a set of closest-opposite questions, obtaining a precision of over 80%. Along the way, we discuss what humans consider antonymous and how antonymy manifests itself in utterances.

About the Speaker

Saif Mohammad is a Research Associate in the Institute for Advanced Computer Studies at the University of Maryland. He received his Ph.D. in Computer Science from the University of Toronto in 2008, under the supervision of Dr. Graeme Hirst. Saif's interests are in Natural Language Processing, especially Lexical Semantics. His work focuses on developing monolingual and cross-lingual computational models of semantic distance and of lexical-semantic relations such as antonymy, for the benefit of various natural language applications, including multidocument summarization. He is also interested in combining state-of-the-art summarization techniques with citation information and visualization techniques to generate readily consumable technical surveys.

Are Multiple Reference Translations Necessary? Investigating the Value of Paraphrased Reference Translations in Parameter Optimization.

Nitin Madnani, CLIP Lab
October 15, 2008, 11:30 a.m., AVW 2120



Practice Talk for AMTA

Most state-of-the-art statistical machine translation systems use log-linear models, which are defined in terms of hypothesis features and weights for those features. It is standard to tune the feature weights in order to maximize a translation quality metric, using held-out test sentences and their corresponding reference translations. However, obtaining reference translations is expensive. In our earlier work (Madnani et al., 2007), we introduced a new full-sentence paraphrase technique, based on English-to-English decoding with an MT system, and demonstrated that the resulting paraphrases can be used to cut the number of human reference translations needed in half. In this paper, we take the idea a step further, asking how far it is possible to get with just a single good reference translation for each item in the development set. Our analysis suggests that it is necessary to invest in four or more human translations in order to significantly improve on a single translation augmented by monolingual paraphrases.

About the Speaker

Nitin Madnani is a Ph.D. candidate in the Department of Computer Science, working with Bonnie Dorr and Philip Resnik. He is interested in charting and studying the overlap between statistical machine translation and automatic paraphrase generation.


This talk is part of the CLIP Colloquium Series. For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.