UMIACS Computational Linguistics Colloquium, October 22, 1998

AMTA-98 Preview Talks, #2



Doug Oard, University of Maryland


October 22, 1998, 4-6pm, AVW Room 2120


Douglas Oard
A Comparative Study of Query and Document Translation for Cross-Language Information Retrieval

Cross-language retrieval systems use queries in one natural language to guide the retrieval of documents that might be written in another. The acquisition and representation of translation knowledge play a central role in this process. This paper explores the utility of two sources of manually encoded translation knowledge for cross-language retrieval: a simple bilingual term list and the lexicon of a machine translation system. We have implemented six query translation techniques that use bilingual term lists and one that directly uses the translation output of an existing machine translation system; these are compared with a document translation technique that uses output from the same machine translation system. Average precision measures on portions of the TREC collection suggest three things: that arbitrarily selecting a single translation from a bilingual dictionary is typically no less effective than using every translation in the dictionary, that query translation using an existing machine translation system can achieve somewhat better effectiveness than the simpler techniques, and that performing document translation rather than query translation may result in further improvements in retrieval effectiveness under some conditions.
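
As a rough illustration of the term-list comparison described in the abstract, the following Python sketch contrasts two query translation strategies (using every listed translation versus a single one) and computes uninterpolated average precision for one ranked list. It is not the paper's implementation: the toy term list, the function names, the choice of "first listed" as the arbitrary single translation, and the policy of passing untranslatable terms through unchanged are all assumptions made for this example.

```python
# Hypothetical toy English-to-French term list; not data from the paper.
TERM_LIST = {
    "bank": ["banque", "rive"],
    "loan": ["emprunt"],
}


def translate_every(query_terms, term_list):
    """Replace each query term with all of its listed translations."""
    translated = []
    for term in query_terms:
        # Assumption: terms missing from the list are kept unchanged.
        translated.extend(term_list.get(term, [term]))
    return translated


def translate_single(query_terms, term_list):
    """Replace each query term with one translation.

    The first listed translation stands in for an arbitrary choice.
    """
    return [term_list.get(term, [term])[0] for term in query_terms]


def average_precision(ranked_ids, relevant_ids):
    """Uninterpolated average precision for a single ranked result list."""
    relevant = set(relevant_ids)
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    query = ["bank", "loan", "rate"]
    print(translate_every(query, TERM_LIST))
    print(translate_single(query, TERM_LIST))
    print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
```

In a comparison of this kind, each translated query would be run against the document collection and the resulting rankings scored with a measure such as average precision, averaged over many queries.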


For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Mari Broman Olsen (molsen@umiacs.umd.edu) or Philip Resnik (resnik@umiacs.umd.edu).