Cross-language retrieval systems use queries in one natural language
to guide the retrieval of documents that might be written in another.
Acquisition and representation of translation knowledge play a
central role in this process. This paper explores the utility of two
sources of manually encoded translation knowledge, a simple bilingual
term list and the lexicon of a machine translation system, for
cross-language retrieval. We have implemented six query translation
techniques that use bilingual term lists and one based on direct use
of the translation output from an existing machine translation system;
these are compared with a document translation technique that uses
output from the same existing translation system. Average precision
measures on portions of the TREC collection suggest that arbitrarily
selecting a single translation from a bilingual dictionary is
typically no less effective than using every translation in the
dictionary, that query translation using an existing machine
translation system can achieve somewhat better effectiveness than the
simpler techniques, and that performing document translation rather
than query translation may result in further improvements in retrieval
effectiveness under some conditions.
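
To make the contrast between the two simplest dictionary-based strategies concrete, here is a minimal sketch. It is not the implementation evaluated in the talk. The toy English-to-Spanish term list, the function names, the choice of the first listed translation as the "arbitrary" one, and the decision to pass untranslatable terms through unchanged are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy bilingual term list (English -> Spanish); illustrative only.
TERM_LIST: Dict[str, List[str]] = {
    "bank": ["banco", "orilla"],
    "interest": ["interés", "rédito"],
}


def translate_every(query: List[str], term_list: Dict[str, List[str]]) -> List[str]:
    """Replace each query term with every translation listed for it.

    Terms missing from the term list are kept unchanged (an assumption).
    """
    translated: List[str] = []
    for term in query:
        translated.extend(term_list.get(term, [term]))
    return translated


def translate_single(query: List[str], term_list: Dict[str, List[str]]) -> List[str]:
    """Replace each query term with one arbitrarily chosen translation.

    "Arbitrary" is taken here to mean the first listed entry (an assumption).
    Terms missing from the term list are kept unchanged.
    """
    return [term_list.get(term, [term])[0] for term in query]


if __name__ == "__main__":
    query = ["bank", "interest", "rates"]
    print(translate_every(query, TERM_LIST))
    print(translate_single(query, TERM_LIST))
```

In this sketch, using every translation produces a longer target-language query that mixes intended and unintended senses, while single selection keeps the query short at the risk of choosing the wrong sense.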

For the colloquium series schedule, see the UMD Computational
Linguistics Colloquium Series web page at
http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested
in meeting with the speaker, please contact Mari Broman Olsen
(molsen@umiacs.umd.edu) or Philip Resnik (resnik@umiacs.umd.edu).
A Comparative Study of Query and Document Translation for
Cross-Language Information Retrieval