UMIACS Computational Linguistics Colloquium, February 21, 2001

NTCIR-2 Experiments at Maryland: Comparing structured queries and balanced translation for Chinese/English CLIR


Douglas W. Oard and Jianqiang Wang


University of Maryland


Feb 21, 2001
10am, AVW Room 2120


Pirkola's structured queries have been shown to perform well for word-based cross-language information retrieval (CLIR) in European languages, but word segmentation for Mandarin Chinese is a challenging and error-prone task. Chinese retrieval experiments often find that character n-grams outperform automatically segmented words. During the Mandarin-English Information (MEI) project at the Johns Hopkins Summer 2000 Workshop, we compared Pirkola's structured queries with an alternative technique that we call balanced translation, and found that balanced translation coupled with post-translation re-segmentation into n-grams outperformed Pirkola's word-based technique. The Chinese/English CLIR evaluation at the second NTCIR workshop provided the opportunity to run the same experiments on a far larger collection. Curiously, we found that on the NTCIR Chinese collection Pirkola's structured queries outperformed balanced translation, even when post-translation re-segmentation was used. In this talk we will summarize the MEI workshop results and our NTCIR experiments, and then present the results of our ongoing analysis of the causes of this observed difference.


For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Philip Resnik (resnik@umiacs.umd.edu).