Automatically Evaluating Answers to Definition Questions
Dina Demner-Fushman
University
ABSTRACT
Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called POURPRE, for automatically evaluating answers to definition questions. Until now, the only way to assess the correctness of answers to such questions has been to determine manually whether each information nugget appears in a system's response. The lack of automatic methods for scoring system output is an impediment to progress in the field, which we address with this work. Experiments with the TREC 2003 and TREC 2004 QA tracks indicate that rankings produced by our metric correlate highly with official rankings, and that POURPRE outperforms direct application of existing metrics.
For the colloquium series schedule, see http://www.umiacs.umd.edu/research/CLIP/colloq/. If you are interested in meeting with the speaker, please contact Jimmy Lin <http://www.glue.umd.edu/~jimmylin/>.