Evaluation of resources for question answering evaluation

Title: Evaluation of resources for question answering evaluation
Publication Type: Conference Paper
Year of Publication: 2005
Authors: Jimmy Lin
Conference Name: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Date Published: 2005
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 1-59593-034-5
Keywords: pooling, question answering
Abstract

Controlled and reproducible laboratory experiments, enabled by reusable test collections, represent a well-established methodology in modern information retrieval research. In order to confidently draw conclusions about the performance of different retrieval methods using test collections, their reliability and trustworthiness must first be established. Although such studies have been performed for ad hoc test collections, currently available resources for evaluating question answering systems have not been similarly analyzed. This study evaluates the quality of answer patterns and lists of relevant documents currently employed in automatic question answering evaluation, and concludes that they are not suitable for post-hoc experimentation. These resources, created from runs submitted by TREC QA track participants, do not produce fair and reliable assessments of systems that did not participate in the original evaluations. Potential solutions for addressing this evaluation gap and their shortcomings are discussed.

URL: http://doi.acm.org/10.1145/1076034.1076102
DOI: 10.1145/1076034.1076102
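
For readers unfamiliar with the evaluation resources the abstract refers to: answer patterns are typically regular expressions matched against a system's answer string, and lists of relevant documents record which source documents were judged to support a correct answer. A minimal sketch of this style of automatic judging is below; the function name, pattern syntax, and document IDs are illustrative assumptions, not the actual TREC QA data format or the paper's method.

    import re

    def judge_answer(answer, doc_id, answer_patterns, relevant_docs):
        """Count an answer correct only if it matches a known answer
        pattern AND its supporting document was judged relevant."""
        matches_pattern = any(re.search(p, answer) for p in answer_patterns)
        return matches_pattern and doc_id in relevant_docs

    # Hypothetical example: "When did Hawaii become a state?"
    patterns = [r"August\s+21,?\s+1959", r"\b1959\b"]
    relevant = {"APW19990821.0001"}

    print(judge_answer("August 21, 1959", "APW19990821.0001", patterns, relevant))  # True
    print(judge_answer("1960", "APW19990821.0001", patterns, relevant))             # False

Because such patterns and relevance lists are pooled from the runs of original track participants, a correct answer phrased or supported differently by a new system can be scored as wrong, which is the reliability gap the paper examines.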