The CLIP Colloquium Series presents...


Evaluate, Evaluate, Evaluate: Issues and Results in Pyramid Evaluation for DUC 2003, 2005 and 2006

Becky Passonneau (Columbia University)
March 8, 2006, 11:00am, AVW 2120


The pyramid method, which I developed in collaboration with Ani Nenkova (Columbia University), is a means to model and quantify the information content in a sample of human summaries of the same source texts. Previous work on evaluating summary content had foundered on a dilemma: different humans write summaries that share only some of their content. Lacking a specification of what should count as a good summary, we lacked a stable metric for quantifying how well machines could approximate human content selection. The Document Understanding Conference (DUC), administered by NIST, applied the pyramid method in 2005 and will do so again in 2006. Concurrently, I have been conducting a parallel evaluation of a large subset of the DUC 2003 machine summaries, with the goal of assessing many aspects of the method itself. In this talk, I will discuss a paradigmatic reliability evaluation of the DUC 2003 and 2005 data, present contrasts between the two datasets that arise from differences in the length of the model summaries, and discuss open-ended qualitative issues in the type of semantic representation the method relies on.

About the Speaker

Rebecca Passonneau recently joined the Center for Computational Learning Systems at Columbia University as a Research Scientist, after several years of working as a consultant for the Columbia University Computer Science Department, the Columbia University Libraries, AT&T Research Labs, and ETS. She holds a Ph.D. in Linguistics from the University of Chicago and has worked in computational linguistics ever since joining the Paoli Research Center, an AI lab dedicated to logic programming applications, in the mid-1980s. Her research is distinguished by a strong focus on the design of data collection and analysis methods, in part because of a continuing interest in discourse and language in context, where empirical work is critical. Her research interests have included nominal and temporal discourse anaphora, discourse structure, the semantics of tense and aspect, and discourse deixis. Most recently, she has worked on evaluation methods for diverse NLP technologies, including dialog systems and machine summarization.


This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.