The CLIP Colloquium Series presents...


Content selection and rewrite for generic multi-document summarization

Ani Nenkova (Columbia University)
December 7, 2005, 11:00am, AVW 2120

Slides

Two of the main qualities expected from an automatic summarizer are the ability to select useful and interesting content and the ability to present it as fluent, readable text. In my work, I address both content selection and readability, and validate my approach through empirical analysis of results from previous NIST-run summarization evaluations.

Our analyses of past evaluations show that more progress has been achieved in the area of generic multi-document summarization than in single-document summarization. But what has allowed these good results? In this talk I will discuss two aspects of a summarizer that contribute to good performance in content selection: frequency in the input as a feature, and models for context adjustment. I will present a summarizer that uses frequency as the sole feature for estimating sentence importance. With the addition of a simple model for context adjustment, the summarizer performs as well as state-of-the-art systems and allows us to focus on readability issues.
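As a rough illustration of the idea, here is a minimal Python sketch of frequency-based content selection with a simple context adjustment. The abstract does not give the details of the model, so everything specific here is an assumption made for illustration: the function names `tokenize` and `summarize`, the scoring of a sentence by the average input probability of its words, and the adjustment step that squares the probability of each word already covered by a chosen sentence so that repeated content is penalized.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def summarize(sentences, max_sentences=2):
    """Greedy sentence selection driven only by word frequency in the input.

    Assumed scoring: a sentence's importance is the average probability
    of its words. Assumed context adjustment: after a sentence is chosen,
    the probability of every word it contains is squared, so content that
    is already in the summary counts for less in later rounds.
    """
    tokenized = [tokenize(s) for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    def score(i):
        toks = tokenized[i]
        return sum(prob[w] for w in toks) / len(toks)

    chosen = []
    # Skip sentences with no tokens to avoid dividing by zero.
    candidates = [i for i, toks in enumerate(tokenized) if toks]
    while candidates and len(chosen) < max_sentences:
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
        # Context adjustment: down-weight words already covered.
        for w in set(tokenized[best]):
            prob[w] = prob[w] ** 2
    return [sentences[i] for i in chosen]


docs = [
    "the storm hit the coast",
    "the storm hit the coast hard",
    "rescue teams reached the town",
]
print(summarize(docs, max_sentences=2))
```

On this toy input the first pick is "the storm hit the coast". Without the adjustment, the near-duplicate second sentence would score next highest; with it, the second pick is "rescue teams reached the town", which adds new content. A realistic system would at least remove stop words and enforce a length limit in words rather than sentences.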

Analyses of automatic summarizers' output also show that summarizers perform poorly on linguistic quality. One problematic aspect is the referential clarity of summaries. I will present my experiments in summary rewrite, which specifically aim to improve the clarity of references by either dropping unnecessary or repetitive information or including additional descriptive information.

About the Speaker

Ani Nenkova is a doctoral student in the natural language processing group at Columbia University, expecting to complete her degree in January 2006. She holds BS and MS degrees from Sofia University (Bulgaria), where she specialized in mathematical logic. She has been actively involved in the NIST-run Document Understanding Conference summarization evaluations as part of the Columbia team; she also chaired a working group on linguistic quality in 2003/2004, and supervised and organized the pyramid evaluation in DUC 2005.


This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.