TY  - CONF
T1  - Measuring variability in sentence ordering for news summarization
T2  - Proceedings of the Eleventh European Workshop on Natural Language Generation
Y1  - 2007
A1  - Madnani, Nitin
A1  - Passonneau, Rebecca
A1  - Ayan, Necip Fazil
A1  - Conroy, John M.
A1  - Dorr, Bonnie J.
A1  - Klavans, Judith L.
A1  - O'Leary, Dianne P.
A1  - Schlesinger, Judith D.
AB  - The issue of sentence ordering is an important one for natural language tasks such as multi-document summarization, yet there has not been a quantitative exploration of the range of acceptable sentence orderings for short texts. We present results of a sentence reordering experiment with three experimental conditions. Our findings indicate a very high degree of variability in the orderings that the eighteen subjects produce. In addition, the variability of reorderings is significantly greater when the initial ordering seen by subjects is different from the original summary. We conclude that evaluation of sentence ordering should use multiple reference orderings. Our evaluation presents several metrics that might prove useful in assessing against multiple references. We conclude with a deeper set of questions: (a) what sorts of independent assessments of quality of the different reference orderings could be made and (b) whether a large enough test set would obviate the need for such independent means of quality assessment.
JA  - Proceedings of the Eleventh European Workshop on Natural Language Generation
T3  - ENLG '07
PB  - Association for Computational Linguistics
CY  - Stroudsburg, PA, USA
UR  - http://dl.acm.org/citation.cfm?id=1610163.1610177
ER  -