Building an information retrieval test collection for spontaneous conversational speech

Title: Building an information retrieval test collection for spontaneous conversational speech
Publication Type: Conference Papers
Year of Publication: 2004
Authors: Oard D, Soergel D, Doermann D, Huang X, Murray CG, Wang J, Ramabhadran B, Franz M, Gustman S, Mayfield J, Kharevych L, Strassel S
Conference Name: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Date Published: 2004
Conference Location: New York, NY, USA
ISBN Number: 1-58113-881-4
Keywords: assessment, automatic speech recognition, oral history, search-guided relevance

Test collections model use cases in ways that facilitate evaluation of information retrieval systems. This paper describes the use of search-guided relevance assessment to create a test collection for retrieval of spontaneous conversational speech. Approximately 10,000 thematically coherent segments were manually identified in 625 hours of oral history interviews with 246 individuals. Automatic speech recognition results, manually prepared summaries, controlled vocabulary indexing, and name authority control are available for every segment. Those features were leveraged by a team of four relevance assessors to identify topically relevant segments for 28 topics developed from actual user requests. Search-guided assessment yielded sufficient inter-annotator agreement to support formative evaluation during system development. Baseline results for ranked retrieval are presented to illustrate use of the collection.
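The abstract mentions baseline ranked-retrieval results but does not say how relevance judgments are turned into a score. The following is a minimal sketch under stated assumptions, not the authors' evaluation code. It assumes the judgments for each topic are stored as a set of relevant segment IDs. It also assumes each system run is a ranked list of segment IDs, and that runs are scored with uninterpolated average precision averaged over topics (MAP), a common choice for ranked retrieval. The abstract does not name the metric. The function names and the topic and segment IDs are hypothetical.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Uninterpolated average precision of one ranked list (assumed metric)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, seg_id in enumerate(ranked, start=1):
        if seg_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    # Divide by all relevant segments, so unretrieved ones contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean AP over every judged topic; a topic with no run scores 0."""
    if not qrels:
        return 0.0
    total = sum(average_precision(runs.get(t, []), rel) for t, rel in qrels.items())
    return total / len(qrels)


# Hypothetical judgments and system output for two topics.
qrels = {"T1": {"s3", "s7"}, "T2": {"s1"}}
runs = {"T1": ["s3", "s5", "s7"], "T2": ["s2", "s1"]}
print(round(mean_average_precision(runs, qrels), 4))  # 0.6667
```

Topic T1 scores (1/1 + 2/3) / 2 = 5/6 and T2 scores 1/2, so the mean is 2/3. Averaging over judged topics rather than submitted runs means that a topic the system skipped still lowers the score.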