Improving text classification for oral history archives with temporal domain knowledge

Title: Improving text classification for oral history archives with temporal domain knowledge
Publication Type: Conference Papers
Year of Publication: 2007
Authors: Olsson SJ, Oard D
Conference Name: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Date Published: 2007
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-59593-597-7
Keywords: automatic topic classification, classifying with domain knowledge, spoken document classification
Abstract

This paper describes two new techniques for increasing the accuracy of topic label assignment to conversational speech from oral history interviews using supervised machine learning in conjunction with automatic speech recognition. The first, time-shifted classification, leverages local sequence information from the order in which the story is told. The second, temporal label weighting, takes the complementary perspective by using the position within an interview to bias label assignment probabilities. These methods, when used in combination, yield relative improvements of between 6% and 15% in classification accuracy using a clipped R-precision measure that models the utility of label sets as segment summaries in interactive speech retrieval applications.

URL: http://doi.acm.org/10.1145/1277741.1277848
DOI: 10.1145/1277741.1277848