Categorizing web search results into meaningful and stable categories using fast-feature techniques

Title: Categorizing web search results into meaningful and stable categories using fast-feature techniques
Publication Type: Conference Papers
Year of Publication: 2006
Authors: Kules B, Kustanowitz J, Shneiderman B
Conference Name: Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Date Published: 2006
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 1-59593-354-9
Keywords: Browsing, categorization, classification, metadata, open directory, taxonomies
Abstract

When search results against digital libraries and web resources have limited metadata, augmenting them with meaningful and stable category information can enable better overviews and support user exploration. This paper proposes six fast-feature techniques that use only features available in the search result list, such as title, snippet, and URL, to categorize results into meaningful categories. They use credible knowledge resources, including a US government organizational hierarchy, a thematic hierarchy from the Open Directory Project (ODP) web directory, and personal browse histories, to add valuable metadata to search results. In three tests the percent of results categorized for five representative queries was high enough to suggest practical benefits: general web search (76-90%), government web search (39-100%), and the Bureau of Labor Statistics website (48-94%). An additional test submitted 250 TREC queries to a search engine and successfully categorized 66% of the top 100 using the ODP and 61% of the top 350. Fast-feature techniques have been implemented in a prototype search engine. We propose research directions to improve categorization rates and make suggestions about how web site designers could re-organize their sites to support fast categorization of search results.
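The abstract does not spell out the six techniques, so the following is only a minimal sketch of the general idea of categorizing a result from a feature already present in the result list, here the URL. It assumes a lookup table from URL prefixes to category labels. The table entries, the function names `categorize_by_url` and `categorization_rate`, and the longest-prefix matching rule are all hypothetical illustrations, not the authors' method or actual ODP data.

```python
from urllib.parse import urlparse

# Hypothetical directory mapping URL prefixes (host + path) to categories.
# These entries are invented for illustration; they are not ODP data.
DIRECTORY = {
    "www.bls.gov": "Government/Labor",
    "www.bls.gov/cpi": "Government/Labor/Prices",
    "en.wikipedia.org": "Reference/Encyclopedias",
}


def categorize_by_url(url, directory=DIRECTORY):
    """Return the category of the longest matching URL prefix, or None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    segments = [s for s in parsed.path.split("/") if s]
    # Try the most specific prefix first, then drop trailing path segments.
    for n in range(len(segments), -1, -1):
        key = "/".join([host] + segments[:n])
        if key in directory:
            return directory[key]
    return None


def categorization_rate(urls, directory=DIRECTORY):
    """Fraction of results that received any category."""
    if not urls:
        return 0.0
    hits = sum(1 for u in urls if categorize_by_url(u, directory) is not None)
    return hits / len(urls)
```

A lookup like this needs no page fetch or full-text analysis, which is what makes a technique "fast" in the sense the abstract describes. The rate function mirrors the kind of "percent of results categorized" measure the abstract reports, though the figures quoted there come from the authors' own tests and not from this sketch.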

URL: http://doi.acm.org/10.1145/1141753.1141801
DOI: 10.1145/1141753.1141801