WWW 2008 / Poster Paper April 21-25, 2008 · Beijing, China

Plurality: A Context-Aware Personalized Tagging System

Robert Graham, Dep't of Computer Science, Texas A&M University, College Station, TX 77843, rbgraham@tamu.edu
Brian Eoff, Dep't of Computer Science, Texas A&M University, College Station, TX 77843, bde@cs.tamu.edu
James Caverlee, Dep't of Computer Science, Texas A&M University, College Station, TX 77843, caverlee@cs.tamu.edu

ABSTRACT
We present the design of Plurality,¹ an interactive tagging system. Plurality's modular architecture allows users to automatically generate high-quality tags over Web content, as well as over archival and personal content typically beyond the reach of existing Web 2.0 social tagging systems. Three salient features of Plurality are: (i) its self-learning and feedback-sensitive capabilities based on a user's personalized tagging style; (ii) its leveraging of the collective intelligence of existing social tagging services; and (iii) its context-awareness for optimizing tag suggestions, e.g., based on spatial or temporal features.

Categories and Subject Descriptors: H.3.1 Information Storage and Retrieval: Content Analysis and Indexing
General Terms: Algorithms, Design
Keywords: tags, social annotation, context-sensitive, personalization

1. INTRODUCTION
Tags (words or phrases that serve as informal metadata for objects like Web pages, images, and videos) have grown in popularity and purpose in the last few years. Tagging as a phenomenon corresponds with a Web 2.0 mentality that users can create not only content but also a richer, more adaptive, and more responsive way to navigate and search both existing and new media. Widespread social tagging promises better and more intuitive information access through tag-based browsing (e.g., [4]), search (e.g., [1]), and new applications centered around the emergent semantics inherent in the aggregated tagging habits of thousands (or millions) of users (e.g., [2]). In contrast to traditional metadata annotation by experts, tagging can compensate for the lower precision of individual tags (e.g., misspellings, spam tags, and off-topic tags) through the sheer volume of tags that can be generated for an object.

Our research goal is to study how the underlying tag generation processes can be applied in domains lacking either a wide-scale audience (which is typically assumed in Web 2.0 social tagging contexts) or a tagging-savvy audience. How can we take advantage of new approaches to tag browsing, tag search, and emerging tag-based information access over documents currently "left out" of the Web 2.0 social tagging phenomenon? For example, few, if any, of the local documents on a user's desktop are exposed to a Web-scale audience for tagging, and users are typically reluctant to go back through their archives to manually apply tags. Similarly, a huge amount of untagged content can be found in internal company email and document-sharing networks, in archival collections in digital libraries, and even on the Web, where most content has yet to be tagged. Even for Web content that has already been tagged, a particular user's personalized view of that content may not be reflected in the existing tags.

Figure 1: High-level system architecture for Plurality (components: Crawler, Filter, RDBMS, Solr, Tag Ranking, Context Analyzer, Personalizer, Web Interface).

2. DESIGN AND ARCHITECTURE
With these challenges in mind, we introduce Plurality, an interactive tag recommendation system (see Figure 1). Plurality is implemented using Apache's Solr (a web services stack built over the Lucene search engine) to provide real-time tag suggestions.
Solr is finding increasing use and traction among institutional users seeking to create in-house solutions for indexing large catalogs of content; it is appealing as a tagging solution because it supports web services and lightweight web interfaces, so Plurality can be adapted to both institutional and personal settings with only slight modifications.

¹ http://faculty.cs.tamu.edu/caverlee/plurality

Copyright is held by the author/owner(s). WWW 2008, April 21-25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04.

Figure 2: Blog entries tagged using Plurality and compared with the blogger's original tags. Obtained with permission from Jason Kottke, kottke.org.
Blog entry: "Google is doing an online payment system, but will not be competing with PayPal."
  Plurality suggestions: paypal, money, shopping, finance, banking, ebay, financial, bank, business, tools, accounts, payment, online, auction
  Jason Kottke's tags: paypal, ecommerce, google
Blog entry: "Maybe the high price of oil isn't such a bad thing. 'When you look closely, it is hard to know what effect, exactly, oil prices have on the economy.'"
  Plurality suggestions: oil, economics, auto, car, iraq, politics, automotive, cars, engine, greenspan, war, motor, bush, finance, news
  Jason Kottke's tags: oil, economics

Building on previous work studying the automatic generation of tags (e.g., [3], [5]), Plurality is distinguished by three salient features:

1. Leveraging Existing Tags: To bootstrap the tagging process, Plurality can be seeded with already-tagged documents from an existing tagging service. In our first prototype, we crawled the popular social bookmarking site del.icio.us and collected over 280,000 tags. The crawler is built in Python and is designed to be easily adapted to other tagging resources. The crawled documents are filtered through a spam detector, normalized via stemming and HTML stripping, and indexed by Lucene.
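The normalization step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Plurality's actual crawler code: HTML is stripped with the standard-library html.parser module, and stemming is performed by a hypothetical toy suffix stripper (toy_stem) standing in for a real stemmer such as Porter's. The spam detector and the Lucene indexing step are omitted.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_html(html: str) -> str:
    """Return the visible text of an HTML fragment (entities decoded)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical toy stemmer: strips at most one common English suffix.
# A real deployment would use a proper stemmer (e.g. Porter's algorithm).
_SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    for suffix in _SUFFIXES:
        # Keep at least three characters so short words survive intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(html: str) -> Counter:
    """HTML -> visible text -> lowercase tokens -> stems -> term counts."""
    text = strip_html(html).lower()
    tokens = re.findall(r"[a-z0-9]+", text)
    return Counter(toy_stem(t) for t in tokens)
```

For example, normalize("<p>Paying online payments</p>") yields one count each for "pay", "online", and "payment"; a bag of stems like this is what a full-text index such as Lucene would then store and match against.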
In our initial design, tag recommendations for an untagged document are generated by finding the top-10 most similar documents and ranking their tags based on TF-IDF measures across the tag corpus and on the user's tag profile. After the initial bootstrapping, the system can persist and grow without further use of the crawler. Figure 2 shows an example of Plurality's tag suggestions for two posts by the prolific blogger Jason Kottke versus the actual tags assigned by Kottke himself.

2. Self-Learning and Feedback: Existing tags provide a baseline, which Plurality refines through user-specific learning and feedback. A user's personal tagging style can vary in syntax (e.g., tagging a document about Al Gore with "al-gore" versus "AlGore"), in viewpoint (e.g., "nobel prize winner" versus "presidential runner-up"), in goal (e.g., "todo", "homework"), and in many other dimensions. In our first prototype of Plurality, we apply traditional information retrieval techniques and specialized rule-based heuristics to model a user's tagging style based on the history of the user's tags and the user's interactions with the system. Figure 3 shows a sample user feedback screenshot. Plurality recommends a set of potentially relevant tags; the user can then accept the recommended tags, reject them, or add their own, and each of these decisions is reflected in an update to the user's tag profile.

3. Context-Sensitivity: When suggesting tags for a document, the context of both the document and the corpus from which tag suggestions are drawn is critically important. In Plurality, users can manually select a relevant corpus to compare against (e.g., all of the crawled del.icio.us documents, or only the user's own tagged documents) or select custom context filters based on temporal or spatial features. For example, the Figure 2 blog entry about "the high price of oil" was written in 2005.
Plurality's tag suggestions in this case are drawn from a recent crawl of del.icio.us, so some of the suggestions are temporally relevant to the original blog entry, e.g., "iraq", "war", and "bush." One of the goals of the Plurality project is to tag archival content; hence, a 1970s document referencing the "high price of oil" could be tagged "jimmy carter" and "opec." To capture this contextual information, Plurality uses custom time and location regular expressions to extract the creation date of a document and its location information (if available). Based on these contextual cues, Plurality supports tag suggestions based on a user-specified window around a particular date, or with respect to a user-specified geographic region.

Figure 3: Incorporating user feedback. A snippet of text from A List Apart's RSS feed, alistapart.com.

3. CLOSING REMARKS
We have presented Plurality, an interactive tagging system that couples the collective intelligence of existing tag-based resources with a personalized, context- and feedback-sensitive interface. In our ongoing work, we have deployed Plurality internally at Texas A&M, and we are collecting usage and tagging data to evaluate the effectiveness of Plurality both in terms of application-specific tag quality (e.g., for search or browsing) and in terms of user satisfaction.

Our research on Plurality continues along several directions. First, we will continue enhancing the capability and efficiency of the system through the incorporation of tag and document clustering, as well as further personalization of results. We are also interested in continuing to explore context and its effects on tagging and tag selection: what contextual cues are most important to users?

4. REFERENCES
[1] S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su. Optimizing web search using social annotations. In WWW, 2007.
[2] C. R. Brooks and N. Montanez. Improved annotation of the blogosphere via autotagging and hierarchical clustering. In WWW, 2006.
[3] P. A. Chirita, S. Costache, S. Handschuh, and W. Nejdl. P-TAG: Large scale automatic generation of personalized annotation tags for the web. In WWW, 2007.
[4] R. Li, S. Bao, Y. Yu, B. Fei, and Z. Su. Towards effective browsing of large scale social annotations. In WWW, 2007.
[5] G. Mishne. AutoTag: A collaborative approach to automated tag assignment for weblog posts. In WWW, 2006.