WWW 2008 / Poster Paper · April 21-25, 2008 · Beijing, China

Context-Based Page Unit Recommendation for Web-Based Sensemaking Tasks

Wen-Huang Cheng
Graduate Institute of Networking and Multimedia
National Taiwan University
Taipei 10617, Taiwan, R.O.C.
wisley@cmlab.csie.ntu.edu.tw

David Gotz
IBM T. J. Watson Research Center
19 Skyline Drive
Hawthorne, New York 10532, USA
dgotz@us.ibm.com

ABSTRACT
Sensemaking tasks require users to perform complex research behaviors to gather and comprehend information from many sources. Such tasks are common and include, for example, researching vacation destinations or deciding how to invest. In this paper, we present an algorithm and interface that provide context-based page unit recommendation to assist in connection discovery during sensemaking tasks. We exploit the natural note-taking activity common to sensemaking behavior as the basis for a task-specific context model. Each web page visited by a user is dynamically analyzed to determine the most relevant content fragments, which are then recommended to the user. Our initial evaluations indicate that our approach improves user performance.

Categories and Subject Descriptors
H.3.3 [Information Systems]: Information Search and Retrieval; H.4.3 [Information Systems]: Communications Applications--Information Browsers

General Terms
Algorithms, Design, Human Factors

Keywords
Recommendation, WWW, Search, Sensemaking

Copyright is held by the author/owner(s). WWW 2008, April 21-25, 2008, Beijing, China. ACM 978-1-60558-085-2/08/04.

1. INTRODUCTION
Over the past decade, improvements in search and retrieval technologies have made it extremely easy for people to obtain an accurate list of web pages most relevant to their tasks by typing a short set of keywords [1, 4]. However, studies show that as much as 75% of a web user's time is spent in a post-search phase [3], in which they look through individual web pages or sites to find the specific content that is the target of their task.

A user's post-search effort to find relevant information within a web page is even more onerous during sensemaking tasks. During such tasks, users perform complex research behaviors to gather and comprehend information from many sources to address open-ended questions, such as gathering business intelligence or researching vacation destinations. In these tasks, users often take notes to record discovered data for later review and to help them discover connections between information found at different times.

In our work, we exploit a user's normal note-taking behavior to reduce the post-search burden that occupies so much of a user's time during sensemaking tasks. This approach is inspired by related work that examined task-specific notes to recommend previously saved material [2].

2. THE INSIGHTFINDER
The InsightFinder, as shown in Figure 1, is an extension to the traditional browser interface that assists in connection discovery during sensemaking tasks by providing dynamic context-based page unit recommendations. The InsightFinder offers two primary features. First, it allows users to easily capture, organize, and revisit information they find while researching a particular topic within a notebook-like sidebar. Second, and most novel, the captured information is used as input to a context-based recommendation algorithm that provides real-time recommendations of relevant page units during the user's future browsing activity.

Figure 1: The InsightFinder is a browser sidebar that contains two sections: a note-taking area, and a dynamically computed ranked list of relevant page units. Clicking on an item in the ranked list scrolls the browser to the proper location and highlights the corresponding page unit using a red box.

2.1 Recording Notes
The InsightFinder allows users to record notes at several levels of granularity (e.g.,
individual links, images, or text fragments from a web page) and provides tools for the creation, manipulation, and removal of captured information. This is accomplished via a sidebar interface located next to the traditional browser window. In the background, the InsightFinder maintains a context model based on the structure of the user's notes, which is used to provide task-relevant recommendations.

When a user begins a new task, they create a new workspace, which provides a blank space to record new notes. The workspace allows users to drag-and-drop information from the browser into their notes as shown in Figure 2. Users can drag text, images, links, or complex combinations of these types. Users can also record entire web pages by dragging the location bar's link icon to the InsightFinder.

In addition to adding new information, users can manipulate and organize existing objects within the InsightFinder to clarify their overall notes as their task evolves. This includes tools for folder creation and manipulation, as well as deleting, moving, editing, and revisiting existing notes.

The InsightFinder provides more than basic note-taking capabilities. Users can save their notes and return to them in future sessions. In addition, users can maintain unique multi-session workspaces for each user task. This is especially critical because users often engage in several interleaved sensemaking tasks over the course of several web browsing sessions. Task-specific workspaces (e.g., one for "Trip to New York" and another for "Investment Research") allow the InsightFinder to provide more accurate task-specific page unit recommendations.

Underneath the graphical display of the user's collection of notes is a graph-based data structure, called the context model. Each task $t$ has its own model, $C_t$, that mirrors the workspace's visual presentation and augments it with additional information required for the page unit recommendation algorithm.
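To make the idea of a per-task, graph-based context model more concrete, the following Python sketch models a workspace as a tree of folders and captured notes. It is only an illustration under stated assumptions: the paper does not specify the internal layout of $C_t$, so the class names (`NoteNode`, `ContextModel`), the `kind` labels, and all method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # identity-based equality, so remove() targets the exact node
class NoteNode:
    """One node in a hypothetical context graph: a folder or a captured note."""
    kind: str   # assumed labels: "folder", "text", "link", "image", "page"
    label: str  # folder name, captured text, or URL
    children: List["NoteNode"] = field(default_factory=list)
    parent: Optional["NoteNode"] = field(default=None, repr=False)

    def add(self, child: "NoteNode") -> "NoteNode":
        """Attach a child, as when a fragment is dropped into a folder."""
        child.parent = self
        self.children.append(child)
        return child

    def remove(self, child: "NoteNode") -> None:
        """Detach a child, as when a note is deleted."""
        self.children.remove(child)
        child.parent = None

    def walk(self) -> Iterator["NoteNode"]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class ContextModel:
    """Hypothetical per-task model C_t, rooted at one workspace folder."""
    task: str
    root: NoteNode = field(init=False)

    def __post_init__(self) -> None:
        self.root = NoteNode(kind="folder", label=self.task)

    def notes(self) -> List[NoteNode]:
        """Return every captured note (non-folder node) in the workspace."""
        return [n for n in self.root.walk() if n.kind != "folder"]


# Usage: one workspace per task, mirroring the sidebar's folder layout.
model = ContextModel(task="Trip to New York")
hotels = model.root.add(NoteNode("folder", "Hotels"))
hotels.add(NoteNode("text", "Midtown hotel near Central Park"))
model.root.add(NoteNode("link", "http://example.com/nyc-museums"))
print([n.label for n in model.notes()])
```

Because every edit goes through `add` or `remove`, the structure stays in step with the notes it mirrors. A real implementation would presumably attach further per-node data for the recommendation algorithm.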
Every change to a user's notes is automatically propagated to the underlying context graph.

Figure 2: Users record notes by dragging content fragments (links, images, text, or entire pages) from the browser to folders in the InsightFinder.

2.2 Providing Recommendations
The InsightFinder is designed to dynamically recommend the most task-relevant content on each web page visited by a user. This is done by comparing the content on each visited page to the information stored in the user's notes. The results are presented visually to the user to reduce the time spent searching through pages to find relevant information. The recommendation algorithm, expressed in Equation 1, is re-evaluated every time a user navigates to a new web page.

The InsightFinder employs a structure-based algorithm to segment each page $P_i$ into a set of individual page units, $p_{i,k}$, with the goal of creating units that contain semantically consistent data (i.e., the content within a single page unit should share a common topic).

$$p_{i,k} = \mathrm{Rank}(\mathrm{Segment}(P_i), C_t) \tag{1}$$

A relevance algorithm, Rank, then compares the information stored in the active context model $C_t$ with the content of each of the page units extracted from the browser's current page, $P_i$. The Rank algorithm returns a vector of page units ranked by each one's computed degree of relevance to $C_t$. Our prototype uses a bag-of-words representation for page units and objects in $C_t$, and employs traditional text-oriented measures (e.g., the Jaccard coefficient and pointwise mutual information) within the Rank algorithm.

The most relevant page units in $p_{i,k}$ are recommended to the user through the InsightFinder's sidebar. The lower portion of the interface presents a sorted histogram providing an intuitive display of the degree of relevance for each of the recommended units. This interface is shown in Figure 3(a). A user's click on any item in the list causes the browser to automatically scroll to display the selected page unit. In addition, as shown in Figure 3(b), the selected unit is highlighted within a red box to clearly indicate where the recommended content can be found.

Figure 3: (a) The InsightFinder's recommendations are presented as a ranked list. A histogram is included to visually indicate the relative value of each unit's relevance to the context model. (b) Clicking on any item in the list automatically scrolls the web page and highlights the page unit in a red rectangle.

Relevance computation is an important part of the InsightFinder's ability to support sensemaking tasks. It helps users "connect the dots" by highlighting potentially relevant connections between their notes and the information currently on display within their browser. This feature can prove especially useful in quickly uncovering either intended or serendipitous connections that a user would otherwise overlook or obtain only by tediously analyzing the entire page.

3. EVALUATION
While the InsightFinder is only an early prototype, our initial evaluations show that it is effective in reducing the time required to locate relevant information within a web page. In laboratory experiments, we asked 10 users to perform four information location tasks using the Firefox browser. In all four tasks, users with access to the InsightFinder exhibited a statistically significant reduction (p < 0.01) in the time required to perform their task. These early results motivate our continuing work in this direction.

4. REFERENCES
[1] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proc. of Inter. WWW Conf., 1998.
[2] D. Gotz. The scratchpad: Sensemaking support for the web. In Proc. of the Inter. WWW Conf. Posters, 2007.
[3] A. W. Lazonder, H. J. A. Biemans, and I. G. J. H. Wopereis. Differences between novice and experienced users in searching information on the world wide web. Jrnl of the American Society for Info. Sci., 51:576-581, 2000.
[4] M. Marchiori. The quest for correct information on the web: Hyper search engines. In Proc. of Inter. WWW Conf., 1997.