The CLIP Colloquium Series presents...


Computational and Representational Models of Information Searching Behavior

Hemant Joshi (University of Arkansas at Little Rock)
February 21, 2007, 11:30am, AVW 2120 (NOTE SPECIAL TIME)
INTERNAL TALK

Slides

The study of information seeking behavior is necessary to understand the underlying human thought processes that drive that behavior. We study query logs to better understand generic user searching behavior. Query sessions form the context in which query behavior can be generalized. Query data is initially categorized into 35 predetermined categories from dmoz.org. Categories such as music, sports, and technology are generic by nature. We use WordNet to find the semantic distance between the 35 seed categories and session-based queries. Dictionary-based distance metrics are used to associate synonyms and other related concepts with a given query. The categories are expanded to include queries that are closely associated with each other, thus forming super concepts. The super concepts are the equivalent of Google Sets in that, for any given query, we can display related set members. The difference is that the super concepts consist only of queries that users have actually issued.
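
As a rough illustration of the category assignment step, the sketch below assigns a query term to the nearest seed category by shortest-path distance in a hypernym graph. The graph is a small hand-built stand-in for WordNet, and the names HYPERNYMS, path_distance, and closest_category are hypothetical; the abstract does not say which WordNet distance measure is actually used.

```python
from collections import deque

# Hypothetical toy hypernym links (child -> parent); a stand-in for WordNet.
HYPERNYMS = {
    "guitar": "instrument",
    "instrument": "music",
    "concert": "music",
    "football": "sports",
    "stadium": "sports",
    "laptop": "computer",
    "computer": "technology",
    "music": "entity",
    "sports": "entity",
    "technology": "entity",
}


def build_graph(links):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, parent in links.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_distance(graph, source, target):
    """Number of edges on the shortest path, or None if no path exists."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def closest_category(graph, term, categories):
    """Seed category with the smallest path distance to the term, if any."""
    scored = [(path_distance(graph, term, c), c) for c in categories]
    scored = [(d, c) for d, c in scored if d is not None]
    return min(scored)[1] if scored else None


GRAPH = build_graph(HYPERNYMS)
SEEDS = ["music", "sports", "technology"]
```

With this toy graph, closest_category(GRAPH, "guitar", SEEDS) picks "music" because it is two edges away, while the other seeds are four edges away. A term missing from the graph yields None and stays uncategorized.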

Continuing the effort to generalize user searching behavior, we categorize most queries into one of four categories: Music (A), Technology (T), Sports and Recreation (C), and Finance (G). We identify users with unique DNA patterns using the Observed Percentage Difference (OPD) method used in the DNA sequence alignment problem. Using a sliding-window technique, we study user behavior from the point of view of DNA sequences. Finally, we calculate the co-occurrence matrix of all gene sequences to identify latent behavioral patterns. This study allows us to identify two users who are interested in iPod and plasma TVs (two different queries not related to each other) as having similar behavior over the long term.
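
The DNA analogy can be sketched as follows. The letter codes come from the abstract; everything else is an assumption made for illustration. In particular, OPD is taken here to be the fraction of aligned positions at which two equal-length sequences differ, and co-occurrence is counted as two distinct windows appearing in the same user's sequence. The function names are hypothetical and the talk's actual definitions may differ.

```python
from collections import Counter
from itertools import combinations

# Letter codes taken from the abstract.
CATEGORY_TO_BASE = {
    "Music": "A",
    "Technology": "T",
    "Sports and Recreation": "C",
    "Finance": "G",
}


def encode(categories):
    """Encode a user's ordered query categories as a DNA-like string."""
    return "".join(CATEGORY_TO_BASE[c] for c in categories)


def observed_percentage_difference(seq_a, seq_b):
    """Fraction of aligned positions that differ (assumed definition of OPD)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    if not seq_a:
        return 0.0
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def sliding_windows(seq, size):
    """All contiguous substrings of the given size, in order."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def cooccurrence(sequences, size):
    """Count pairs of distinct windows that occur within the same sequence."""
    counts = Counter()
    for seq in sequences:
        distinct = sorted(set(sliding_windows(seq, size)))
        counts.update(combinations(distinct, 2))
    return counts


user_a = encode(["Music", "Music", "Technology", "Finance"])
user_b = encode(["Music", "Technology", "Technology", "Finance"])
```

Here user_a is "AATG" and user_b is "ATTG", so the two users differ at one of four positions. Under these assumptions, a low OPD or a shared set of frequently co-occurring windows would mark two users as behaviorally similar even when their literal queries have nothing in common.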

Personalization requires filtering information to find what is most relevant given what we already know. The information seeking process changes depending on whether the user is a novice or an expert in a particular field. We simulate a quantitative model of this process to filter out irrelevant information. The model is adaptive and evolves iteratively as knowledge about a concept grows.

About the Speaker

Hemant Joshi is currently pursuing a Ph.D. at the University of Arkansas at Little Rock, with an expected graduation date of May 2007. His dissertation topic is "Evolutionary Behavior of Textual Semantics".


This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.