Much of the world's knowledge is "buried" in the billions of pages on the web and in other document collections. Data mining techniques can uncover implicit structure and regularities in data, which in turn can improve web search in various ways. I will first briefly review my work on learning to extract relationships between entities from large text collections, based on a few examples explicitly provided by a user. The extracted information can be queried to return precise answers to specific user information needs. A complementary direction is applying data mining techniques to improve general web search, which is the main focus of the talk. Millions of users interact with web search engines daily, providing billions of examples of successful and unsuccessful information access attempts. I will describe a general and robust method for learning from these examples to discover patterns in web search and browsing behavior. The resulting models can be tuned for specific applications such as predicting web search result preferences, web search ranking, and proposing "best bet" results for navigational queries. This work is a significant step towards harnessing the behavior of millions of users to improve the web search and information access experience.
Eugene Agichtein is a postdoctoral researcher in the Text Mining, Search, and Navigation group at Microsoft Research. His research interests are in discovering and managing information in large text collections, particularly for improving access to information on the web. He obtained a Ph.D. in Computer Science from Columbia University in May 2005, and a B.S. in Engineering from The Cooper Union. His paper on scaling up information extraction received the "best student paper" award at the IEEE ICDE 2003 conference.
This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.