UMIACS Computational Linguistics Colloquium Series, March 5, 1998


Glean: Using Syntactic Information in Document Filtering
Raman Chandrasekar
Institute for Research in Cognitive Science and Center for the Advanced Study of India, University of Pennsylvania

In this talk, I will describe a system called `Glean', which is predicated on the idea that any coherent text contains significant latent information, such as syntactic structure and patterns of language use, that can be used to enhance the performance of Information Retrieval systems. We show that Glean can be used to refine documents retrieved with a standard Web search engine or an IR system by selecting relevant information and filtering out irrelevant items. The system has been tested on a large collection of newswire sentences, and achieves recall and precision figures of 88% and 97%, respectively, for filtering out irrelevant documents. Its performance and modularity make it a promising postprocessing addition to any Information Retrieval system. A version of the system is available on the Web.

This is joint work with Dr. B. Srinivas, AT&T Labs Research.

