Topic hierarchies have long been used to organize, summarize, and access information. This talk explores a new technique for the automatic generation of topic hierarchies. The result is a multi-document summary that consists of words and phrases selected for their ability to convey the topics and sub-topics of the document set to a user. In addition to this summary, the topic hierarchy also provides a mechanism for a user to navigate through the retrieved set of documents and identify those that are of greatest interest. This work differs from previous attempts to generate topic hierarchies in that it relies on statistical analysis of text and on language modeling to identify descriptive words from the document set and then to organize these words into a hierarchical structure. The talk describes a formal framework developed for selecting terms in the hierarchy and the algorithms used to automatically generate this hierarchy.
Dr. Lawrie earned her PhD from the University of Massachusetts, Amherst, in September 2003. While at UMass, she studied information retrieval as part of the Center for Intelligent Information Retrieval under the direction of Bruce Croft. Dr. Lawrie is currently an assistant professor of Computer Science at Loyola College in Maryland, where her research continues to focus on the organization of information and on developing methods for evaluating these organization techniques.
This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.