The CLIP Colloquium Series presents...


Summarization in the biomedical domain: A case study with extrinsic evaluation measures

Ken Litkowski (CL Research)
October 26, 2005, 11:00am, AVW 2120

Summarization research, exemplified by DUC, has focused on the use of intrinsic evaluation metrics, which do not indicate the usefulness of summaries for real-world tasks. In a recently completed demonstration project, CL Research used its Knowledge Management System (KMS) to investigate summarization, question-answering, and information extraction issues in the biomedical domain. This project used 200 documents from the primary literature on the oral and inhalational properties of botulinum toxin, a spreadsheet prepared by a biologist categorizing each document (e.g., animal species, toxin subtype, route of exposure, and dose), and a 150-page final report summarizing the properties. All materials were available in XML; texts were further processed into an XML representation of their discourse, syntactic, and semantic structures. The objective of the project was to replicate the biologist's spreadsheet and final report using appropriate summarization, question-answering, information extraction, and document exploration techniques. Summarization experiments included comparing the biologist's notes against document abstracts, general summaries against both the abstracts and the biologist's notes, and topic-based summaries against material in the final report. The presentation will describe these experiments and the user-modeling issues that emerged during the use of KMS.

About the Speaker

Ken Litkowski is a computational lexicologist and the sole proprietor of CL Research, which provides consulting assistance in lexicon development and the role of lexicons in NLP applications, particularly question-answering, summarization, information extraction, and document modeling and exploration. He works extensively with machine-readable dictionaries and their potential for use in identifying semantic relations. He has participated for several years in NLP evaluations, including Senseval, TREC QA, and DUC, and was responsible for two Senseval tasks: disambiguation of WordNet glosses and FrameNet-based semantic role identification. He is currently working on ontological representations of documents and on The Preposition Project, which aims to develop a definitive characterization of the syntactic and semantic behavior of all English prepositions. He is the webmaster of the ACL Special Interest Group on the Lexicon.


This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.