Knowledge-Based Semantic Interpretation for Biomedical Text

Thomas C. Rindflesch

National Library of Medicine


UMIACS Computational Linguistics Colloquium

March 17, 2004, 11:00am, CSIC Room 4122


 

SemRep is a natural language processing system being developed to recover semantic propositions from biomedical text using underspecified syntactic analysis and structured domain knowledge. A large syntactic lexicon of general and medical English and a stochastic tagger support a partial categorial analysis that identifies simple noun phrases and verb groups. Domain knowledge is provided by components of the Unified Medical Language System (UMLS). The Metathesaurus contains biomedical concepts categorized into semantic classes (or types) that serve as arguments of semantic predications stipulated in the Semantic Network. During interpretation, simple noun phrases functioning as referring expressions are mapped to concepts in the Metathesaurus, while syntactic phenomena that “indicate” semantic predicates (including verbs, prepositions, nominalizations, and the head-modifier relation in noun phrases) are mapped to predicates in the Semantic Network. Syntactic constraints on argument identification are controlled by an underspecified dependency grammar and address some aspects of argument coordination, relativization, and negation. Domain restrictions are enforced by a meta-rule that ensures that all semantic propositions identified by SemRep are sanctioned by a predication in the Semantic Network.
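The interpretation step and the meta-rule described above can be illustrated with a minimal sketch. This is not SemRep's implementation: the function name `interpret` and the three small tables are hypothetical stand-ins for the Metathesaurus, the indicator rules, and the Semantic Network, and their entries are toy data chosen only for illustration. The sketch assumes that syntactic analysis has already identified a subject noun phrase, an indicator, and an object noun phrase.

```python
# Toy stand-ins for UMLS resources. All entries are illustrative assumptions,
# not an extract of the real Metathesaurus or Semantic Network.

# Noun phrase -> (concept name, semantic type)
METATHESAURUS = {
    "aspirin": ("Aspirin", "Pharmacologic Substance"),
    "headache": ("Headache", "Sign or Symptom"),
}

# Syntactic indicator (verb, preposition, ...) -> semantic predicate
INDICATORS = {
    "treats": "TREATS",
    "for": "TREATS",
    "causes": "CAUSES",
}

# Predications sanctioned by the (toy) Semantic Network:
# (subject semantic type, predicate, object semantic type)
SEMANTIC_NETWORK = {
    ("Pharmacologic Substance", "TREATS", "Sign or Symptom"),
    ("Pharmacologic Substance", "CAUSES", "Sign or Symptom"),
}


def interpret(subject_np, indicator, object_np):
    """Return a (concept, PREDICATE, concept) proposition, or None.

    A proposition is produced only if both noun phrases map to concepts,
    the indicator maps to a predicate, and the resulting type-level
    predication is sanctioned by the Semantic Network (the meta-rule).
    """
    subj = METATHESAURUS.get(subject_np.lower())
    obj = METATHESAURUS.get(object_np.lower())
    predicate = INDICATORS.get(indicator.lower())
    if subj is None or obj is None or predicate is None:
        return None  # some element could not be mapped

    subj_concept, subj_type = subj
    obj_concept, obj_type = obj

    # Meta-rule: reject propositions with no sanctioning predication.
    if (subj_type, predicate, obj_type) not in SEMANTIC_NETWORK:
        return None
    return (subj_concept, predicate, obj_concept)
```

Under these toy tables, `interpret("aspirin", "treats", "headache")` yields a TREATS proposition between the two concepts, while reversing the arguments yields `None`, because no predication with a Sign or Symptom subject is listed in the toy network.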

 

SemRep serves as the basis for several ongoing research initiatives in biomedical information management, including efforts directed at extracting molecular biology information from text, processing clinical data in patient records, and automatic summarization of the results of PubMed searches.


About the Speaker:

Thomas C. Rindflesch has a BA in Arabic and a Ph.D. in linguistics from the University of Minnesota. His dissertation focused on theoretical issues concerned with enhancing the efficiency of natural language processing by exploiting information inherent in various syntactic structures. During graduate school he worked for Academic Computing Services at the University of Minnesota, advising faculty researchers on the application of computers in the humanities and social sciences. After receiving his degree he taught courses in general and computational linguistics at the University of Minnesota before joining the Lister Hill Center in 1991. He works in the Natural Language Systems Group focusing on natural language processing for accessing and managing biomedical information. His research concentrates on using the medical domain knowledge in the UMLS Metathesaurus and Semantic Network for extracting semantic knowledge from biomedical text. Additional information is available at http://lhncbc.nlm.nih.gov/cgsb/staff/rindflesch_thomas/

For the colloquium series schedule, see http://www.umiacs.umd.edu/research/CLIP/colloq/. If you are interested in meeting with the speaker, please contact Doug Oard (oard@umiacs.umd.edu, http://www.glue.umd.edu/~oard/).