UMIACS Computational Linguistics Colloquium, October 15, 1998

Local Authors Preview Talks, #1



Nizar Habash, University of Maryland
Late addition, to start at 5pm: Philip Resnik, University of Maryland


October 15, 1998, 4pm, AVW Room 2120


Nizar Habash (with David Traum)
A Thematic Hierarchy for Efficient Generation from Lexical-Conceptual Structure

This paper describes an implemented algorithm for syntactic realization of a target-language sentence from an interlingual representation called Lexical-Conceptual Structure (LCS). We provide a mapping between LCS thematic roles and Abstract Meaning Representation (AMR) relations; these relations serve as input to an off-the-shelf generator (Nitrogen). This work makes two contributions: (1) a thematic hierarchy that provides ordering information for realizing arguments in their surface positions; and (2) a diagnostic tool that detects inconsistencies in an existing online LCS-based lexicon and thereby allows us to enhance the principles for thematic-role assignment.

Late addition, to start at 5pm:

Philip Resnik
Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text

Parallel corpora are a valuable resource for machine translation, but at present their availability and utility are limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parallel translation, offering a potential solution to some of these problems and unique opportunities of its own. In this talk I present the necessary first step in that exploration: a method for automatically finding parallel translated documents on the Web. The technique is conceptually simple, almost fully language independent, and scalable, and preliminary evaluation results indicate that the method should be accurate enough to apply without human intervention in order to build high-quality parallel corpora.


For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with either speaker, please contact Mari Broman Olsen (molsen@umiacs.umd.edu) or Philip Resnik (resnik@umiacs.umd.edu).