SHORT COURSE ON NLP, Summer 1998

SHORT COURSE ON NLP
Problems and Approaches in Building
Large-Scale, Multilingual Systems

Bonnie Dorr, bonnie@cs.umd.edu
Amy Weinberg, weinberg@umiacs.umd.edu

Date: Wed-Fri, June 3-5, 1998
Time: 9:00-12:00
Location: AV Williams Building, Room TBA


Goal of Course

The course is intended to provide an overview of how NLP researchers construct large-scale, multilingual operational systems for text processing.

Course Requirements

There is one laboratory assignment for the course. You will have a choice of two topic areas. The first is the design of a Kimmo system for morphological analysis of a subset of Spanish. The second is the design of a context-free parsing system that will accept a set of well-formed sentences, while rejecting a (different) set of ill-formed sentences.

Readings

Two or three papers will be assigned each day. Please read these in advance of the lecture. Readings that do not appear on the web site may be borrowed from Edna Walker (room AVW 3249, edna@cs.umd.edu), 9-5 on any weekday between now and June 3rd. Please photocopy and return readings promptly so that others may borrow them.

Day I: Morphology

Readings: Richard Sproat (1992). Morphology and Computation. MIT Press. Chapters 2-3. Additional background (not required): Evan Antworth (1990).

Day II: Syntax

Readings: James Allen (1995). Natural Language Understanding. Benjamin/Cummings, Redwood City, CA. Chapters 4-5, pp. 81-156; J. Earley (1986). An Efficient Context-Free Parsing Algorithm. In Readings in Natural Language Processing, Morgan Kaufmann, Los Altos, CA, pp. 25-33 (originally in CACM 13:2, 1970, pp. 94-102).

Day III: Current Problems and Approaches to Issues in NLP

Readings (available online):
(1) Steven Abney (1996). Tagging and Partial Parsing;
(2) Eric Brill (1997). Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging;
(3) Bonnie Dorr (1997). Large-Scale Acquisition of LCS-Based Lexicons for FLT.