SHORT COURSE ON NLP, Summer 1998
Problems and Approaches in Building
Large-Scale, Multilingual Systems
Bonnie Dorr, bonnie@cs.umd.edu
Amy Weinberg, weinberg@umiacs.umd.edu
Date: Wed-Fri, June 3-5, 1998
Time: 9:00-12:00
Location: A.V. Williams Building, Room TBA
Goal of Course
The course is intended to provide an overview of how NLP researchers
construct large-scale, multilingual operational systems for text
processing.
Course Requirements
There is one laboratory assignment for the course. You will have a
choice of two topic areas. The first is the design of a Kimmo system
for morphological analysis of a subset of Spanish. The second is the
design of a context-free parsing system that will accept a set of
well-formed sentences, while rejecting a (different) set of ill-formed
sentences.
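To make the second topic area concrete, here is a minimal sketch of a context-free recognizer that accepts some word sequences and rejects others. It follows the predict/scan/complete scheme of Earley's algorithm (covered in the Day II readings). The toy grammar, the lexicon, and the names GRAMMAR, LEXICON, and recognize are assumptions made for illustration; they are not the assignment's grammar or a required design.

```python
# Toy grammar and lexicon (illustrative assumptions, not the course's data).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"},
    "mary": {"Name"},
    "sees": {"V"}, "sleeps": {"V"},
}


def recognize(words, start="S"):
    """Return True if the lowercase word list is derivable from `start`.

    Chart items are (lhs, rhs, dot, origin). The sketch assumes the
    grammar has no empty right-hand sides.
    """
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(start, tuple(rhs), 0, 0) for rhs in GRAMMAR[start]}
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            target = i
            if dot == len(rhs):
                # Completer: advance every item that was waiting for lhs.
                new = [(l, r, d + 1, o) for (l, r, d, o) in chart[origin]
                       if d < len(r) and r[d] == lhs]
            elif rhs[dot] in GRAMMAR:
                # Predictor: expand the nonterminal after the dot.
                new = [(rhs[dot], tuple(r), 0, i) for r in GRAMMAR[rhs[dot]]]
            elif i < n and rhs[dot] in LEXICON.get(words[i], ()):
                # Scanner: the next word has the expected category.
                new = [(lhs, rhs, dot + 1, origin)]
                target = i + 1
            else:
                new = []
            for item in new:
                if item not in chart[target]:
                    chart[target].add(item)
                    if target == i:
                        agenda.append(item)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[n])


well_formed = ["the dog sees a cat", "mary sleeps"]
ill_formed = ["dog the sees", "the dog sees the"]
for sentence in well_formed + ill_formed:
    print(sentence, "->", recognize(sentence.split()))
```

With this toy grammar, the two strings in well_formed are accepted and the two in ill_formed are rejected. A real solution would need a grammar designed around the actual sentence sets.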
Readings
Two or three papers will be assigned each day. Please read these in
advance of the lecture. Those readings that do not appear on the web
site may be borrowed from Edna Walker (room AVW 3249,
edna@cs.umd.edu), 9-5 on any weekday between now and June 3rd. Please
xerox and return readings promptly so that others may borrow them.
Day I: Morphology:
Readings: Richard Sproat (1992). Morphology and Computation. MIT
Press. Chapters 2-3. Additional background: Evan Antworth
(1990), not required.
- What is phonology/morphology? Interaction with phonology/syntax.
Category and semantic sensitivity. Features. What is a finite state
automaton? What is a finite state grammar? Relationship between the two.
Modeling of Morphology.
[Amy Weinberg. 1-1/2 hrs.]
- Morphological Processing. Automata and Grammars, Finite State
Automata and Transducers. Complexity of Kimmo.
[Bonnie Dorr. 1-1/2 hrs.]
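As a small illustration of the finite-state transducers listed above, the sketch below maps a lexical form such as "flor+s" to a surface form such as "flores" using an explicit transition table. It is a single sequential transducer, not a Kimmo two-level system (which runs several rule automata in parallel). The state names, the simplified plural rule (vowel + s, consonant + es, final z becomes ces), and the restriction to unaccented lowercase letters are all assumptions made for illustration.

```python
VOWELS = set("aeiou")


def symbol_class(ch):
    """Collapse input symbols into the few classes the table distinguishes."""
    if ch in ("+", "z", "s"):
        return ch
    return "V" if ch in VOWELS else "C"


# (state, input class) -> (next state, output template).
# "{}" in a template echoes the input symbol. A stem-final "z" is held
# back (state "Z") until we know whether a suffix follows.
TRANSITIONS = {}
for state in ("start", "V", "C"):
    TRANSITIONS[(state, "V")] = ("V", "{}")
    TRANSITIONS[(state, "C")] = ("C", "{}")
    TRANSITIONS[(state, "s")] = ("C", "{}")
    TRANSITIONS[(state, "z")] = ("Z", "")
TRANSITIONS[("Z", "V")] = ("V", "z{}")
TRANSITIONS[("Z", "C")] = ("C", "z{}")
TRANSITIONS[("Z", "s")] = ("C", "z{}")
TRANSITIONS[("Z", "z")] = ("Z", "z")
# Morpheme boundary: remember what kind of symbol ended the stem.
TRANSITIONS[("V", "+")] = ("+V", "")
TRANSITIONS[("C", "+")] = ("+C", "")
TRANSITIONS[("Z", "+")] = ("+Z", "")
# Plural suffix: its surface shape depends on the remembered stem ending.
TRANSITIONS[("+V", "s")] = ("END", "s")
TRANSITIONS[("+C", "s")] = ("END", "es")
TRANSITIONS[("+Z", "s")] = ("END", "ces")

# Accepting states, with any output still pending when the input ends.
FINAL_OUTPUT = {"V": "", "C": "", "Z": "z", "END": ""}


def transduce(lexical):
    """Return the surface form, or None if the input is not accepted."""
    state, out = "start", []
    for ch in lexical:
        key = (state, symbol_class(ch))
        if key not in TRANSITIONS:
            return None
        state, template = TRANSITIONS[key]
        out.append(template.format(ch))
    if state not in FINAL_OUTPUT:
        return None
    return "".join(out) + FINAL_OUTPUT[state]


for form in ("gato+s", "flor+s", "lapiz+s", "gato", "gato+"):
    print(form, "->", transduce(form))
```

Dropping the output column from the table leaves an ordinary finite-state acceptor over the same states, which is one way to see the relationship between automata and transducers.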
Day II: Syntax:
Readings: James Allen (1995). Natural Language Understanding
(chapters 4-5, pp. 81-156), Benjamin/Cummings, Redwood City, CA;
J. Earley (1986). An Efficient Context-Free Parsing Algorithm, in
Readings in Natural Language Processing, Morgan Kaufmann, Los Altos,
CA, pp. 25-33 (originally in CACM 13:2, 1970, pp. 94-102).
- Context-free parsing. Bottom-up and top-down parsing.
Algorithms and data structures in parsing.
[Amy Weinberg. 1-1/2 hrs.]
- Complexity and Practical Design Issues in Parsing.
Chomsky Hierarchy, computational complexity. Earley's algorithm.
[Bonnie Dorr. 1-1/2 hrs.]
Day III: Current Problems and Approaches to Issues in NLP:
Readings (available online):
(1) Steven Abney (1996). Tagging and Partial Parsing;
(2) Eric Brill (1997). Unsupervised Learning of Disambiguation Rules
for Part of Speech Tagging;
(3) Bonnie Dorr (1997). Large-Scale Acquisition of LCS-Based
Lexicons for FLT.
- Partial parsing, Automatic Construction of parsers.
[Amy Weinberg. 1-1/2 hrs.]
- Large-Scale Multilingual Acquisition of Lexical-Semantics.
[Bonnie Dorr. 1-1/2 hrs.]