UMIACS Computational Linguistics Colloquium, June 12, 2000

Learning Probabilistic and Lexicalized Grammars for Natural Language Processing


Rebecca Hwa


Harvard University


June 12, 2000,
Special Time/Place: 10:30am, AVW Room 4406


This talk addresses two questions: what are the properties of a good grammar representation for natural language processing applications, and how can such grammars be constructed automatically and efficiently? I shall begin by describing a formalism called Probabilistic Lexicalized Tree Insertion Grammar (PLTIG), which has several linguistically motivated properties that are helpful for processing natural languages. Next, I shall present a learning algorithm that automatically induces PLTIGs from human-annotated text corpora. I have conducted empirical studies showing that a trained PLTIG compares favorably with other formalisms on several kinds of evaluation. Finally, I shall turn to the question of efficiency. In particular, I want to reduce the dependency of the induction process on human-annotated training data. I will show that by applying a learning technique called sample selection to grammar induction, we can significantly decrease the number of training examples needed, thereby reducing the human effort spent on annotating training data.


For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Philip Resnik (resnik@umiacs.umd.edu).