UMIACS Computational Linguistics Colloquium, April 25, 2001

Applying sample selection to training parsers


Rebecca Hwa


UMD



April 25, 2001, 10am, AVW Room 2120


I will talk about some of my ongoing studies on applying sample selection to training parsers. Sample selection is a framework for supervised learning in which the learner selects the data to be used as its training examples instead of passively receiving them. If the learner can identify informative training examples, the teacher (usually a human annotator) does not have to annotate uninformative ones. Because training a high-performance parser typically requires a large quantity of annotated data, sample selection may be a promising way to reduce both the number of training examples needed and the human effort spent annotating them.
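As a rough illustration of the framework, here is a minimal sketch of one common selection strategy, uncertainty sampling, on a toy one-dimensional task. It is not the method presented in the talk: the threshold learner, the simulated annotator `oracle`, and the helpers `train_threshold` and `select_and_train` are hypothetical stand-ins for a real parser and a human annotator.

```python
from typing import Callable, List, Tuple

Labeled = List[Tuple[int, bool]]


def train_threshold(labeled: Labeled) -> float:
    """Toy learner (hypothetical): put the decision boundary halfway between
    the largest negative and the smallest positive labeled example."""
    negatives = [x for x, y in labeled if not y]
    positives = [x for x, y in labeled if y]
    assert negatives and positives, "seed set must contain both classes"
    return (max(negatives) + min(positives)) / 2


def select_and_train(pool: List[int], oracle: Callable[[int], bool],
                     seed: List[int], budget: int) -> Tuple[float, Labeled]:
    """Sample selection loop: the learner, not the teacher, chooses which
    example gets annotated next."""
    labeled = [(x, oracle(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    for _ in range(budget):
        if not unlabeled:
            break
        threshold = train_threshold(labeled)
        # Uncertainty sampling: query the unlabeled example closest to the
        # current boundary, i.e. the one the learner is least sure about.
        x = min(unlabeled, key=lambda u: abs(u - threshold))
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))  # ask the (simulated) annotator
    return train_threshold(labeled), labeled


def oracle(x: int) -> bool:
    """Simulated annotator; the true boundary at 37 is an arbitrary choice."""
    return x >= 37


pool = list(range(100))
threshold, labeled = select_and_train(pool, oracle, seed=[0, 99], budget=6)
print(threshold, len(labeled))  # 36.5 8
```

Because each query lands near the current decision boundary, six annotations are enough here to place the boundary exactly between 36 and 37; annotating examples in arbitrary order would typically need more to reach the same result.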

In this talk, I will first describe the sample selection framework and give an example of how it can be applied to training a prepositional phrase attachment learner. Next, I will present some preliminary experimental results from applying sample selection to training a statistical lexicalized parser proposed by Collins. Currently, my best result suggests a 21% reduction in the amount of annotation required. I believe that this figure can be improved, and I look forward to your input and suggestions.


For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Philip Resnik (resnik@umiacs.umd.edu).