UMIACS Computational Linguistics Colloquium, November 20, 2002

A General Feature Space for Automatic Verb Classification


Suzanne Stevenson


University of Toronto


November 20, 2002,
3:30pm, AVW Room 2120


Computational linguistics faces a lexical acquisition bottleneck -- if much linguistic knowledge is distributed across individual words, how can we encode all that a system needs to know when new or unanticipated words (or uses of words) may be encountered frequently? Of particular concern is detailed knowledge of verbs, whose complex properties guide interpretation. Our approach to this problem is to automatically classify verbs into lexical semantic classes, thereby extending our encoding of known uses of words to new words or uses.

We have demonstrated that simple statistics over a text corpus can be used to successfully classify verbs into Levin-type classes (Levin, 1993), whose members share both semantic and syntactic properties. However, our early work was limited in its applicability because it required detailed linguistic analysis of the target verb classes to determine useful statistical features. Recently, we have developed a general feature space for Levin-type classes in English, which avoids the need to determine discriminating features on a class-by-class basis. Instead, we determine all _potential_ distinctions among Levin classes by analyzing possible syntactic alternations and thematic assignments, rather than individual class properties.

Since we analyze verb class distinctions at this more general level, we need to do the linguistic analysis only once, for the classification structure as a whole, rather than separately for every class that we want to distinguish. The result is a general feature space that, in principle, should be useful for any Levin-type verb classification task. We demonstrate the effectiveness of the feature space on a wider range of verb classes than has been attempted previously, and show that the method achieves performance comparable to that of feature sets individually chosen for the verb classes being tested.
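
As a rough illustration of what "simple statistics over a text corpus" could look like in practice, the sketch below turns per-occurrence observations of a verb into a vector of relative frequencies and assigns a new verb to the class with the nearest average vector. Everything in it is an assumption made for illustration: the three feature names, the class labels, the toy observations, and the nearest-centroid classifier are invented here and are not the feature space, data, or learning method of the work described in this talk.

```python
from collections import Counter
import math

# Hypothetical feature names, chosen only for illustration.
FEATURES = ["transitive", "intransitive", "passive"]


def feature_vector(occurrences):
    """Return, for each feature, the fraction of occurrences showing it.

    `occurrences` is a list with one entry per corpus occurrence of a verb;
    each entry lists the features observed in that occurrence.
    """
    counts = Counter()
    for features_seen in occurrences:
        counts.update(set(features_seen))
    total = len(occurrences)
    return [counts[name] / total if total else 0.0 for name in FEATURES]


def centroid(vectors):
    """Average a list of equal-length vectors dimension by dimension."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify(vector, centroids):
    """Pick the class label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Invented toy observations: two verbs per hypothetical class.
training = {
    "class_A": [
        [["transitive"], ["transitive"], ["transitive", "passive"], ["intransitive"]],
        [["transitive"], ["transitive", "passive"], ["transitive"]],
    ],
    "class_B": [
        [["intransitive"], ["intransitive"], ["transitive"]],
        [["intransitive"], ["intransitive"], ["intransitive"], ["transitive"]],
    ],
}

centroids = {
    label: centroid([feature_vector(occ) for occ in verbs])
    for label, verbs in training.items()
}

# A previously unseen verb, again with invented observations.
new_verb = [["intransitive"], ["intransitive"], ["transitive"], ["intransitive"]]
predicted = classify(feature_vector(new_verb), centroids)
print(predicted)
```

The point of the sketch is only the shape of the pipeline: corpus counts become a fixed-length vector per verb, and the same set of features is used for every class rather than being chosen class by class.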

This is joint work with Eric Joanis (University of Toronto).


For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Philip Resnik (resnik@umiacs.umd.edu).