UMIACS Computational Linguistics Colloquium, November 13, 2002

Linguistic Metrics and Optimal Parsing Algorithms


Khalil Sima'an


University of Amsterdam


UMIACS Computational Linguistics Colloquium

November 13, 2002,
3:30pm, AVW Room 2120


Given a probabilistic parsing model and an evaluation metric for scoring the match between parse-trees, e.g., PARSEVAL (Black et al. 1991), this paper addresses the problem of how to select the on average best scoring parse-tree for an input sentence. The common method dictates that it is optimal to select the parse with the highest probability according to the model, regardless of the evaluation metric. In contrast, the ``Maximizing Metrics (MM)" method (Goodman 1996,1998;Stolcke et al. 1997; Mangu et al. 2000) proposes that an algorithm that optimizes the expectation of the evaluation metric itself constitutes the optimal choice.

We study the MM method within natural language parsing. First we show that the MM proposition does not always hold for models obtained from tree-banks because often these models are rough approximations of the data. Subsequently, we state an alternative proposition: the optimal algorithm must maximize the metric that scores parse-trees according to linguistically relevant features. We present three algorithms that optimize tree matching metrics that take into account increasingly more linguistic features, and show empirically that the new algorithms compare favorably to the alternatives. Like the MM algorithms, the present algorithms are computationally attractive as they often address tractable reformulations of NP-Complete problems.


For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Philip Resnik (resnik@umiacs.umd.edu).