Putting Formal Grammars to Work
David Chiang
What makes one grammar better than another? Formal
language theory has traditionally answered this question in terms of how a
grammar classifies strings as grammatical or ungrammatical (weak generative capacity),
whereas applications have generally been interested in more complex
functions on strings -- for example, probability distributions, translations,
or grammatical relations. This gap makes it difficult to apply
formal-language-theoretic results directly.
In this talk I will outline a program for bridging this
gap. Combining two views of strong generative capacity (SGC) -- Miller's view
of SGC as "the semantics of linguistic formalism" and Joshi's view of
SGC as having to do with derivations of tree-adjoining grammars and related formalisms
-- I will sketch a basic framework in which the formal power of grammars can be
tested in ways that are more directly relevant to their applications. I will
then discuss two application areas.
First, in the area of statistical parsing, I will talk
about how grammars are used as the basis for both generative models and maximum-entropy
models, and argue that extra power for statistical modeling, in general, comes
with a computational cost. But the proof of this result reveals a connection
between lexicalized PCFG models like those of Charniak
and Collins and lexicalized probabilistic TAG models. This suggests that
lexicalized PCFG models should be thought of as defined not over
phrase-structure trees but richer structural descriptions, and a central
problem becomes that of training these models on the incomplete structural
descriptions of the Treebank. I will describe the implementation of a
generative TAG model using two different training methods, with results on both
English and Chinese.
The second application area is that of natural language
translation. A number of synchronous grammar formalisms have been proposed for syntax-based
translation -- for example, inversion transduction grammar and synchronous tree
substitution grammar -- and their relative formal power varies
depending on how we measure it. One, synchronous regular-form TAG, is more
powerful than synchronous CFG in the strictest sense, yet has the same parsing
complexity. With Mark Dras and William Schuler, I
have explored this formalism and shown how it can be used for a tricky case of
Portuguese-English translation. I will conclude by discussing some possible
ways of incorporating insights from both of these areas into full statistical machine-translation
systems.
About the Speaker:
David Chiang is a PhD candidate in Computer and
Information Science at the
For the colloquium series schedule, see http://www.umiacs.umd.edu/research/CLIP/colloq/.
If you are interested in meeting with the speaker, please contact
Doug Oard <http://www.glue.umd.edu/~oard/> (oard@umiacs.umd.edu).