Bootstrapping parsers via syntactic projection across parallel texts

Title: Bootstrapping parsers via syntactic projection across parallel texts
Publication Type: Journal Articles
Year of Publication: 2005
Authors: Hwa R, Resnik P, Weinberg A, Cabezas C, Kolak O
Journal: Nat. Lang. Eng.
Pagination: 311-325
Date Published: 2005/09
ISBN Number: 1351-3249

Broad-coverage, high-quality parsers are available for only a handful of languages. A prerequisite for developing broad-coverage parsers for more languages is the annotation of text with the desired linguistic representations (also known as “treebanking”). However, syntactic annotation is a labor-intensive and time-consuming process, and linguistically annotated text is difficult to find in sufficient quantities. In this article, we explore using parallel text to help solve the problem of creating syntactic annotation in more languages. The central idea is to annotate the English side of a parallel corpus, project the analysis to the second language, and then train a stochastic analyzer on the resulting noisy annotations. We discuss our background assumptions, describe an initial study on the “projectability” of syntactic relations, and then present two experiments in which stochastic parsers are developed with minimal human intervention via projection from English.
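
The projection step described in the abstract can be illustrated with a minimal sketch. It is not the article's algorithm: it assumes dependency edges given as (head, dependent) index pairs, a strictly one-to-one word alignment, and a hypothetical helper name `project_dependencies`. Unaligned and many-to-many cases, which make real projected annotations noisy, are simply dropped here.

```python
def project_dependencies(english_deps, alignment):
    """Project English dependency edges onto a second language.

    english_deps: list of (head_index, dependent_index) over English tokens.
    alignment: dict mapping an English token index to a foreign token index
               (assumed one-to-one for this sketch).
    Returns the edges whose head and dependent are both aligned.
    """
    projected = []
    for head, dep in english_deps:
        # Keep an edge only when both endpoints have an aligned counterpart.
        if head in alignment and dep in alignment:
            projected.append((alignment[head], alignment[dep]))
    return projected


# Toy example: English "I read books", with "read" heading both other words.
english_deps = [(1, 0), (1, 2)]
# Hypothetical alignment to a verb-final sentence: I -> 0, read -> 2, books -> 1.
alignment = {0: 0, 1: 2, 2: 1}
foreign_deps = project_dependencies(english_deps, alignment)
```

In this toy case the projected edges attach both foreign words to the word aligned with "read". Edges like these would then serve as training data for a stochastic parser in the second language.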