Use of OCR for Rapid Construction of Bilingual Lexicons

Title: Use of OCR for Rapid Construction of Bilingual Lexicons
Publication Type: Reports
Year of Publication: 2003
Authors: Karagol-Ayan B, Doermann D, Dorr BJ
Date Published: 2003/07
Institution: University of Maryland, College Park
Abstract

This paper describes an approach to analyzing the lexical structure of OCRed bilingual dictionaries to construct resources suited for machine translation of low-density languages, where online resources are limited. A rule-based and an HMM-based method are used for rapid construction of MT lexicons based on systematic structural clues provided in the original dictionary. We evaluate the effectiveness of our techniques, concluding that: (1) the rule-based method performs better on dictionaries with a simple structure; (2) the stochastic method performs better on dictionaries with an enriched structure; (3) regardless of the degree of dictionary richness, the rule-based method gives better results for phrasal entries than for single-word entries; and (4) our resulting bilingual lexicons are comprehensive enough to provide reasonable MT results when compared to human-constructed lexicons.