OCR Post-Processing for Low Density Languages

 

 Okan Kolak

University of Maryland, College Park, Maryland


UMIACS Computational Linguistics Colloquium

September 14, 2005, 11:00am, AVW 3258


ABSTRACT

We present a lexicon-free post-processing method for optical character recognition (OCR), implemented using weighted finite state machines. We evaluate the technique in a number of scenarios relevant for natural language processing, including creation of new OCR capabilities for low density languages, improvement of OCR performance for a native commercial system, acquisition of knowledge from a foreign-language dictionary, creation of a parallel text, and machine translation from OCR output.


 

 

 

 

For the colloquium series schedule, see http://www.umiacs.umd.edu/research/CLIP/colloq/. If you are interested in meeting with the speaker, please contact Jimmy Lin <http://www.glue.umd.edu/~jimmylin/> (jimmylin(at)umiacs.umd.edu).