OCR Post-Processing for Low Density Languages
Okan Kolak
University
ABSTRACT
We present a lexicon-free post-processing method for optical character recognition (OCR), implemented using weighted finite state machines. We evaluate the technique in a number of scenarios relevant for natural language processing, including creation of new OCR capabilities for low density languages, improvement of OCR performance for a native commercial system, acquisition of knowledge from a foreign-language dictionary, creation of a parallel text, and machine translation from OCR output.
For the colloquium series schedule, see http://www.umiacs.umd.edu/research/CLIP/colloq/. If you are interested in meeting with the speaker, please contact Jimmy Lin <http://www.glue.umd.edu/~jimmylin/>.