Binarization of low quality text using a Markov random field model

Publication Type: Conference Papers
Year of Publication: 2002
Authors: Wolf C, Doermann D
Conference Name: Proceedings of the 16th International Conference on Pattern Recognition, 2002
Date Published: 2002
Keywords: analysis, annealing, Bayes, Bayesian, binarization, computing, distributions, document, documents, field, Gibbs, image, low, Markov, method, methods, multimedia, optimization, probability, processes, processing, quality, random, simulated, text

Binarization techniques have been developed in the document analysis community for over 30 years, and many algorithms have been used successfully. At the same time, document analysis methods are increasingly being applied to multimedia documents such as video sequences. Because of low resolution and lossy compression, binarizing the text contained in video frames is a non-trivial task. Existing techniques do not model the spatial relationships in the image, which limits their effectiveness. We introduce a new technique based on a Markov random field model of the document. The model parameters (clique potentials) are learned from training data, and the binary image is estimated in a Bayesian framework. Performance is evaluated using commercial OCR software.
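The abstract does not give the model details, so the following is only an illustrative sketch of Markov random field binarization, not the authors' method. It rests on simplifying assumptions made here for illustration: cliques are horizontal and vertical pixel pairs, each clique potential is set heuristically to V(a, b) = -log P(a, b) from label co-occurrences in binary training images, grey levels follow a Gaussian per class (mean mu[label], shared sigma), and the MAP labelling is approximated by simulated annealing with a Gibbs sampler and a geometric cooling schedule. The function names `learn_pair_potentials` and `binarize_mrf` and every default parameter value are hypothetical.

```python
import math
import random

# Labels: 1 = text (foreground), 0 = background.


def learn_pair_potentials(train_images, eps=1.0):
    """Hypothetical helper: estimate pairwise clique potentials
    V(a, b) = -log P(a, b) from horizontal and vertical neighbour
    co-occurrences in binary ground-truth images (lists of 0/1 rows).
    eps is an additive smoothing count so no pair has zero probability."""
    counts = {(a, b): eps for a in (0, 1) for b in (0, 1)}
    for img in train_images:
        h, w = len(img), len(img[0])
        for y in range(h):
            for x in range(w):
                if x + 1 < w:  # horizontal clique (left, right)
                    counts[(img[y][x], img[y][x + 1])] += 1
                if y + 1 < h:  # vertical clique (top, bottom)
                    counts[(img[y][x], img[y + 1][x])] += 1
    total = sum(counts.values())
    return {pair: -math.log(c / total) for pair, c in counts.items()}


def binarize_mrf(obs, potentials, mu=(200.0, 50.0), sigma=30.0,
                 t0=2.0, cooling=0.85, sweeps=25, seed=0):
    """Hypothetical helper: approximate the MAP labelling of the grey levels
    in `obs` by simulated annealing with a Gibbs sampler. Assumes a Gaussian
    likelihood with mean mu[label] and shared std sigma (by default, dark
    text on a light background)."""
    rng = random.Random(seed)
    h, w = len(obs), len(obs[0])
    # Start from the maximum-likelihood labelling (nearest class mean).
    labels = [[1 if abs(obs[y][x] - mu[1]) < abs(obs[y][x] - mu[0]) else 0
               for x in range(w)] for y in range(h)]

    def local_energy(lab, y, x):
        # Data term: negative Gaussian log-likelihood (up to a constant).
        e = (obs[y][x] - mu[lab]) ** 2 / (2 * sigma ** 2)
        # Prior term: potentials of the four cliques containing (y, x),
        # keeping the (left, right) / (top, bottom) order used in training.
        if x > 0:
            e += potentials[(labels[y][x - 1], lab)]
        if x + 1 < w:
            e += potentials[(lab, labels[y][x + 1])]
        if y > 0:
            e += potentials[(labels[y - 1][x], lab)]
        if y + 1 < h:
            e += potentials[(lab, labels[y + 1][x])]
        return e

    t = t0
    for _ in range(sweeps):
        for y in range(h):
            for x in range(w):
                # Sample the label from its local conditional at temperature t:
                # P(label = 1) = 1 / (1 + exp((E1 - E0) / t)).
                d = (local_energy(1, y, x) - local_energy(0, y, x)) / t
                p1 = 0.0 if d > 700 else 1.0 / (1.0 + math.exp(d))
                labels[y][x] = 1 if rng.random() < p1 else 0
        t *= cooling  # geometric cooling schedule
    return labels
```

For example, if the potentials are learned from a 12×12 ground-truth image containing a 4×4 text block, running `binarize_mrf` on a grey-level version of that image with one isolated mid-grey pixel in the background should remove the isolated pixel and keep the block. Here the learned potentials penalize text/background transitions, so the prior outweighs the weak evidence at that pixel. On real video frames, mu and sigma would have to be estimated per image, and the cooling schedule tuned. This toy setup trains and tests on the same layout only to keep the sketch small.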