%0 Conference Paper
%B SDAIR
%D 1995
%T Generating Synthetic Data for Text Analysis Systems
%A David Doermann
%A Yao,S.
%X In this paper we describe work on a system for modeling errors in the output of OCR systems. The project is motivated by the desire to evaluate the performance of various text analysis systems under varying, yet controlled conditions. We describe a set of symbol and page models which are used to degrade an ideal text by introducing errors which typically occur during scanning, decomposition and recognition of document images. A first generation of the software is described which implements the page models and allows the use of transition probabilities, either extracted from real data or generated synthetically, to corrupt text.
%P 449 - 467
%8 1995///
%G eng