Script-Independent Text Line Segmentation in Freestyle Handwritten Documents

Title: Script-Independent Text Line Segmentation in Freestyle Handwritten Documents
Publication Type: Reports
Year of Publication: 2006
Authors: Li Y, Zheng Y, Doermann D, Jaeger S
Date Published: 2006/11
Institution: University of Maryland, College Park

Text line segmentation in freestyle handwritten documents remains an open document analysis problem. Curvilinear text lines and small gaps between neighboring text lines present a challenge to algorithms developed for machine-printed or hand-printed documents. In this paper, we propose a novel approach based on density estimation and a state-of-the-art image segmentation technique, the level set method. From an input document image, we estimate a probability map, where each element represents the probability that the underlying pixel belongs to a text line. The level set method is then exploited to determine the boundaries of neighboring text lines by evolving an initial estimate. Unlike most connected-component-based methods [1, 2], the proposed algorithm does not use any script-specific knowledge. Extensive quantitative experiments on freestyle handwritten documents with diverse scripts, such as Arabic, Chinese, Korean, and Hindi, demonstrate that our algorithm consistently outperforms previous methods [3, 1, 2]. Further experiments show that the proposed algorithm is robust to scale change, rotation, and noise.
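
The probability-map step described above can be illustrated with a minimal sketch. The abstract does not specify the density estimator, so the version below assumes a simple stand-in: a binary ink mask smoothed with an anisotropic Gaussian (wider along the horizontal direction) and rescaled to [0, 1]. The function names `gaussian_kernel_1d` and `text_line_probability_map`, the parameters `sigma_x` and `sigma_y`, and their default values are all hypothetical and are not taken from the report. The level set evolution is not shown; thresholding the map only yields a crude initial estimate of the kind such an evolution might start from.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def text_line_probability_map(ink, sigma_x=8.0, sigma_y=2.0):
    """Smooth a binary ink mask (1 = ink) and rescale the result to [0, 1].

    Assumption: an anisotropic Gaussian stands in for the density
    estimator; the source abstract does not name the estimator it uses.
    """
    ink = np.asarray(ink, dtype=float)
    kx = gaussian_kernel_1d(sigma_x)
    ky = gaussian_kernel_1d(sigma_y)
    # np.convolve(mode="same") returns the longer input's length, so the
    # image must be at least as large as each kernel.
    if ink.shape[1] < kx.size or ink.shape[0] < ky.size:
        raise ValueError("image is smaller than the smoothing kernel")
    # Separable convolution: smooth along rows, then along columns.
    smoothed = np.apply_along_axis(
        lambda row: np.convolve(row, kx, mode="same"), 1, ink)
    smoothed = np.apply_along_axis(
        lambda col: np.convolve(col, ky, mode="same"), 0, smoothed)
    peak = smoothed.max()
    return smoothed / peak if peak > 0 else smoothed


# Synthetic page: two sparse horizontal "text lines" at rows 10 and 30.
page = np.zeros((40, 60))
page[10, ::4] = 1
page[30, ::4] = 1

prob = text_line_probability_map(page)
# Crude initial estimate (hypothetical threshold), not a level set result.
initial_estimate = prob > 0.5
```

A wider horizontal than vertical spread is chosen here so that ink within one line merges into a ridge while the gap between lines stays low; for strongly curved or rotated lines that choice would need revisiting.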