TY  - CONF
T1  - Image based typographic analysis of documents
T2  - Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
Y1  - 1993
A1  - David Doermann
A1  - Furuta,R.
KW  - 2D
KW  - analysis;
KW  - attributes;
KW  - based
KW  - character
KW  - commands;
KW  - component
KW  - data
KW  - description
KW  - document
KW  - DVI
KW  - extraction;
KW  - feature
KW  - figure
KW  - file;
KW  - formatting
KW  - hierarchical
KW  - image
KW  - language;
KW  - languages;
KW  - layout;
KW  - line
KW  - margins;
KW  - page
KW  - placement;
KW  - processing;
KW  - read-order;
KW  - relationships;
KW  - representation;
KW  - spacing;
KW  - spatial
KW  - structures;
KW  - syntax;
KW  - synthesis;
KW  - typographic
KW  - understanding;
AB  - An approach to image based typographic analysis of documents is provided. The problem requires a spatial understanding of the document layout as well as knowledge of the proper syntax. The system performs a page synthesis from the stream of formatting commands defined in a DVI file. Since the two-dimensional relationships between document components are not explicit in the page language, the authors develop a representation which preserves the two-dimensional layout, the read-order and the attributes of document components. From this hierarchical representation of the page layout we extract and analyze relevant typographic features such as margins, line and character spacing, and figure placement
JA  - Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
M3  - 10.1109/ICDAR.1993.395624
ER  - 