Yefeng Zheng
Model Based Line Detection and Its Application to Known Form Processing
Millions of form documents, such as health insurance forms, checks, and bank slips, are processed every day.
Some form processing systems have been designed to process a pre-defined set of forms, where a priori information can be stored as templates in the database to guide the later processing.
For an input form, the system first selects the template that matches it best (form identification). Then anchors (such as specific marks or form frame lines) are detected for registration, so that the variations produced by scanning (e.g. rotation, translation, and scaling) can be compensated (form registration). Though special anchors may be available to facilitate identification and registration for specially designed forms, more general approaches rely, explicitly or implicitly, on features related to frame lines, such as the frame lines themselves, form cells, and the cross points of frame lines.
Robust detection of frame lines is crucial in these approaches.
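To make the registration step concrete, here is a minimal sketch of how rotation, translation, and scaling could be estimated from matched anchor points (for example, cross points of frame lines) and then undone. It is an illustration under stated assumptions, not the method used in this work: the function names `estimate_similarity` and `to_template_frame` are hypothetical, the anchor correspondences are assumed to be known already, and the fit is an ordinary least-squares similarity transform written with complex numbers.

```python
import cmath
import math


def estimate_similarity(template_pts, scanned_pts):
    """Least-squares fit of scanned = a * template + b (points as complex numbers).

    abs(a) is the scale, cmath.phase(a) the rotation angle, b the translation.
    Hypothetical helper for illustration; correspondences are assumed known.
    """
    if len(template_pts) != len(scanned_pts) or len(template_pts) < 2:
        raise ValueError("need at least two matched anchor pairs")
    z = [complex(x, y) for x, y in template_pts]
    w = [complex(x, y) for x, y in scanned_pts]
    n = len(z)
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    # Closed-form least-squares solution on centered coordinates.
    num = sum((wi - w_mean) * (zi - z_mean).conjugate() for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    if den == 0:
        raise ValueError("anchor points must not all coincide")
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def to_template_frame(point, a, b):
    """Map a point from the scanned image back into template coordinates."""
    z = (complex(*point) - b) / a
    return (z.real, z.imag)


if __name__ == "__main__":
    # Synthetic example: four template anchors, scanned with a small
    # rotation, a slight scale change, and an offset (made-up values).
    a_true = 1.02 * cmath.exp(1j * math.radians(1.5))
    b_true = complex(12.0, -7.0)
    template = [(100.0, 200.0), (2400.0, 210.0), (120.0, 3100.0), (2380.0, 3090.0)]
    scanned = []
    for x, y in template:
        p = a_true * complex(x, y) + b_true
        scanned.append((p.real, p.imag))

    a, b = estimate_similarity(template, scanned)
    print("scale:", abs(a))
    print("rotation (degrees):", math.degrees(cmath.phase(a)))
    print("translation:", (b.real, b.imag))
```

Once `a` and `b` are known, any location in the scanned page can be mapped back to template coordinates with `to_template_frame`, which is what allows fields defined on the template to be located in the input image. With noisy anchors, the least-squares fit averages the error over all pairs, which is one reason reliably detected frame lines matter.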
We proposed an HMM-based form processing scheme for both form identification and registration. The algorithm has been tested on the NIST Structured Forms Reference Set (NIST Special Database 2). The database consists of 5,590 pages of binary, black-and-white images of synthesized documents. The documents in this database are 12 different tax forms from the IRS 1040 Package X for the year 1988. These include Forms 1040, 2106, 2441, 4562, and 6251 together with Schedules A, B, C, D, E, F, and SE. Eight of these forms contain two pages or form faces; therefore, there are 20 different form faces represented in the database.
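The counts quoted for the database can be checked with a few lines of arithmetic. The sketch below also derives the training and test set sizes implied by the 20-samples-per-face split used in the experiments, assuming that every image in the database is used and that each form face has at least 20 samples; the variable names are illustrative only.

```python
# Figures quoted in the text for NIST Special Database 2.
total_pages = 5590
num_forms = 12
two_page_forms = 8
train_per_face = 20  # training samples per form face, as stated for the experiments

# One-page forms contribute one face each, two-page forms contribute two.
one_page_forms = num_forms - two_page_forms
form_faces = one_page_forms * 1 + two_page_forms * 2

# Assumption: all images are used, so whatever is not used for training is tested.
train_images = form_faces * train_per_face
test_images = total_pages - train_images

print(form_faces, train_images, test_images)  # 20 400 5190
```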
Twenty samples of each form face are used to train the HMM models. The remaining samples are used for testing. The form identification accuracy is 100%. The line detection results appear good, though they have not been quantitatively evaluated. Here are several examples (green marks the lines detected for form identification and registration).
[Figure: line detection examples for the 1040_1, 1040_2, 2106_1, and 2106_2 form faces]
More samples are available:
1040_1 1040_2 2106_1 2106_2 2441 4562_1 4562_2 6251 sch_a sch_b
sch_c_1 sch_c_2 sch_d_1 sch_d_2 sch_e_1 sch_e_2 sch_f_1 sch_f_2 sch_se_1 sch_se_2
Download demos here (for Windows operating systems).
Download source code here. The code should be used for non-commercial purposes only!
Here are two test sample sets: bank deposit slips and NIST forms.