TY  - CONF
T1  - Document Image Classification and Labeling using Multiple Instance Learning
T2  - Intl. Conf. on Document Analysis and Recognition (ICDAR 11)
Y1  - 2011
A1  - Kumar,Jayant
A1  - Pillai,Jaishanker
A1  - Doermann,David
AB  - Labeling large sets of images for training or testing analysis systems can be a very costly and time-consuming process. Multiple instance learning (MIL) is a generalization of traditional supervised learning that relaxes the need for exact labels on training instances. Instead, labels are required only for sets of instances, known as bags. In this paper, we apply MIL to the retrieval and localization of signatures and to the retrieval of images containing machine-printed text, and show that a gain of 15-20% in performance can be achieved over supervised learning with weak labeling. We also compare our approach to supervised learning with fully annotated training data and report a competitive accuracy for MIL. Through experiments on real-world datasets, we show that MIL is a good alternative when the training data has only document-level annotation.
JA  - Intl. Conf. on Document Analysis and Recognition (ICDAR 11)
ER  -