An average business computer user generates tens of charts and plots each week. This is in addition to the millions of potentially useful images of data plots available on the Internet. Organizing these images by content can increase productivity by enabling archiving, retrieval and collaborative sharing. As part of our group's exploration of automated document analysis, we developed a system for classifying computer-generated charts. Such a categorization would be useful for semantic analysis of chart images and for image retrieval. Five categories are considered: bar-charts, curve-plots, pie-charts, scatter-plots and surface-plots.

The classification task is challenging because of variability in the depicted data and in drawing style. Consider pie-charts: changes in the number of entities represented and in their relative quantities lead to variations in the structure of the chart. Similarly, stylistic variations in color palette, shading, geometry, etc., make structural analysis of the images difficult. For example, pie-charts can be drawn in 3D with perspective distortion, with "exploded" segments, or with images overlaid on the segments.

In spite of this variability, each category has a distinctive primitive that is used to depict information, e.g., a rectangle for bar-charts and a salient curve for curve-plots. We have proposed an approach for classifying chart images based on the primitives depicted in them. Edge grouping and region segmentation are employed to extract salient curves and regions, which are described using local shape descriptors. These features, along with Histograms of Oriented Gradients and SIFT, characterize the statistics of the primitives. An image is classified based on its similarity to example images of each category, measured by the overlap in the distributions of the features.

We have tested the system on a database of more than 650 images collected from the Internet. The results indicate the utility of perceptual grouping in recognition. To view all the images in the database according to their classification result, click on the icons in Table 1. The quantitative results are summarized in Table 2.
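The final classification step described above, comparing the feature distribution of an image with those of example images from each category, can be illustrated with a small sketch. The details here are assumptions made for illustration and are not taken from our system: the overlap is computed as a histogram intersection, the per-category score is the mean overlap over that category's examples, and the feature bins (`rect`, `arc`, `curve`, `dot`) and their counts are hypothetical toy values.

```python
def normalize(hist):
    """Scale a feature histogram (bin -> count) so that its bins sum to 1."""
    total = sum(hist.values())
    return {k: v / total for k, v in hist.items()} if total else {}


def overlap(p, q):
    """Histogram intersection: sum of bin-wise minima of two normalized histograms."""
    return sum(min(v, q.get(k, 0.0)) for k, v in p.items())


def classify(query_hist, examples):
    """Pick the category whose example histograms overlap most, on average, with the query.

    `examples` maps a category name to a list of feature histograms.
    Averaging over examples is an assumption of this sketch.
    """
    q = normalize(query_hist)
    scores = {}
    for category, hists in examples.items():
        sims = [overlap(q, normalize(h)) for h in hists]
        scores[category] = sum(sims) / len(sims)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy histograms of primitive types (not real data).
examples = {
    "bar-chart": [{"rect": 8, "curve": 1, "dot": 1}],
    "pie-chart": [{"arc": 6, "rect": 1, "curve": 3}],
}
query = {"rect": 5, "curve": 2, "arc": 1}

label, scores = classify(query, examples)
print(label, scores)
```

In this toy case the query is dominated by rectangle-like primitives, so its distribution overlaps more with the bar-chart example than with the pie-chart example.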
Table 2: Summary of classification results

| | Bar-charts | Curve-plots | Pie-charts | Scatter-plots | Surface-plots |
|---|---|---|---|---|---|
| Bar-charts | 112 (90%) | 2 | 2 | 4 | 4 |
| Curve-plots | 7 | 87 (76%) | 8 | 10 | 3 |
| Pie-charts | 2 | 6 | 108 (83%) | 1 | 13 |
| Scatter-plots | 10 | 10 | 0 | 136 (86%) | 2 |
| Surface-plots | 3 | 7 | 6 | 3 | 105 (84%) |
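The percentages in Table 2 can be rechecked with a short script. This is a sketch under one assumption: each row is read as one category of test images, with the diagonal entry counting the images assigned to that same category. This reading is inferred from the fact that the percentages match the row totals. The names `CONFUSION`, `per_row_rate` and `overall_rate` are introduced here for illustration only.

```python
LABELS = ["Bar-charts", "Curve-plots", "Pie-charts", "Scatter-plots", "Surface-plots"]

# Counts copied from Table 2, one list per table row.
CONFUSION = [
    [112, 2, 2, 4, 4],
    [7, 87, 8, 10, 3],
    [2, 6, 108, 1, 13],
    [10, 10, 0, 136, 2],
    [3, 7, 6, 3, 105],
]


def per_row_rate(matrix):
    """Fraction of each row that falls on the diagonal."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_rate(matrix):
    """Diagonal total divided by the total number of images."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


rates = per_row_rate(CONFUSION)
overall = overall_rate(CONFUSION)
total_images = sum(sum(row) for row in CONFUSION)

for name, rate in zip(LABELS, rates):
    print(f"{name}: {rate:.1%}")
print(f"Overall: {overall:.1%} of {total_images} images")
```

The row totals add up to 651 images, consistent with the "more than 650" stated above. The surface-plot row comes out at about 84.7%, which the table lists as 84%.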