Active Segmentation With Fixation

Q. Does it make sense to segment the entire scene or image?
A. The answer is No.

Consider the example below:
The middle and rightmost images are segmentations of the leftmost image produced by a normalized cut based algorithm, with its parameter (the expected number of regions) set to 10 and 60, respectively. Which of the two segmentations is more appropriate?

[Figure: tree and horse image (left) and its normalized cut segmentations with 10 regions (middle) and 60 regions (right)]

The answer depends on the object of interest in the scene. If the tiny horse is of interest, the segmentation on the right makes more sense, since the horse is captured by three segmented regions. If the tree is of interest, the segmentation in the middle is more appropriate; however, the horse is completely absent from that segmentation map. So, to define the appropriate segmentation of a scene, it is important to first identify the object of interest. This may appear to be a chicken-and-egg problem: finding the object seems to require a segmentation, while choosing the segmentation requires knowing the object. But it is not.

The human visual system has an attention module that uses low-level information efficiently to quickly find salient locations in an image. The human eye is drawn to these salient points, which become its fixations. This is true even when we look at a picture. It is therefore natural to make the fixation point an integral part of any vision algorithm. We propose a segmentation algorithm that takes a fixation as input and outputs the region of the scene that contains the given fixation point. For example, if the two green crosses in the images on the left are two fixation points on two different objects in the scene (the horse and the tree), the corresponding segmentations computed by our method are shown in the images on the right.
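The core idea, segmenting only the region that contains a given fixation, can be sketched as follows. This is a minimal illustration, not the released source code. It assumes a boundary-probability map with values in [0, 1], resamples it on a polar grid centred at the fixation, and picks one contour radius per angle so that the closed contour passes through high boundary probability. The optimizer is a simple dynamic program chosen for brevity; the paper's actual optimization may differ. The function names and parameters (`to_polar`, `best_closed_contour`, `fixation_segment`, `n_angles`, `max_jump`, `min_radius`) are hypothetical.

```python
import numpy as np


def to_polar(edge_map, fixation, n_angles=64):
    """Hypothetical helper: sample edge_map on a polar grid centred at fixation = (row, col).

    Returns an (n_angles, n_radii) array whose entry [a, r] is the edge
    probability at angle a and radius r (nearest-neighbour sampling).
    Samples that fall outside the image get probability 0.
    """
    h, w = edge_map.shape
    fy, fx = fixation
    # Enough radii to reach the farthest image corner from the fixation.
    n_radii = int(np.ceil(np.hypot(max(fy, h - 1 - fy), max(fx, w - 1 - fx)))) + 1
    angles = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    radii = np.arange(n_radii)
    ys = np.rint(fy + np.outer(np.sin(angles), radii)).astype(int)
    xs = np.rint(fx + np.outer(np.cos(angles), radii)).astype(int)
    inside = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
    polar = np.zeros((n_angles, n_radii))
    polar[inside] = edge_map[ys[inside], xs[inside]]
    return polar


def best_closed_contour(cost, max_jump=1, min_radius=1):
    """Hypothetical helper: pick one radius per angle with minimal summed cost.

    Consecutive radii (including last -> first, which closes the contour)
    may differ by at most max_jump. Radii below min_radius are forbidden so
    the contour cannot collapse onto the fixation itself.
    """
    cost = np.array(cost, dtype=float)  # private copy
    n_angles, n_radii = cost.shape
    cost[:, :min_radius] = np.inf
    idx = np.arange(n_radii)
    # D[s, r]: cheapest path that started at radius s (angle 0) and is now at radius r.
    D = np.full((n_radii, n_radii), np.inf)
    D[idx, idx] = cost[0]
    back = np.zeros((n_angles, n_radii, n_radii), dtype=int)
    for a in range(1, n_angles):
        best = np.full_like(D, np.inf)
        arg = np.zeros(D.shape, dtype=int)
        for d in range(-max_jump, max_jump + 1):
            # cand[s, r] = D[s, r + d]: arrive at radius r from radius r + d.
            cand = np.full_like(D, np.inf)
            if d >= 0:
                cand[:, : n_radii - d] = D[:, d:]
            else:
                cand[:, -d:] = D[:, : n_radii + d]
            better = cand < best
            best[better] = cand[better]
            arg[better] = np.broadcast_to(idx + d, D.shape)[better]
        D = best + cost[a]
        back[a] = arg
    # Close the contour: the last radius must be within max_jump of the first.
    closes = np.abs(idx[:, None] - idx[None, :]) <= max_jump
    s, r = np.unravel_index(np.argmin(np.where(closes, D, np.inf)), D.shape)
    radii = np.empty(n_angles, dtype=int)
    radii[-1] = r
    for a in range(n_angles - 1, 0, -1):
        radii[a - 1] = back[a, s, radii[a]]
    return radii


def fixation_segment(edge_map, fixation, n_angles=64, max_jump=1):
    """Hypothetical helper: boolean mask of the region enclosing the fixation."""
    polar = to_polar(edge_map, fixation, n_angles)
    radii = best_closed_contour(1.0 - polar, max_jump=max_jump)
    h, w = edge_map.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dy, dx = yy - fixation[0], xx - fixation[1]
    # Map every pixel to its nearest angle bin and compare its radius.
    theta = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    a_idx = np.rint(theta / (2 * np.pi / n_angles)).astype(int) % n_angles
    return np.hypot(dy, dx) <= radii[a_idx]
```

In this sketch every candidate contour has exactly one point per angle, so a large contour is not penalized merely for being longer. For instance, a circular edge of radius 15 drawn around the fixation yields a mask that is approximately the enclosed disk.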

[Figure: two fixations (green crosses), one on the horse and one on the tree (left), and the corresponding segmented regions (right)]

Full paper (PDF)
Download source code.

Some pertinent questions and answers:
Q1. How is this segmentation algorithm different from the segmentation algorithms proposed so far?

Ans: The proposed algorithm is the first to define segmentation, in an optimal fashion, with respect to a given fixation. Segmentation thus becomes an automatic process whose only input, the fixation, comes from an attention system that decides where the eye looks in a scene. Moreover, such a definition is close to how the human visual system appears to work.

All segmentation algorithms in the vision literature so far depend on some kind of user input, such as the expected number of regions, a threshold to stop the clustering, a box around the object of interest, or seed points on the foreground and the background. These inputs cannot be determined automatically for a new image or scene. Please read the related work section of the paper for a more detailed analysis and references to previous work.

Q2. Can the boundary edge detector be improved?

Ans: Yes. Currently, the output of the Berkeley edge detector is used as the probabilistic estimate of a boundary at each edge pixel. This output contains strong edges both along the real boundary and at internal edges of the object. In this work, motion and/or stereo cues are used to get rid of the internal edges. A better strategy, combining all low-level cues with some high-level cues, could be used to detect the boundary edges of the scene.
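A minimal sketch of how a stereo cue could suppress internal edges, assuming a dense disparity map aligned with the edge map (an optical-flow magnitude map could be used the same way for motion). The intuition is that a true object boundary usually has a depth discontinuity across it, whereas an internal texture edge does not. The function `boundary_probability`, its exponential weighting and the `sigma` parameter are hypothetical choices, not the method used in the paper.

```python
import numpy as np


def boundary_probability(edge_prob, disparity, sigma=1.0):
    """Hypothetical helper: down-weight edges with no depth jump across them.

    edge_prob : (H, W) edge probabilities in [0, 1], e.g. from an edge detector.
    disparity : (H, W) stereo disparity map (or optical-flow magnitude).
    sigma     : assumed size of a 'significant' disparity change per pixel.
    """
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    gy, gx = np.gradient(disparity.astype(float))
    jump = np.hypot(gx, gy)
    # 0 where disparity is smooth, approaching 1 across a large disparity jump.
    discontinuity = 1.0 - np.exp(-(jump / sigma) ** 2)
    return edge_prob * discontinuity


# Two vertical edges of equal strength: column 7 lies inside a surface of
# constant disparity, column 15 sits on a disparity step from 5 to 1.
disparity = np.where(np.arange(30) < 15, 5.0, 1.0) * np.ones((20, 1))
edges = np.zeros((20, 30))
edges[:, 7] = 0.9
edges[:, 15] = 0.9
cleaned = boundary_probability(edges, disparity)
```

With these inputs the internal edge at column 7 is suppressed to zero, while the edge on the disparity step keeps most of its probability.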

Q3. What is the application of this approach?

Ans: Segmentation reduces the complexity of visual processing. A robotic system can use an existing attention system to make fixations in the scene and then apply the proposed method to obtain the regions corresponding to those fixations. These regions can then be used as mid-level cues for further processing.
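As an illustration of the first step of such a pipeline, the sketch below picks fixations from a saliency map using a winner-take-all rule with inhibition of return. The saliency map stands in for the output of an existing attention system; `select_fixations` and its parameters are hypothetical, and a real attention system may choose fixations differently.

```python
import numpy as np


def select_fixations(saliency, n_fixations=3, inhibition_radius=5):
    """Hypothetical helper: greedily pick fixation points from a saliency map.

    Winner-take-all: the most salient remaining pixel becomes the next
    fixation; its neighbourhood is then suppressed (inhibition of return)
    so that the following fixation lands elsewhere.
    """
    s = saliency.astype(float).copy()
    h, w = s.shape
    yy, xx = np.mgrid[0:h, 0:w]
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        if not np.isfinite(s[y, x]):  # every pixel already inhibited
            break
        fixations.append((int(y), int(x)))
        s[(yy - y) ** 2 + (xx - x) ** 2 <= inhibition_radius ** 2] = -np.inf
    return fixations


# A toy saliency map: the peak at (11, 11) is inhibited by the stronger
# neighbouring peak at (10, 10), so it never becomes a fixation.
saliency = np.zeros((40, 40))
saliency[10, 10], saliency[11, 11] = 1.0, 0.9
saliency[30, 25], saliency[5, 35] = 0.8, 0.5
fixations = select_fixations(saliency, n_fixations=3, inhibition_radius=5)
```

Each returned (row, col) point would then be handed to the fixation-based segmentation to obtain one region per fixation.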