Semantic Robot Vision Challenge 2008
Objective:
Detect, locate, and recognize any object in a real environment.

What's Given:
1. A list of object names in a text file, say, "objects.txt"
2. A set of about 50 images, each containing several, one, or none of the listed objects
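As a small illustration of the first input, the object list can be loaded as below. This is only a sketch: the original source is in MATLAB, and the one-name-per-line layout of "objects.txt" and the helper name `read_object_names` are assumptions, not something the challenge specification is quoted as saying here.

```python
from pathlib import Path


def read_object_names(path):
    """Return the object names found in a text file.

    Assumes one object name per line (an assumption about the file
    layout); blank lines and surrounding whitespace are ignored.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Calling `read_object_names("objects.txt")` would then give the list of keywords to use in the image search described in Step 1.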


How do you learn about these objects?
Step 1: Download images
Search the internet for images of the objects, using their names as keywords. Here's the source code for that. Please note that the set of images returned by Google or any other search engine contains a lot of outliers. You may have to come up with ways to weed them out.
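One possible way to weed out outliers, shown here only as a hedged sketch and not as the method used in the downloadable code, is to describe each downloaded image by a feature vector (for example a color histogram, computed elsewhere) and drop images that are far from most of the others. The function names and the `factor` threshold are hypothetical.

```python
from statistics import median


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def filter_outliers(features, factor=2.0):
    """Return the indices of the images to keep.

    Each image is scored by its median distance to all other images.
    An image is dropped when its score exceeds `factor` times the
    median score of the whole set (`factor` is an assumed tuning knob).
    """
    n = len(features)
    if n < 3:
        # Too few images to tell what "typical" looks like.
        return list(range(n))
    scores = []
    for i in range(n):
        dists = [euclidean(features[i], features[j]) for j in range(n) if j != i]
        scores.append(median(dists))
    cutoff = factor * median(scores)
    return [i for i, s in enumerate(scores) if s <= cutoff]
```

This rests on the assumption that correct images of an object outnumber the outliers and look more alike than the outliers do, which may not hold for every search term.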

Step 2: Object = its boundary (shape) + its interior (keypoints)
An object can be represented by its shape and by what's inside it. Shape is hard to describe but is very informative. In fact, the only way to learn about category objects like "frying pan" or "apple" is to encode their shape information, whereas specific objects like "pepsi bottle" can be identified by an invariant logo or a similar unique pattern on their interior.

To encode the information on the interior of the object, you can use SIFT, MSER, or any other keypoint detector. Encoding shape, however, is a bit tricky, as you first have to extract the object from the images and then use its boundary as its shape. A segmentation strategy that I used is described later.

Each object is now represented by a set of keypoints or a set of shape descriptors extracted from all of its training images.


How do you detect and locate the objects?
Objects in an image appear at different scales and at different locations. We can use any saliency operator (e.g., Kadir et al.) to find salient locations, each with an associated scale. We then extract a patch centered on each salient point; the size of each patch is proportional to the scale of its point.
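The patch extraction step can be sketched as follows. The saliency operator itself is not implemented here: the salient points are assumed to arrive as (x, y, scale) triples, and the proportionality constant `k` and the function names are illustrative assumptions.

```python
def patch_box(x, y, scale, width, height, k=3.0):
    """Return (left, top, right, bottom) of a square patch centered on (x, y).

    The half-size of the patch is k * scale, so the patch grows with the
    scale reported for the salient point. The box is clipped to the image
    and uses exclusive right/bottom bounds.
    """
    half = max(1, int(round(k * scale)))
    left = max(0, x - half)
    top = max(0, y - half)
    right = min(width, x + half + 1)
    bottom = min(height, y + half + 1)
    return left, top, right, bottom


def extract_patches(salient_points, width, height, k=3.0):
    """Map each (x, y, scale) salient point to its clipped patch box."""
    return [patch_box(x, y, s, width, height, k) for x, y, s in salient_points]
```

Each returned box can be used to crop the image before the segmentation step.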

We assume that if a patch contains an object, the object is fully contained in the patch. Now, segment the patch into regions. The regions that lie entirely inside the patch are either objects or parts of objects. The regions whose boundaries align with the border of the patch, however, make up the background. We use these background regions to generate a color model for the background.
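The border test can be sketched as below, assuming the segmentation (done elsewhere) yields a 2D grid of integer region labels for the patch. The mean RGB value stands in for the background color model purely for illustration; the form of the color model actually used is not described here, and the function names are hypothetical.

```python
def split_regions(labels):
    """Split region labels into (interior, border) sets.

    `labels` is a 2D list of region ids for one segmented patch. A region
    counts as background when any of its pixels lies on the patch border.
    """
    rows, cols = len(labels), len(labels[0])
    border = set()
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                border.add(labels[r][c])
    all_labels = {v for row in labels for v in row}
    return all_labels - border, border


def mean_color(pixels, labels, region_ids):
    """Mean RGB of the pixels whose label is in `region_ids`, or None."""
    total = [0.0, 0.0, 0.0]
    count = 0
    for pixel_row, label_row in zip(pixels, labels):
        for rgb, lab in zip(pixel_row, label_row):
            if lab in region_ids:
                for ch in range(3):
                    total[ch] += rgb[ch]
                count += 1
    if count == 0:
        return None
    return tuple(t / count for t in total)
```

The interior set holds the candidate object regions, and `mean_color(pixels, labels, border)` gives a very simple background model to compare them against.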


How do you recognize them?
Compare the keypoints and shape descriptors extracted from the detected regions with those learned for each object.
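For the keypoint half of this comparison, a common choice is nearest-neighbor matching with a ratio test, in the style of Lowe's SIFT matching. The sketch below uses that technique as a stand-in; it is not confirmed to be what the downloadable code does, and the function names and the 0.8 ratio are assumptions. Descriptors are plain lists of numbers computed elsewhere.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def count_matches(query, model, ratio=0.8):
    """Count query descriptors that pass the ratio test against a model.

    A query descriptor matches when its nearest model descriptor is
    clearly closer than the second nearest one.
    """
    if len(model) < 2:
        return 0
    matches = 0
    for q in query:
        d = sorted(sq_dist(q, m) for m in model)
        # Distances are squared, so the ratio is squared as well.
        if d[0] < (ratio ** 2) * d[1]:
            matches += 1
    return matches


def recognize(query, models, ratio=0.8):
    """Return (best_name, match_count) over a dict of name -> descriptors."""
    if not models:
        return None, 0
    scores = {name: count_matches(query, descs, ratio) for name, descs in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

In practice a minimum match count would also be needed, so that a region with only a handful of accidental matches is not labeled as an object.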

Download
You can download the entire source code here. It's all in MATLAB, with a few MEX functions for segmentation.

Some results:
[Result images, left to right: Ritz, iRobot "DVD", Frying Pan]