Research Projects
Object Detection in Video Sequences
A good training dataset, representative of the test images expected in a given application, is critical for ensuring good performance of a visual categorization system. Obtaining task-specific datasets of visual categories is, however, far more tedious than obtaining a generic dataset of the same classes. We propose an Incremental Multiple Kernel Learning (IMKL) approach to object recognition that initializes on a generic training database and then tunes itself to the classification task at hand. Our system simultaneously updates the training dataset as well as the weights used to combine multiple information sources. We demonstrate our system on a vehicle classification problem in a video stream overlooking a traffic intersection. Our system updates itself with images of vehicles in poses more commonly observed in the scene, as well as with image patches of the background, leading to an increase in performance. A considerable change in the feature combination weights is observed as the system gathers scene-specific training data over time. The system is also seen to adapt itself to the illumination change in the scene as day transitions to night.
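As a rough illustration of what "weights used to combine multiple information sources" can look like in a multiple kernel setting, the sketch below forms a convex combination of per-feature Gram matrices. The function names, the RBF kernel, the two feature channels, and the hand-set weights are all assumptions made for illustration; the weight learning and the incremental dataset update of IMKL are not shown.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gram matrix of an RBF kernel between the rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


def combine_kernels(kernels, weights):
    """Convex combination K = sum_k beta_k * K_k of precomputed Gram matrices."""
    beta = np.asarray(weights, dtype=float)
    if beta.ndim != 1 or len(beta) != len(kernels):
        raise ValueError("need exactly one weight per kernel")
    if np.any(beta < 0) or beta.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    beta = beta / beta.sum()  # normalise so the weights sum to 1
    return sum(b * K for b, K in zip(beta, kernels))


# Two hypothetical feature channels describing the same 5 image windows.
rng = np.random.default_rng(0)
shape_feats = rng.normal(size=(5, 8))
color_feats = rng.normal(size=(5, 3))

K_shape = rbf_kernel(shape_feats, shape_feats, gamma=0.1)
K_color = rbf_kernel(color_feats, color_feats, gamma=0.5)

# Hand-set weights (an assumption); a multiple kernel learner would optimise them.
K = combine_kernels([K_shape, K_color], weights=[0.7, 0.3])
```

A change in the combination weights over time, as described above, would correspond to the entries of `weights` shifting between channels as new scene data arrives.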
Publication:
Scene it or not? Incremental Multiple Kernel Learning for Object Detection
Aniruddha Kembhavi, Behjat Siddiquie, Roland Miezianko, Scott McCloskey, Larry S. Davis
International Conference on Computer Vision - ICCV 2009
Human Detection using Partial Least Squares Analysis
Significant research has been devoted to detecting people in images and videos. In this paper we describe a human detection method that augments widely used edge-based features with texture and color information, providing us with a much richer descriptor set. This augmentation results in an extremely high-dimensional feature space (more than 170,000 dimensions). In such high-dimensional spaces, classical machine learning algorithms such as SVMs are nearly intractable with respect to training. Furthermore, the number of training samples is much smaller than the dimensionality of the feature space, by at least an order of magnitude. Finally, the extraction of features from a densely sampled grid structure leads to a high degree of multicollinearity. To circumvent these data characteristics, we employ Partial Least Squares (PLS) analysis, an efficient dimensionality reduction technique, one which preserves significant discriminative information, to project the data onto a much lower dimensional subspace (20 dimensions, reduced from the original 170,000). Our human detection system, employing PLS analysis over the enriched descriptor set, is shown to outperform state-of-the-art techniques on three varied datasets including the popular INRIA pedestrian dataset, the low-resolution gray-scale DaimlerChrysler pedestrian dataset, and the ETHZ pedestrian dataset consisting of full-length videos of crowded scenes.
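To make the projection step concrete, here is a minimal sketch of PLS for a single response (the NIPALS variant) that maps high-dimensional descriptors onto a few latent components. It is an assumption-laden toy: the data are synthetic, the function names are hypothetical, and the sizes (40 samples, 500 features, 3 components) stand in for the much larger ones quoted above.

```python
import numpy as np


def fit_pls1(X, y, n_components):
    """NIPALS PLS for one response; returns (feature mean, projection matrix)."""
    mean = X.mean(axis=0)
    Xk = X - mean
    yk = y - y.mean()
    W, P = [], []
    for _ in range(n_components):
        w = Xk.T @ yk                    # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                       # latent scores for this component
        tt = t @ t
        p = Xk.T @ t / tt                # loadings
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - t * (yk @ t) / tt      # deflate y
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    R = W @ np.linalg.inv(P.T @ W)       # maps centred X directly to scores
    return mean, R


def project(X, mean, R):
    """Project descriptors onto the learned low-dimensional subspace."""
    return (X - mean) @ R


# Synthetic stand-in: fewer samples than features, as in the setting above.
rng = np.random.default_rng(0)
labels = np.repeat([1.0, -1.0], 20)              # +1 human, -1 background
X = rng.normal(size=(40, 500)) + 0.5 * labels[:, None]

mean, R = fit_pls1(X, labels, n_components=3)
Z = project(X, mean, R)                          # shape (40, 3)
```

A classifier would then be trained on `Z` rather than on the raw descriptors; which classifier to use is left open here.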
Publication:
Human Detection using Partial Least Squares Analysis
William Robson Schwartz, Aniruddha Kembhavi, David Harwood, Larry S. Davis
International Conference on Computer Vision - ICCV 2009 (ORAL)
Detecting Human Actions from Single Images
Interpretation of images and videos containing humans interacting with different objects is a daunting task. It involves understanding the scene or event, analyzing human movements, recognizing manipulable objects, and observing the effect of the human movement on those objects. While each of these perceptual tasks can be conducted independently, the recognition rate improves when interactions between them are considered. Motivated by psychological studies of human perception, we present a Bayesian approach which integrates various perceptual tasks involved in understanding human-object interactions. Previous approaches to object and action recognition rely on static shape/appearance feature matching and motion analysis, respectively. Our approach goes beyond these traditional approaches and applies spatial and functional constraints on each of the perceptual elements for coherent semantic interpretation. Such constraints allow us to recognize objects and actions when the appearances are not discriminative enough. We also demonstrate the use of such constraints in recognition of actions from static images without using any motion information.
Publication:
Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition
Abhinav Gupta, Aniruddha Kembhavi, Larry S. Davis
To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence
(Special Issue on Probabilistic Graphical Models)
Variable Selection for Object Detection Applications
I am currently working on this problem. I shall be putting up some more information in the upcoming weeks. Here is an example result:
Resource Allocation for Multiple Target Tracking
Particle filters have been very widely used to track targets in video sequences. However, they suffer from an exponential rise in the number of particles needed to jointly track multiple targets. On the other hand, using multiple independent filters to track in crowded scenes often leads to erroneous results. We present a new particle filtering framework which uses an intelligent resource allocation scheme, allowing us to track a large number of targets using a small set of particles. First, targets with overlapping posterior distributions and similar appearance models are clustered into interaction groups and tracked jointly, but independently of other targets in the scene. Second, a different number of particles is allocated to each group based on the following observations. Groups with higher associations (quantifying spatial proximity and pairwise appearance similarity) are given more particles. Groups with a larger number of targets are given a larger number of particles. Finally, groups with ineffective proposal distributions are assigned more particles. Our experiments demonstrate the effectiveness of this framework over the commonly used joint particle filter with Markov Chain Monte Carlo (MCMC) sampling.
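The three allocation observations can be sketched as a simple proportional split of a fixed particle budget. The scoring formula, the field names, and the value ranges below are assumptions chosen only to respect the stated trends (more targets, higher association, or a weaker proposal all mean more particles); they are not taken from the paper.

```python
from dataclasses import dataclass
import math


@dataclass
class InteractionGroup:
    num_targets: int         # targets tracked jointly in this group
    association: float       # in [0, 1]; proximity and appearance similarity
    proposal_quality: float  # in [0, 1]; 1 means a very effective proposal


def allocate_particles(groups, total_particles):
    """Split a particle budget across groups in proportion to a heuristic score."""
    if not groups:
        return []
    # Hypothetical score: grows with group size and association,
    # and shrinks as the proposal distribution becomes more effective.
    scores = [
        g.num_targets * (1.0 + g.association) * (2.0 - g.proposal_quality)
        for g in groups
    ]
    score_sum = sum(scores)
    shares = [total_particles * s / score_sum for s in scores]
    counts = [math.floor(s) for s in shares]
    # Hand out the particles lost to rounding, largest fractional part first.
    leftover = total_particles - sum(counts)
    by_fraction = sorted(
        range(len(groups)), key=lambda i: shares[i] - counts[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


groups = [
    InteractionGroup(num_targets=1, association=0.0, proposal_quality=0.9),
    InteractionGroup(num_targets=3, association=0.6, proposal_quality=0.5),
    InteractionGroup(num_targets=2, association=0.3, proposal_quality=0.2),
]
counts = allocate_particles(groups, total_particles=200)
```

The largest-remainder rounding keeps the total equal to the budget, so adding a group only redistributes particles rather than increasing the overall cost.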
Publication:
Resource Allocation for Tracking Multiple Targets using Particle Filters
Aniruddha Kembhavi, William Robson Schwartz, Larry S. Davis
Workshop on Visual Surveillance - VS 2008, held at ECCV 2008
The Bowerbird Project
Sociobiologists collect huge volumes of video to study animal behavior (our collaborators work with 30,000 hours of video). The scale of these datasets demands the development of automated video analysis tools. Detecting and tracking animals is a critical first step in this process. However, off-the-shelf methods prove incapable of handling videos characterized by poor quality, drastic illumination changes, non-stationary scenery, and foreground objects that become motionless for long stretches of time. We improve on existing approaches by taking advantage of specific aspects of this problem: by using information from the entire video, we are able to find animals that become motionless for long intervals of time; we make robust decisions based on regional features; and for different parts of the image, we tailor the selection of model features, choosing the features most helpful in differentiating the target animal from the background in that part of the image. We evaluate our method on a dataset of more than 200,000 frames of Satin Bowerbird courtship videos, achieving almost 83% tracking accuracy.
Publication:
Tracking Down Under: Following the Satin Bowerbird
Aniruddha Kembhavi, Ryan Farrell, Yuancheng Luo, David Jacobs, Ramani Duraiswami, Larry S. Davis
Workshop on Applications of Computer Vision - WACV 2008
Motion Segmentation and Activity Representation in Crowds
Video surveillance of large facilities, such as airports, rail stations, and casinos, is developing rapidly. Cameras installed at such locations often overlook large crowds, which makes problems such as activity and scene understanding very challenging. Traditional activity understanding techniques, which rely on input from lower level processing units dealing with background subtraction, human detection, and tracking, are unable to cope with frequent occlusions in such scenes. We propose a novel spatiotemporal segmentation and activity recognition framework that bypasses these commonly used low-level modules. We model each local spatiotemporal patch as a dynamic texture. Using a suitable distance metric to compare two local patches based on their estimated dynamic texture parameters, we segment a video into spatiotemporal regions that show similar motion patterns. We are also able to temporally stitch together local regions to form activity streamlines and represent each streamline by its constituent dynamic textures. This allows us to seamlessly perform activity recognition without explicitly detecting individuals in the scene. We demonstrate our framework on multiple datasets and show favorable results compared with the state of the art.
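For readers unfamiliar with dynamic textures, the sketch below fits a linear dynamical system, x[t+1] = A x[t] and y[t] = C x[t] + mean, to one spatiotemporal patch using a common closed-form SVD procedure. Whether the paper estimates its parameters this way is an assumption, the function name and patch are hypothetical, and the distance metric used to compare two patches is not shown.

```python
import numpy as np


def fit_dynamic_texture(patch, n_states):
    """Fit x[t+1] = A x[t], y[t] = C x[t] + mean to a (frames, h, w) patch."""
    n_frames = patch.shape[0]
    Y = patch.reshape(n_frames, -1).T.astype(float)  # pixels x frames
    mean = Y.mean(axis=1, keepdims=True)             # temporal mean per pixel
    U, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    C = U[:, :n_states]                              # observation matrix
    X = s[:n_states, None] * Vt[:n_states]           # states, n_states x frames
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])         # least-squares transition
    return A, C, mean.ravel()


# Synthetic 20-frame, 6x6 patch: a sinusoidal pattern drifting across columns.
t = np.arange(20)[:, None, None]
cols = np.arange(6)[None, None, :]
patch = np.sin(0.5 * cols - 0.3 * t) + np.zeros((1, 6, 1))

A, C, mean = fit_dynamic_texture(patch, n_states=2)
```

Two patches with similar motion would yield similar `(A, C)` pairs, which is what a segmentation step based on dynamic texture parameters would compare.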
Publication:
Motion segmentation and Activity Representation in Crowds
Yunqian Ma, Petr Cisar, Aniruddha Kembhavi
International Journal of Imaging Systems and Technology, Volume 19, Issue 2, 2009
(Special Issue: Contemporary Challenges in Combinatorial Image Analysis)
Unusual Event Detection
Defining and detecting unusual events in general is a very hard problem. What constitutes an unusual event is often scene and context dependent. In this paper we characterize an event in terms of the spatial locations of all objects in the scene over time. This allows us to define an event as being unusual if the interaction between targets (in terms of these locations) has not been observed before. We characterize the locations of all objects at a given time instant by a single binary image marked with their current positions in the scene. Projecting a sequence of binary images onto a lower dimensional subspace yields a representation of an activity in terms of a trajectory in the eigenspace. A particle filter framework is used to incrementally match these temporal trajectories and build models of all usual activities seen in the past. Using this framework, we classify an observed activity as unusual if it deviates sufficiently from all the models representing usual activities.
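The representation step can be sketched as follows: each time instant becomes a binary occupancy image, and a sequence of such images is projected onto a few principal directions to give a trajectory. The grid size, the number of dimensions, the use of plain PCA via SVD, and all function names are assumptions for illustration; the particle filter matching of trajectories is not shown.

```python
import numpy as np


def occupancy_image(positions, shape):
    """Binary image with a 1 at every (row, col) object position."""
    img = np.zeros(shape)
    for r, c in positions:
        img[r, c] = 1.0
    return img


def fit_eigenspace(frames, k):
    """Return the mean image (flattened) and the top-k principal directions."""
    X = np.stack([f.ravel() for f in frames])
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def to_trajectory(frames, mean, basis):
    """Project each frame onto the eigenspace; one row per time instant."""
    X = np.stack([f.ravel() for f in frames])
    return (X - mean) @ basis.T


# Two hypothetical objects moving in opposite directions on an 8x8 grid.
frames = [occupancy_image([(t, 1), (7 - t, 6)], (8, 8)) for t in range(6)]

mean, basis = fit_eigenspace(frames, k=2)
trajectory = to_trajectory(frames, mean, basis)  # shape (6, 2)
```

An activity is then the ordered sequence of rows in `trajectory`, and an unusual activity would be one whose trajectory lies far from every stored model.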







