"Vision is the art of seeing the invisible"
- Jonathan Swift
Research Areas:
2. Object Tracking and Recognition in Videos
3. Activity Analysis and Action Recognition
I am interested in developing sensing, storage, processing and inference methodologies for multidimensional signals. Multidimensional signals abound in the visual domain, and the techniques developed here will impact problems in computer vision and pattern recognition from video. My research has focused on three distinct areas: sensing, representation and inference. First, I have designed and built computational Fourier cameras that multiplex 4-D optical signals onto a 2-D sensor, enabling cameras to overcome traditional limitations related to depth of field and focus. Second, I have developed novel representations for the visual data acquired by these cameras, which have enabled algorithms for efficient multi-video compression and for target tracking with single and multiple cameras. Finally, I have developed inference algorithms for several vision tasks, such as gait-based person identification, simultaneous tracking and recognition, human activity analysis and activity-based video mining. This research thrust will have a broad impact on several domains, including consumer photography, surveillance, visual biometrics such as face and gait, markerless motion capture for medical applications and wide-area visual sensor networks.
My research will focus on three interconnected areas: the design of novel computational cameras that alleviate the limitations of conventional perspective cameras, the development of sound statistical inference algorithms based on manifold representations, and the extraction of the information content present in captured videos.
Multidimensional Sensing: Computational and Fourier Cameras
A visual sensor must retain all the scene information if that information is to be extracted from it reliably. Unfortunately, traditional cameras are projective devices that map lines in the 3-D world to points on the image. The many-to-one nature of this mapping makes standard statistical inference algorithms for many vision tasks either intractable or unreliable in the presence of noise. It is therefore desirable to develop higher-dimensional visual representations that make these inference problems tractable. My recent work shows that capturing 4-D light fields, as opposed to 2-D images, alleviates many of these problems. The light field is the radiance as a function of position and direction. For scenes composed of Lambertian objects (surfaces that scatter incident light in an angle-independent manner), the light field is an overcomplete representation of the visual information contained in the scene. Consequently, inference for traditionally hard problems in vision, such as depth extraction, stereo, and separating texture edges from depth edges, becomes tractable. To this end, my research has focused on the design of novel computational cameras that go beyond the limitations of standard digital cameras. I have designed a heterodyne light-field camera that overcomes the depth-of-field and viewpoint limitations of normal cameras. The design uses principles from signal processing and communication theory, especially the Fourier-domain analysis of light fields, to modulate the incoming light field so that the 4-D light field can be captured on a 2-D sensor. This is one of the first devices that modulate a 4-D signal so that it can be captured at the Nyquist rate on a 2-D sensor. This work also led to a theoretical understanding of non-refractive modulation of plenoptic fields, has inspired several other designs, and has received coverage in print media in the US and Japan.
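The Fourier-domain idea behind heterodyne capture can be illustrated in "flatland", where the light field L(x, θ) is 2-D and the sensor is 1-D. The sketch below is a toy, discrete, periodic model under stated assumptions, not the actual camera design: the helper names (`bandlimited_light_field`, `cosine_mask`, `capture`, `decode`), the tile spacing `a`, the spatial bandlimit `Kx` and the number of cosines `N` are all hypothetical. It also ignores optics, mask placement and the fact that a physical mask must have non-negative transmission. It shows two things: a plain sensor records only the θ-frequency-zero slice of the light-field spectrum, and a mask whose spectrum is a slanted line of impulses shifts every angular slice into its own spectral tile on the sensor, where it can be cut out and reassembled.

```python
import numpy as np

def bandlimited_light_field(Nx, Nt, Kx, seed=0):
    """Random real flatland light field L[x, theta], band-limited to |kx| <= Kx."""
    rng = np.random.default_rng(seed)
    F = np.fft.fft(rng.standard_normal((Nx, Nt)), axis=0)
    k = np.fft.fftfreq(Nx, d=1.0 / Nx)           # integer DFT bin indices
    F[np.abs(k) > Kx, :] = 0                      # drop high spatial frequencies
    return np.fft.ifft(F, axis=0).real

def cosine_mask(Nx, Nt, a, N):
    """Hypothetical real mask whose spectrum has impulses at (kx, ktheta) = (n*a, n), n = -N..N."""
    x = np.arange(Nx)[:, None]
    t = np.arange(Nt)[None, :]
    return sum(np.cos(2 * np.pi * n * (a * x / Nx + t / Nt)) for n in range(-N, N + 1))

def capture(L, M):
    """1-D sensor: integrate the (optionally modulated) light field over angle."""
    return (L * M).sum(axis=1)

def decode(I, Nt, a, N, Kx):
    """Rearrange the sensor's spectral tiles into the 2-D light-field spectrum."""
    Nx = I.shape[0]
    FI = np.fft.fft(I)
    FL = np.zeros((Nx, Nt), dtype=complex)
    k = np.arange(-Kx, Kx + 1)
    for n in range(-N, N + 1):
        # The tile centred at kx = n*a carries the angular-frequency slice ktheta = -n.
        FL[k % Nx, (-n) % Nt] = FI[(k + n * a) % Nx]
    return np.fft.ifft2(FL).real

# Assumed sizes chosen so tiles do not overlap:
# 2*Kx + 1 <= a, (2*N + 1) * a <= Nx, and Nt == 2*N + 1.
Nx, Nt, a, N, Kx = 80, 5, 16, 2, 7
L = bandlimited_light_field(Nx, Nt, Kx)

# A plain camera (no mask) records only the ktheta = 0 slice of the spectrum.
I_plain = capture(L, np.ones_like(L))
print(np.allclose(np.fft.fft(I_plain), np.fft.fft2(L)[:, 0]))  # True

# With the mask, each angular slice lands in its own tile and can be recovered.
I_mask = capture(L, cosine_mask(Nx, Nt, a, N))
L_rec = decode(I_mask, Nt, a, N, Kx)
print(np.allclose(L, L_rec))  # True
```

The design choice worth noting is the trade-off encoded in the constraints: the sensor's spatial bandwidth `Nx` is divided among `2N + 1` angular slices, so angular resolution is bought with spatial resolution.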
Representations for Multidimensional Visual Signals
As the number of interconnected cameras continues to increase, it becomes essential to develop efficient methods for pattern recognition in high-dimensional visual data. Such methods have a tremendous impact both on storage and on the design of efficient inference algorithms. In this regard, I have developed compression algorithms that use projective geometry to efficiently compress multi-view videos. In surveillance and traffic monitoring, the only information we seek may be the moving targets themselves: their positions and appearance. My research has concentrated on solving many of these problems, especially in the context of systems assisted by domain knowledge. I have developed single- and multiple-camera tracking algorithms for a variety of subjects, including bees in a hive, humans and faces. Specifically, I have developed a simultaneous tracking and behavior analysis approach in which the behaviors of the actors are explicitly modeled as a hierarchical Markov model, and behavior and position are estimated by filtering with Monte Carlo methods such as particle filtering.
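To make the idea of simultaneous tracking and behavior estimation concrete, here is a minimal sketch assuming a deliberately simple two-level model: a discrete behavior follows a Markov chain, and the current behavior sets the drift of a 1-D position. A bootstrap particle filter then estimates both jointly from noisy position measurements. The behaviors ("move right", "move left"), the constants `VELOCITY`, `P_SWITCH`, `Q_STD`, `R_STD`, and the helpers `simulate` and `particle_filter` are hypothetical illustrations, far simpler than a hierarchical behavior model for real actors such as bees.

```python
import numpy as np

# Hypothetical two-level model: a discrete behaviour (top level) follows a
# Markov chain and drives the position dynamics (bottom level).
VELOCITY = np.array([1.0, -1.0])  # drift per behaviour: 0 = "move right", 1 = "move left"
P_SWITCH = 0.05                   # probability of switching behaviour per step
Q_STD, R_STD = 0.1, 0.5           # process and observation noise std

def simulate(T, rng):
    """Generate ground-truth behaviours, positions and noisy observations."""
    m, x = 0, 0.0
    modes, xs, zs = [], [], []
    for _ in range(T):
        if rng.random() < P_SWITCH:
            m = 1 - m
        x = x + VELOCITY[m] + rng.normal(0.0, Q_STD)
        modes.append(m)
        xs.append(x)
        zs.append(x + rng.normal(0.0, R_STD))
    return np.array(modes), np.array(xs), np.array(zs)

def particle_filter(zs, n_particles, rng):
    """Bootstrap particle filter over the joint (behaviour, position) state."""
    m = rng.integers(0, 2, n_particles)
    x = rng.normal(0.0, 1.0, n_particles)
    est_x, est_m = [], []
    for z in zs:
        # Predict: sample behaviour transitions, then behaviour-dependent motion.
        flip = rng.random(n_particles) < P_SWITCH
        m = np.where(flip, 1 - m, m)
        x = x + VELOCITY[m] + rng.normal(0.0, Q_STD, n_particles)
        # Update: Gaussian likelihood, computed in log space for stability.
        logw = -0.5 * ((z - x) / R_STD) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        est_x.append(np.sum(w * x))             # posterior-mean position
        est_m.append(int(np.sum(w * m) > 0.5))  # most probable behaviour
        # Resample (multinomial) to counter weight degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        m, x = m[idx], x[idx]
    return np.array(est_x), np.array(est_m)

rng = np.random.default_rng(1)
modes, xs, zs = simulate(100, rng)
est_x, est_m = particle_filter(zs, 500, rng)
rmse = np.sqrt(np.mean((est_x - xs) ** 2))
accuracy = np.mean(est_m == modes)
print(f"position RMSE {rmse:.3f}, behaviour accuracy {accuracy:.2f}")
```

Because each particle carries both a behavior label and a position, the behavior estimate falls out of the same weighted particle set used for tracking; no separate classifier is run on the track afterwards.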
Visual Inference: Activity Recognition and Shape Sequence Matching
In order to interact efficiently with its surroundings, a machine vision
system must be able to understand the geometric structure of its
environment (occlusions, 3-D structure, moving objects, etc.), recognize
objects and understand the actions and behaviors of its subjects. My
research has resulted in algorithms for gait-based person
identification, simultaneous tracking and recognition, human activity
analysis, activity-based video mining, model-based face
recognition and structure-from-motion estimation using both
visual and non-visual sensors. I have worked on an in-depth
analytical characterization of the manifold on which shape sequences
lie, and this study has led to several approaches for modeling and
comparing shape sequences. Methods for comparing shape sequences are
popular and have applications in action recognition, gait recognition,
medical imaging and other related fields. I have also extended the
approach for shape sequence characterization in order to study the
"function space of time warping", specifically in the context of activity
recognition. My
research has also led to several significant contributions to activity
recognition and mining from videos. I have studied the relative
importance of shape and dynamical cues using shape-dynamical
models. I have developed viewpoint- and execution-rate-invariant
dynamical models for activity recognition and have proposed a method
for activity-based unsupervised mining of video sequences. This method is
based on an accurate manifold characterization of the observation
sequence using subspace angles between the observability spaces. These
algorithms have significant impact on applications such as
surveillance, traffic monitoring and household monitoring of the
elderly.
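Comparing shape sequences requires a distance between shapes that ignores translation, scale and rotation, plus a way to absorb differences in execution rate. The sketch below swaps in two standard, simpler tools and should not be read as the manifold and time-warping framework described above: Kendall's planar shape distance between landmark configurations, combined with classic dynamic time warping (DTW). The helper names (`preshape`, `shape_distance`, `dtw`) and the toy square-to-rectangle "action" are hypothetical illustrations.

```python
import numpy as np

def preshape(landmarks):
    """Map a (k, 2) landmark array to a centred, unit-norm complex vector."""
    z = landmarks[:, 0] + 1j * landmarks[:, 1]
    z = z - z.mean()                 # remove translation
    return z / np.linalg.norm(z)     # remove scale

def shape_distance(a, b):
    """Great-circle distance on Kendall's planar shape space.
    Taking |<z_a, z_b>| makes the distance invariant to rotation."""
    c = abs(np.vdot(preshape(a), preshape(b)))
    return float(np.arccos(min(c, 1.0)))

def dtw(seq_a, seq_b, dist=shape_distance):
    """Classic dynamic time warping cost between two shape sequences."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(seq_a[i - 1], seq_b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

# Toy "action": a unit square stretching into a 2:1 rectangle over 10 frames.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
seq = [square * np.array([s, 1.0]) for s in np.linspace(1.0, 2.0, 10)]

# Same action performed at half speed, rotated, scaled and translated.
phi = np.pi / 6
R = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
slow = [3.0 * f @ R.T + np.array([5.0, -2.0]) for f in seq for _ in range(2)]

print(dtw(seq, slow))        # ~0: same shapes, different execution rate
print(dtw(seq, seq[::-1]))   # > 0: the action played backwards
```

DTW only aligns sequences by a discrete monotone path; it does not by itself give a metric on the space of warping functions, which is one motivation for the more principled characterization of time warping mentioned above.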
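Subspace angles between observability spaces compare two linear dynamical systems x_{t+1} = A x_t, y_t = C x_t through the column space of the extended observability matrix [C; CA; CA^2; ...]. A minimal sketch follows, assuming the matrix is truncated to `m` blocks and that the arc-length (Grassmann) distance is used to summarize the principal angles; both choices, the helper names and the toy rotation models are hypothetical, and other summaries (for example, the Martin distance) are also common. The example checks a key property: the distance is unchanged by a change of state basis, because A' = T A T^{-1}, C' = C T^{-1} gives an observability matrix O T^{-1} with the same column space.

```python
import numpy as np

def observability_matrix(A, C, m):
    """Truncated extended observability matrix [C; CA; ...; CA^(m-1)]."""
    blocks, Ak = [], np.eye(A.shape[0])
    for _ in range(m):
        blocks.append(C @ Ak)
        Ak = Ak @ A
    return np.vstack(blocks)

def subspace_angles(O1, O2):
    """Principal angles (radians, ascending) between the column spaces of O1 and O2."""
    Q1, _ = np.linalg.qr(O1)
    Q2, _ = np.linalg.qr(O2)
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def lds_distance(model1, model2, m=10):
    """Arc-length (Grassmann) distance between observability subspaces."""
    theta = subspace_angles(observability_matrix(*model1, m),
                            observability_matrix(*model2, m))
    return float(np.sqrt(np.sum(theta ** 2)))

def rotation_model(omega):
    """Toy LDS: 2-D rotation by omega per frame, observing the first coordinate."""
    A = np.array([[np.cos(omega), -np.sin(omega)],
                  [np.sin(omega),  np.cos(omega)]])
    C = np.array([[1.0, 0.0]])
    return A, C

A, C = rotation_model(0.1)
T = np.array([[2.0, 1.0], [0.5, 3.0]])   # arbitrary change of state basis
Tinv = np.linalg.inv(T)
d_same = lds_distance((A, C), (T @ A @ Tinv, C @ Tinv))
d_diff = lds_distance((A, C), rotation_model(0.5))
print(f"{d_same:.2e} {d_diff:.3f}")
```

Invariance to the state basis matters in practice because models identified from two video sequences are only determined up to such a similarity transform, so comparing A and C matrices entry by entry would be meaningless.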