CfAR WEEKLY SEMINAR

Friday at 3:30pm in AVW 2120 (or as notified)
Spring, 2005

Organizers: Ramani Duraiswami and David Jacobs

Webmaster: Zhiyun Li
 

  Schedule
Date Speaker Title
02/11/05 Haibin Ling Using the Inner-Distance for Classification of Articulated Shapes
02/18/05   No "CfAR Weekly Seminar" today
02/25/05 Bohyung Han Kernel based Bayesian Filtering for Object Tracking
03/04/05 Dr. Rakesh (Teddy) Kumar Creating intelligence by aligning pixels
03/11/05 Yefeng Zheng Robust Point Matching for Non-Rigid Shapes: A Relaxation Labeling Based Approach
03/18/05    
03/25/05   Spring break, no seminar
04/01/05 Gaurav Aggarwal Face: Tracking and Recognition
04/08/05 Naresh Cuntoor  Activity Recognition using HMM-based Event Probability Sequences
04/15/05 Amit Agrawal An Algebraic approach to surface reconstruction from gradient fields
04/22/05 Prof. Ron DeVore Compression of Digitized Surfaces
04/29/05 Narayanan Ramanathan Age Progression in Human Faces : A Computational Approach
05/06/05 Philip David Object Recognition in High Clutter Images Using Line Features
     
     



  Abstracts

02/11/05 Using the Inner-Distance for Classification of Articulated Shapes
Speaker Haibin Ling, UMD
Abstract We propose using the inner-distance between landmark points to build shape descriptors. The inner-distance is defined as the length of the shortest path between landmark points within the shape silhouette. We show that the inner-distance is articulation insensitive and more effective at capturing complex shapes with part structures than Euclidean distance. To demonstrate this idea, it is used to build a new shape descriptor based on shape contexts. After that, we design a dynamic programming based method for shape matching and comparison. We have tested our approach on a variety of shape databases including an articulated shape dataset, MPEG7 CE-Shape-1, Kimia silhouettes, a Swedish leaf database and a human motion silhouette dataset. In all the experiments, our method demonstrates effective performance compared with other algorithms.

PS: the TR version is available at http://www.cs.umd.edu/~hbling/Research/Inner-Distance/inner-dist-tr.pdf
02/25/05 Kernel based Bayesian Filtering for Object Tracking
Speaker Bohyung Han, UMD
Abstract Particle filtering provides a general framework for propagating probability density functions in non-linear and non-Gaussian systems. However, the algorithm is based on a Monte Carlo approach and sampling is a problematic issue, especially for high dimensional problems. This paper presents a new kernel-based Bayesian filtering framework, which adopts an analytic approach to better approximate and propagate density functions. In this framework, the techniques of density interpolation and density approximation are introduced to represent the likelihood and the posterior densities by Gaussian mixtures, where all parameters such as the number of mixands, their weight, mean, and covariance are automatically determined. The proposed analytic approach is shown to perform sampling more efficiently in high dimensional space. We apply our algorithm to the real-time tracking problem, and demonstrate its performance on real video sequences as well as synthetic examples.
03/04/05 Creating intelligence by aligning pixels
Speaker Dr. Rakesh (Teddy) Kumar, Sarnoff Corporation, Princeton NJ
Abstract The confluence of Video and Vision research at Sarnoff is strongly influenced by real world applications. Over the past decade or so, advances in fundamental algorithms, and successes in creating customer oriented technologies and products have produced a unique R&D environment at Sarnoff. This talk will present a tour of algorithm and application development as it has evolved at Sarnoff in the recent past. Specifically, we will present a framework for video representation within which progressively complex video alignment models reveal the underlying 2D and 3D nature of scenes and objects behind moving video pixels. The framework will be highlighted through applications ranging across video surveillance, mapping, entertainment and ATR.

Alignment is a key tool in the representation and manipulation of the fundamental information content present in motion video and 3D sequences. Global 2D parametric alignment models reveal the 2D nature of video frames from close-by vantage points. These can be used for stabilization of video sequences and construction of video panoramas and aerial maps. Layered representations of motion video in terms of moving object and multiple scene layers can be exploited for tracking, compression and indexing. Alignment of video frames with generic optical flow and 3D models reveals the rigid and non-rigid structure of objects and scenes, and can be used for video enhancement, 3D recovery, modeling and image based rendering. Alignment of videos and 3D data to stored reference models enables object recognition, augmented reality, video insertion, and targeting. The talk will span these and other related applications of alignment.

Finally, we will focus on the key technical area of 3D object recognition. We will present sub-linear techniques for the automatic recognition and classification of 3D objects observed by 3D sensors using a very large database of articulated 3D models. Our methods achieve a probability of correct identification (Pid) of greater than 0.95 for objects in the clear and 0.90 for objects with up to 50% occlusion.

 

Dr. Rakesh “Teddy” Kumar is currently the Technical Director of the Media Vision Laboratory at Sarnoff Corporation, Princeton, New Jersey. Prior to joining Sarnoff, he was employed at IBM. He received his Ph.D. in Computer Science from the University of Massachusetts at Amherst in 1992. He received his MS in ECE from SUNY Buffalo and BTech in EE from IIT-Kanpur. His technical interests are in the areas of computer vision, computer graphics, image processing and multimedia. At Sarnoff, he has been directing and performing commercial and government research and development projects in the areas of video surveillance and monitoring, video and 3D exploitation and analysis, object recognition, immersive tele-presence, 3D modeling, medical image analysis and multi-sensor registration. He has been one of the principal founders from Sarnoff for multiple spin-off and spin-in companies: VideoBrush,  LifeClips and Pyramid Vision Technologies. He is currently the chief technical officer for Pyramid Vision Technologies.

 

Rakesh Kumar received the Sarnoff Technical Achievement awards in 1994 and 1996 for his work in registration of multi-sensor, multi-dimensional medical images and alignment of video to three dimensional scene models respectively. He was an Associate Editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence from 1999 to 2003. He was an Area Chair for IEEE Computer Vision and Pattern Recognition Conference (CVPR’2004), Program Chair for the first and second IEEE Workshop on Video Registration (IVR’ 2001 and 2004) and the Fourth IEEE Workshop on Applications of Computer Vision (WACV’98) and has served on the program committee of a number of computer vision conferences and National Science Foundation review panels. He has co-authored more than 40 research publications and has received over 17 patents, with numerous others pending.

03/11/05 Robust Point Matching for Non-Rigid Shapes: A Relaxation Labeling Based Approach
Speaker Yefeng Zheng, UMD
Abstract Shape matching or image registration, which is often formulated as a point matching problem, is frequently encountered in image analysis, computer vision, and pattern recognition. Although the problem of registering rigid shapes has been widely studied, non-rigid shape matching has recently received more and more attention. For non-rigid shapes, most neighboring points cannot move independently under deformation due to physical constraints. Therefore, though the absolute distance between two points may change significantly, the neighborhood of a point is well preserved in general. Based on this observation, we formulate point matching as a graph matching problem. Each point is a node in the graph, and two nodes are connected by an edge if their Euclidean distance is less than a threshold. The optimal match between two graphs is the one that maximizes the number of matched edges. The shape context distance is used to initialize the graph matching, and relaxation labeling (after enforcing one-to-one matching) is used to refine the matching results. Non-rigid deformation is overcome by bringing one shape closer to the other in each iteration using deformation parameters estimated from the current point correspondence. Experiments on real and synthesized data demonstrate the effectiveness of our approach: it outperforms shape context and TPS-RPM algorithms under non-rigid deformation and noise on a public data set.
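The graph formulation in this abstract can be illustrated with a minimal Python sketch. It only builds the thresholded neighborhood graph and counts matched edges for a given correspondence; it is not the speaker's implementation, and it leaves out the shape context initialization, relaxation labeling, and deformation steps. The function names, the sample points, and the threshold of 1.5 are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def neighborhood_edges(points, threshold):
    """Return index pairs (i, j), i < j, whose Euclidean distance is below threshold."""
    return {
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if dist(points[i], points[j]) < threshold
    }


def matched_edge_count(model_pts, scene_pts, match, threshold):
    """Count model edges whose matched endpoints are also connected in the scene.

    `match` maps a model point index to a scene point index (assumed one-to-one).
    """
    model_edges = neighborhood_edges(model_pts, threshold)
    scene_edges = neighborhood_edges(scene_pts, threshold)
    count = 0
    for i, j in model_edges:
        if i in match and j in match:
            a, b = sorted((match[i], match[j]))
            if (a, b) in scene_edges:
                count += 1
    return count


# Hypothetical data: the scene is a slightly deformed copy of the model.
model = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
scene = [(0.0, 0.1), (1.0, 0.3), (2.1, 0.2), (6.0, 5.0)]
identity = {0: 0, 1: 1, 2: 2, 3: 3}
swapped = {0: 3, 1: 1, 2: 2, 3: 0}

print(matched_edge_count(model, scene, identity, 1.5))
print(matched_edge_count(model, scene, swapped, 1.5))
```

A correspondence that preserves local neighborhoods scores higher than one that swaps distant points, which is the quantity the abstract says the optimal match maximizes.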
04/01/05 Face: Tracking and Recognition
Speaker Gaurav Aggarwal
Abstract The talk will consist of the following two (related) problems:

1) Video-based Face Recognition:

We pose video-to-video face recognition as a dynamical system identification and classification problem. Video-to-video means that both gallery and probe consist of videos. We model a moving face as a linear dynamical system whose appearance changes with pose. An auto-regressive and moving average (ARMA) model is used to represent such a system. The choice of ARMA model is based on its ability to take care of the change in appearance while modeling the dynamics of pose, expression, etc. Recognition is performed using the concept of subspace angles to compute distances between probe and gallery video sequences. The results obtained are very promising given the extent of pose, expression and illumination variation in the video data used for experiments.

2) Facial Pose Tracking in Uncalibrated Videos:

This paper presents a method to recover the 3D configuration of a face in each frame of a video. The 3D configuration consists of the three translational parameters and the three orientation parameters which correspond to the yaw, pitch and roll of the face. Such information is important for applications like face modeling, recognition, expression analysis, etc. which require head stabilization. The approach combines the structural advantages of geometric modeling with the statistical advantages of a particle-filter based inference. The face is modeled as the curved surface of a cylinder which is free to translate and rotate arbitrarily. The geometric modeling takes care of pose and self-occlusion while the statistical modeling handles moderate occlusion and illumination variations. Experimental results on multiple datasets are provided to show the efficacy of the approach. The insensitivity of our approach to calibration parameters (focal length) is also shown.
 
04/08/05 Activity Recognition using HMM-based Event Probability Sequences
Speaker Naresh Cuntoor
Abstract Many activities may be described by using a few key events that are contained in the motion trajectories. These events have a natural physical interpretation, but are not amenable to direct statistical representation, due to semantic ambiguity and large data variability, i.e., the statistical properties of trajectories corresponding to the same activity may vary significantly. It is desirable, however, to learn compact models from the observed data. The hidden Markov model (HMM) is a popular choice for many recognition tasks including speech and simple visual activities. In this talk, I shall present a method to derive events from the observed trajectories using a subset of the state sequences of the HMM. We define a new variable that measures the event probability at each time instant, assuming that the events are localized in time and space. The event probability sequences act as signatures for the activities. Our experiments demonstrate the application of the event-based activity representation to activity recognition and anomaly detection.
04/15/05 An Algebraic approach to surface reconstruction from gradient fields
Speaker Amit Agrawal
Abstract Several important problems in computer vision, such as the Lightness problem, Shape from Shading (SFS), Photometric Stereo (PMS), etc., require reconstructing a surface from an estimated gradient field, which is usually non-integrable, i.e., has non-zero curl. We propose a purely algebraic approach to enforce integrability in the discrete domain. We first show that enforcing integrability can be formulated as solving a single linear system $Ax=b$ over the image. In general, this system is under-determined. We show conditions under which the system can be solved and a method to get to those conditions based on graph theory. The proposed approach is non-iterative, has the important property of local error confinement and can be applied to several problems. Results on the Lightness problem, SFS and PMS demonstrate the applicability of our method.
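The notion of a non-integrable gradient field (one with non-zero curl) can be illustrated with a small Python sketch. It only checks integrability on a pixel grid and does not implement the algebraic method of the talk. The forward-difference discretization, the function names, and the sample height grid are assumptions made for illustration.

```python
def forward_gradients(z):
    """Forward-difference gradients (p = dz/dx, q = dz/dy) of a height grid z."""
    h, w = len(z), len(z[0])
    p = [[z[y][x + 1] - z[y][x] for x in range(w - 1)] for y in range(h)]
    q = [[z[y + 1][x] - z[y][x] for x in range(w)] for y in range(h - 1)]
    return p, q


def discrete_curl(p, q):
    """Sum the gradients around each elementary pixel loop.

    Every entry is zero exactly when the field (p, q) is integrable on the grid.
    """
    rows = len(q)     # number of loops vertically
    cols = len(p[0])  # number of loops horizontally
    return [
        [p[y][x] + q[y][x + 1] - p[y + 1][x] - q[y][x] for x in range(cols)]
        for y in range(rows)
    ]


# Hypothetical surface z = x^2 + y^2 sampled on a 3x3 grid.
z = [[0, 1, 4], [1, 2, 5], [4, 5, 8]]
p, q = forward_gradients(z)
clean = discrete_curl(p, q)

# Corrupt one gradient sample, as an estimation error might.
p[0][0] += 1
noisy = discrete_curl(p, q)

print(clean)
print(noisy)
```

Gradients taken from an actual surface give zero curl everywhere, while a single corrupted gradient sample produces non-zero curl in the loop that contains it.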
04/22/05 Compression of Digitized Surfaces
Speaker Ronald A. DeVore, CSCAMM
Abstract Compression of digitized elevation maps (DEMs) differs in many ways from image compression. Usually there is inherent geometry that needs to be preserved for DEMs. Also, the applications of DEMs require different fidelity metrics than those used for images. This talk will consider the problem of surface encoding at a fundamental level and discuss metrics to measure fidelity, models for surfaces, and geometry-preserving encoders.

 
04/29/05 Face Verification across Age Progression
Speaker Narayanan Ramanathan
Abstract Human faces undergo a considerable amount of variation with aging. While studies have revealed the extent to which factors such as illumination variations, pose variations, facial expression and occlusions affect face recognition, the role of natural factors such as aging effects is yet to be studied. How does age progression affect the similarity between two images of an individual? What is the confidence associated with establishing the identity between two age-separated face images of an individual? On a database of pairs of passport images, we study similarity of faces as a function of time. We propose a Bayesian age-difference classifier that is built on a probabilistic eigenspaces framework. Since age-separated face images invariably differ in illumination and have facial variations due to aging, we propose a method to overcome non-uniform illumination across face images. The problem discussed in this paper has direct applications in passport renewal and homeland security.

 
05/06/05 Object Recognition in High Clutter Images Using Line Features
Speaker Philip David
Abstract An object recognition algorithm is described that uses model and image line features to locate complex objects in high clutter environments. Finding correspondences between model and image features is the main challenge in most object recognition systems. In our approach, corresponding line features are determined by a three-stage process. The first stage generates a large number of approximate pose hypotheses from correspondences of one or two lines in the model and image. Next, the pose hypotheses from the previous stage are quickly evaluated by comparing local image neighborhoods to the corresponding local model neighborhoods. Fast nearest neighbor algorithms are used to implement a distance measure that is unaffected by clutter and partial occlusion. Finally, a robust pose estimation algorithm is applied for refinement and verification, starting from the few best approximate poses produced by the previous stages. Our algorithm is invariant to changes in image scale and orientation, and partially invariant to affine distortion. Experiments on real images demonstrate robust recognition of partially occluded objects in very high clutter environments.

 

Previous Semesters:  2004-Spring  2004-Fall


Questions/comments: zli(at)cs.umd.edu

Welcome to Zhiyun Li's Homepage! updated:06-May-05 17:08:28 -0400