|
Abstract: Spatio-temporal patterns abound in the real world, and understanding them
computationally holds the promise of enabling a large class of
applications such as video surveillance, biometrics, computer graphics
and animation. In this dissertation, we study models and algorithms to
describe complex spatio-temporal patterns in videos for a wide range of
applications.
The spatio-temporal pattern recognition problem involves recognizing an
input video as an instance of a known class. For this problem, we show
that a first-order Gauss-Markov process is an appropriate model to
describe the space of primitives. We then show that the space of
primitives is not a Euclidean space but a Riemannian manifold. We use
the geometric properties of this manifold to define distances and
statistics. This paves the way for modeling temporal variations of the
primitives. We then apply these techniques to the problems of activity
recognition and pattern discovery from long videos.
The pattern discovery problem, on the other hand, requires uncovering
patterns from large datasets in an unsupervised manner for applications
such as automatic indexing and tagging. Most state-of-the-art
techniques index videos according to the global content in the scene
such as color, texture and brightness. In this dissertation, we discuss
the problem of activity-based indexing of videos. We examine the
various issues involved in such an effort and describe a general
framework to address the problem. We then design a cascade of dynamical
systems model for clustering videos based on their dynamics. We augment
the traditional dynamical systems model in two ways. Firstly, we
describe activities as a cascade of dynamical systems. This
significantly enhances the expressive power of the model while
retaining many of the computational advantages of using dynamical
models. Secondly, we also derive methods to incorporate view and
rate-invariance into these models so that similar actions are clustered
together irrespective of the viewpoint or the rate of execution of the
activity. We also derive algorithms to learn the model parameters from
a video stream and demonstrate how a given video sequence may be
segmented into different clusters where each cluster represents an
activity.
Finally, we show the broader impact of the algorithms and tools developed in
this dissertation for several image-based recognition problems that
involve statistical inference over non-Euclidean spaces. We demonstrate
how an understanding of the geometry of the underlying space leads to
methods that are more accurate than traditional approaches. We present
examples in shape analysis, object recognition, video-based face
recognition, and age estimation from facial features to demonstrate
these ideas.
|