3D Object Tracking



Shape and measurements

We design an optimal shape operator $h$ to produce accurate detection and localization of a given shape.

Figure 1: Shape filter: The shape is matched to a circular arc to detect the eye outline, and the cross-section is designed to detect the intensity change along the boundary.
\begin{figure}\center{
\leavevmode
\epsfxsize = 10.0cm
\epsfbox{shapemask.eps}}
\end{figure}

The response of the local image $s$ of an object to the operator $h_{\xi}$ having geometric configuration $\xi$ is

\begin{displaymath}r^{\xi} = \int h_{\xi}(\textbf{u}) s(\textbf{u}) d\textbf{u} \end{displaymath}

If we assume that the image is corrupted by additive noise $n(\textbf{u})$, then the observation $y^{\xi}$ is given by

\begin{displaymath}y^{\xi} = \int h_{\xi}(\textbf{u}) s(\textbf{u}) d\textbf{u} + \int h_{\xi}(\textbf{u}) n(\textbf{u}) d\textbf{u} = r^{\xi} + \tilde{n}
\end{displaymath}
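For intuition, the discrete analogue of these integrals is an inner product between the sampled filter mask and the image patch. The following is a minimal sketch under that assumption; the array sizes and the random stand-ins for the mask, image, and noise are purely illustrative.

\begin{verbatim}
import numpy as np

def filter_response(h_mask, patch):
    # Discrete analogue of r = integral h(u) s(u) du.
    return float(np.sum(h_mask * patch))

rng = np.random.default_rng(0)
s = rng.random((32, 32))                  # stand-in image patch
h = rng.standard_normal((32, 32))         # stand-in shape mask
n = 0.01 * rng.standard_normal(s.shape)   # additive noise
r = filter_response(h, s)                 # clean response r
y = filter_response(h, s + n)             # observation y = r + n_tilde
\end{verbatim}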

where $\tilde{n}$ is the noise response. Since we sample the observations $y^{\xi}$ over the course of time, we denote the observation process by

\begin{displaymath}Y_t = y^{\xi}_t = \int^t_0 h(X_s) ds + V_t \end{displaymath}



The Zakai equation and the branching particle method


The Zakai equation

The state vector $X_t$ representing the geometric parameters of an object is governed by the equation

\begin{displaymath}dX_t = f(X_t)dt + \sigma(X_t)dW_t \end{displaymath}

Here $W_t$ is a Brownian motion, and $\sigma=\sigma(X_t)$ models the state noise structure.
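As a concrete sketch, this diffusion can be simulated with an Euler--Maruyama step; the drift $f$ and diffusion $\sigma$ below are placeholder choices, not the ones used in the experiments.

\begin{verbatim}
import numpy as np

def euler_maruyama_step(x, f, sigma, dt, rng):
    # One step of dX = f(X) dt + sigma(X) dW.
    dw = np.sqrt(dt) * rng.standard_normal(x.shape)
    return x + f(x) * dt + sigma(x) * dw

rng = np.random.default_rng(1)
x = np.zeros(7)                           # e.g., 7 geometric parameters
f = lambda x: np.zeros_like(x)            # placeholder drift
sigma = lambda x: 0.05 * np.ones_like(x)  # placeholder diffusion
x_next = euler_maruyama_step(x, f, sigma, 1.0 / 30.0, rng)
\end{verbatim}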

The tracking problem is solved if we can compute the state updates, given information from the observations. We are interested in estimating some statistic $\phi$ of the states, of the form

\begin{displaymath}\pi_t(\phi) \stackrel{\bigtriangleup}{=} E[\phi(X_t)\vert{\mathcal{Y}}_t] \end{displaymath}

given the observation history ${\mathcal Y}_t$. Zakai et al. have shown that the unnormalized conditional density $p_t(\phi)$ satisfies a stochastic partial differential equation, usually called the Zakai equation:

\begin{displaymath}dp_t(\phi) = p_t(A\phi)dt + p_t(h^*\phi)dY_t \end{displaymath}

where $A$ is the infinitesimal generator of the signal process $X_t$, and the normalized conditional estimate is recovered by

\begin{displaymath}\pi_t(\phi) = \frac{p_t(\phi)}{p_t(1)} \end{displaymath}
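In particle form, $p_t$ is approximated by a weighted empirical measure, and the normalization $\pi_t(\phi) = p_t(\phi)/p_t(1)$ becomes a weighted average. A minimal sketch, with hypothetical particles and weights:

\begin{verbatim}
import numpy as np

def normalized_estimate(phi, particles, weights):
    # pi_t(phi) = p_t(phi) / p_t(1) for an unnormalized
    # empirical measure sum_i w_i * delta_{x_i}.
    vals = np.array([phi(x) for x in particles])
    w = np.asarray(weights, float)
    return float(np.sum(w * vals) / np.sum(w))
\end{verbatim}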


The branching particle algorithm

The unnormalized optimal filter $p_t(\phi)$ which is the solution to the problem is given by

\begin{displaymath}p_t(\phi) = \tilde{E}\left[\left.\phi(X_t)\exp\left(\int^t_0 h^*(X_s)dY_s - \frac{1}{2}\int^t_0 h^*(X_s)h(X_s)ds \right) \right\vert {\mathcal{Y}}_t \right] \end{displaymath}

We construct a sequence of branching particle systems $U_n$ as in [Crisan-branching-98], which can be proved to approach the solution $p_t$: $\lim_{n\to\infty} U_n(t) = p_t$.

Let $\{U_n(t), {\mathcal{F}}_t; \;\; 0 \leq t \leq 1\}$ be a sequence of branching particle systems on the underlying probability space $(\Omega,{\mathcal{F}},\tilde{P})$.

Initial condition

0. $U_n(0)$ is the empirical measure of $n$ particles of mass $\frac{1}{n}$, i.e., $U_n(0)=\frac{1}{n}\sum^{n}_{i=1}\delta_{x^n_i}$, where $x^n_i \in E$ for every $i, \; n \in {\bf N}$.

Evolution in the interval $[\frac{i}{n}, \frac{i+1}{n}]$, $i=0,1,...,n-1$

1. At time $\frac{i}{n}$, the process consists of the occupation measure of $m_n(\frac{i}{n})$ particles of mass $\frac{1}{n}$ ($m_n(t)$ denotes the number of particles alive at time $t$).

2. During the interval, the particles move independently with the same law as the signal $X$. Let $V(s)$, $s \in [\frac{i}{n}, \frac{i+1}{n})$ be the trajectory of a generic particle during this interval.

3. At $t=\frac{i+1}{n}$, each particle branches into $\xi^i_n$ particles with a mechanism depending on its trajectory in the interval. The mean number of offspring for a particle given the $\sigma$-field ${\mathcal{F}}_{\frac{i+1}{n}-} = \sigma({\mathcal{F}}_s, \; s < \frac{i+1}{n})$ of events up to time $\frac{i+1}{n}$ is

\begin{displaymath}
E(\xi^i_n) = \mu^i_n \stackrel{\bigtriangleup}{=} \exp \left( \int_{\frac{i}{n}}^{\frac{i+1}{n}} h^*(V(t))dY_t - \frac{1}{2}\int_{\frac{i}{n}}^{\frac{i+1}{n}} h^*h(V(t))dt \right) \end{displaymath}

so that the variance of $\xi^i_n$ is minimal, consistent with the number of offspring being an integer. More specifically, we determine the number $\xi^i_n$ of offspring by

\begin{displaymath}
\xi^i_n=\left\{ \begin{array}{ll}
{[\mu^i_n]} & {\rm with \;\; probability} \;\; 1-\mu^i_n+[\mu^i_n] \\
{[\mu^i_n]}+1 & {\rm with \;\; probability} \;\; \mu^i_n-[\mu^i_n]
\end{array} \right.
\end{displaymath}
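A minimal sketch of one branching interval follows. It assumes the particles have already been propagated with the signal law over the interval, approximates the stochastic integral in $\mu^i_n$ by a single observation increment, and uses a placeholder observation function $h$; it illustrates the minimal-variance branching rule, not the exact discretization of [Crisan-branching-98].

\begin{verbatim}
import numpy as np

def branch_count(mu, rng):
    # Minimal-variance integer with mean mu:
    # floor(mu)+1 w.p. mu - floor(mu), floor(mu) otherwise.
    base = int(np.floor(mu))
    return base + 1 if rng.random() < (mu - base) else base

def branching_step(particles, h, dY, dt, rng):
    # particles: list of state vectors at the end of the interval
    # dY: observation increment over the interval
    offspring = []
    for x in particles:
        hx = np.atleast_1d(h(x))
        # log-weight: h* dY - (1/2) h*h dt (one-increment approximation)
        mu = np.exp(float(hx @ dY - 0.5 * (hx @ hx) * dt))
        offspring.extend([x.copy() for _ in range(branch_count(mu, rng))])
    return offspring
\end{verbatim}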

Figure 2: Schematic diagram of branching particle method
\begin{figure}\center{
\leavevmode
\epsfxsize = 15.0cm
\epsfbox{particle_diagram.nomath.eps}}
\end{figure}



Time update of the state

In discrete time, the state evolves as:

\begin{displaymath}X_{k+1} = X_k + d_k + \Sigma_k w_k \end{displaymath}

where $d_k$ is the displacement vector, which incorporates estimates of velocity and acceleration. $d_k$ can be further refined to maximize the observation likelihood:

\begin{displaymath}d_k=\arg\max_{d} \int h_{\hat{x}_k + d}(\textbf{u})\, s(\textbf{u})\, d\textbf{u} \end{displaymath}

This seemingly trivial prediction adjustment is found to improve the tracking stability substantially.
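A sketch of this adjustment as a small grid search over candidate displacements, keeping the one that maximizes the filter response at the shifted state; the response callable and the search grid are illustrative assumptions.

\begin{verbatim}
import numpy as np
from itertools import product

def refine_displacement(x_hat, d0, response, deltas=(-1.0, 0.0, 1.0)):
    # Search a grid around the predicted displacement d0 and keep
    # the candidate that maximizes the observation (filter response).
    best_d, best_r = np.asarray(d0, float), response(x_hat + d0)
    for step in product(deltas, repeat=len(d0)):
        d = d0 + np.asarray(step)
        r = response(x_hat + d)
        if r > best_r:
            best_d, best_r = d, r
    return best_d
\end{verbatim}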

The time update step yields the prior estimate of the state and the covariance matrix:

\begin{displaymath}\hat{x}^{-}_{k+1} = \hat{x}_k+ d_k \end{displaymath}


\begin{displaymath}\hat{P}^{-}_{k+1} = \hat{P}_{k}+ \Sigma_k \end{displaymath}

Here $\hat{x}_k$ and $\hat{P}_k$ denote the posterior estimates after the measurement update (the application of the Kalman gain), which is equivalent to the observation and branching steps in the proposed algorithm. The a priori and a posteriori error covariance matrices are formally defined as

\begin{displaymath}\hat{P}^{-}_{k} = E[(\hat{x}^{-}_k-x_k)(\hat{x}^{-}_k-x_k)^T] \end{displaymath}


\begin{displaymath}\hat{P}_{k} = E[(\hat{x}_k-x_k)(\hat{x}_k-x_k)^T] \end{displaymath}

These matrices are estimated empirically by substituting the particles $x_k$ and the prior/posterior state estimates $(\hat{x}^{-}_k, \; \hat{x}_k)$ into the above expressions. We use the error covariance estimated from the particles at time $k-1$ for the diffusion at time $k$:

\begin{displaymath}\hat{\Sigma}_k = \Sigma_{k-1} = \hat{P}^{-}_{k} - \hat{P}_{k-1} \end{displaymath}
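A minimal sketch of the covariance bootstrap and the resulting adaptive diffusion, assuming the particles are stored as an $(m, d)$ array; the diagonal clipping is an added numerical safeguard, not part of the derivation above.

\begin{verbatim}
import numpy as np

def error_covariance(particles, x_hat):
    # Empirical E[(x_hat - x)(x_hat - x)^T] over the particle set.
    diff = particles - x_hat              # shape (m, d)
    return diff.T @ diff / len(particles)

def adaptive_diffusion(prior_particles, x_hat_prior, P_post_prev):
    # Sigma_hat_k = P^-_k - P_{k-1}; keep the diagonal nonnegative
    # (assumption: a safeguard so the perturbation stays valid).
    P_prior = error_covariance(prior_particles, x_hat_prior)
    sigma = P_prior - P_post_prev
    return np.diag(np.clip(np.diag(sigma), 0.0, None))
\end{verbatim}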



Application: Head tracking


Head model

Figure 3: Rotational motion model of the head
\begin{figure}\center{
\leavevmode
\epsfxsize = 8.0cm
\epsfbox{headellipsoid.eps}}
\end{figure}


Model of facial features

Figure 4: Ellipsoidal head model and the parametrization of facial features
\begin{figure}\center{
\leavevmode
\epsfxsize = 8.0cm
\epsfbox{facialfeature_cut.eps}}
\end{figure}

Figure 5: Three different head poses and tracked features. Upper right: rotation around $x$-axis, Lower left: rotation around $y$-axis, Lower right: rotation around $z$-axis.
\begin{figure}\center{
\leavevmode
\epsfxsize = 12.0cm
\epsfbox{poses.eps}}
\end{figure}



Camera model and filter construction

Figure 6: Perspective projection model of the camera
\begin{figure}\center{
\leavevmode
\epsfxsize = 12.0cm
\epsfbox{projection.eps}}
\end{figure}


\begin{displaymath}
\frac{X}{f} = \frac{x'}{z'} \;\;\; {\rm and} \;\;\;
\frac{Y}{f} = \frac{y'}{z'}
\end{displaymath}

Given $\xi=(C_x,C_y,C_z,\theta_x, \theta_y, \theta_z, \nu)$, the hypothesized geometric parameters of the head and feature (the feature parameters are collectively denoted by $\nu$), we compute the inverse projection onto the ellipsoid to construct the shape operator. Suppose the feature curve on the ellipsoid is the intersection (with the ellipsoid) of the sphere $\parallel (x,y,z)-(e^{\xi}_x,e^{\xi}_y,e^{\xi}_z)\parallel^2={R^{\xi}_e}^2$ centered at the point $(e^{\xi}_x,e^{\xi}_y,e^{\xi}_z)$ on the ellipsoid. Let $P=(X,Y)$ be any point in the image. The inverse projection of $P$ is the line defined by the projection equations. The point $(x',y',z')$ on the ellipsoid is computed by solving the projection equations combined with the quadratic equation $E_{R_x,R_y,R_z,C_x,C_y,C_z}(x',y',z') = 1$. This solution exists and is unique, since we seek it on the visible side of the ellipsoid. Applying the inverse rotation and translation to $(x',y',z')$ then yields the point $(x,y,z)$ on the reference ellipsoid $E_{0,0,0,C_x,C_y,C_z}(x,y,z) = 1$.
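A sketch of the inverse projection for an axis-aligned ellipsoid in camera coordinates: the viewing ray through $(X,Y)$ is intersected with the quadric, and the nearer root gives the visible-side point. The rotation and translation back to the reference ellipsoid are omitted, and the focal length, radii, and center are illustrative parameters.

\begin{verbatim}
import numpy as np

def inverse_project(X, Y, f, center, radii):
    # Intersect the ray t * (X/f, Y/f, 1) with the ellipsoid
    # sum(((p - c) / r)^2) = 1; return the nearer (visible) point.
    d = np.array([X / f, Y / f, 1.0])
    c = np.asarray(center, float)
    r = np.asarray(radii, float)
    a = np.sum((d / r) ** 2)
    b = -2.0 * np.sum(d * c / r ** 2)
    q = np.sum((c / r) ** 2) - 1.0
    disc = b * b - 4.0 * a * q
    if disc < 0:                          # ray misses the ellipsoid
        return None
    t = (-b - np.sqrt(disc)) / (2.0 * a)  # smaller root: visible side
    return t * d
\end{verbatim}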

If we define the mapping from $(X,Y)$ to $(x,y,z)$ by $\rho(X,Y)\stackrel{\bigtriangleup}{=}(x,y,z)\stackrel{\bigtriangleup}{=}(\rho_x(X,Y),\rho_y(X,Y),\rho_z(X,Y))$ we can construct the shape filter as

\begin{displaymath}
h^{\xi}(X,Y)=h_{\sigma}\left(\parallel \rho(X,Y)-(e^{\xi}_x,e^{\xi}_y,e^{\xi}_z)\parallel^2-{R^{\xi}_e}^2\right)\end{displaymath}
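Evaluating the filter at an image point then amounts to composing $\rho$ with the cross-section profile $h_{\sigma}$. In the sketch below, a derivative-of-Gaussian profile is an assumed stand-in for $h_{\sigma}$, and \texttt{inverse\_project} from the previous sketch plays the role of $\rho$.

\begin{verbatim}
import numpy as np

def h_profile(u, sigma=2.0):
    # Assumed cross-section: derivative of a Gaussian.
    return -u / sigma ** 2 * np.exp(-u * u / (2.0 * sigma ** 2))

def shape_filter(X, Y, f, center, radii, e_center, R_e):
    # h(X, Y) = h_sigma(||rho(X, Y) - e||^2 - R_e^2)
    p = inverse_project(X, Y, f, center, radii)   # rho(X, Y)
    if p is None:
        return 0.0            # ray misses the ellipsoid entirely
    u = np.sum((p - np.asarray(e_center, float)) ** 2) - R_e ** 2
    return h_profile(u)
\end{verbatim}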




Experiments on synthetic data

Figure 7: Sampled frames from a synthetic sequence. The head is moving back and forth (translation) while `shaking' (rotation). The estimated head pose and location, and the facial features, are marked.
\begin{figure}\center{
\leavevmode
\epsfxsize = 12.0cm
\epsfbox{bfshake.eps}}
\vspace{-0.4cm}
\end{figure}

Figure 8: Estimated parameters for synthetic data. (Left column: translational motion, Right column: rotational motion.) The dotted lines are the real parameters used to generate the motion.
\begin{figure}\center{
\leavevmode
\epsfxsize = 16.0cm
\epsfbox{bfshake_plot.eps}}
\vspace{-0.4cm}
\end{figure}



Experiments on real data

An example in which the person repeatedly moves his head left and right; the rotation of the head is naturally coupled with the translation.

Figure 9: A real human head movement sequence. While the tracking shows some delays when the motion is fast, the tracked features show correct head position and pose estimates.

The contributions of the maximum-observation-likelihood prediction adjustment and the adaptive diffusion are verified.

Figure 10: Comparison of time update schemes. Top: no prediction adjustment, fixed diffusion, Middle: prediction adjustment only, Bottom: prediction adjustment and adaptive diffusion
\begin{figure}\vspace{-0.5cm}
\center{
\leavevmode
\epsfxsize = 12.0cm
\epsfbox{han_lr.10instances.estplot.eps}}
\vspace{-0.4cm}
\end{figure}

A small translation of the head in the vertical direction can be confused with a `nodding' motion. The following figure depicts the ambiguity present in the same sequence by plotting the projections of the particles onto the $T_x$-$C_y$ plane. The initial distribution shows the correlation between $C_y$ and $T_x$. As more information arrives ($t=14$), the particles form multi-modal concentrations. The concentration disperses when the motion is rapid, and shrinks when the head motion is close to one of the two `extreme' points. The parameters eventually settle into a dominant configuration ($t=72$ and $t=210$).

Figure 11: The spread of the particles shows the ambiguity of the translation and motion parameters. As the algorithm receives more data, the uncertainty changes and is finally resolved.
\begin{figure}\vspace{-0.5cm}
\center{
\leavevmode
\epsfxsize = 15.0cm
\epsfbox{han_lr.particles.eps}}
\vspace{-0.4cm}
\end{figure}

The following shows another example in which local feature motion is tracked in addition to global object motion; the motions of the irises and upper eyelids are more carefully tracked, so that squinting and gaze are recognized.

Figure 12: Tracking of independently moving local features. The squinting and iris movement are captured and tracked, as well as the head movement.

