Face reconstruction from monocular video using uncertainty analysis and a generic model

Title: Face reconstruction from monocular video using uncertainty analysis and a generic model
Publication Type: Journal Articles
Year of Publication: 2003
Authors: Roy Chowdhury AK, Chellappa R
Journal: Computer Vision and Image Understanding
Pagination: 188-213
Date Published: 2003/07
ISBN Number: 1077-3142
Keywords: Energy function minimization, Face modeling, Generic model, stochastic approximation, structure from motion, uncertainty analysis

Reconstructing a 3D model of a human face from a monocular video sequence is an important problem in computer vision, with applications to recognition, surveillance, multimedia, etc. However, the quality of 3D reconstructions using structure from motion (SfM) algorithms is often not satisfactory. One of the reasons is the poor quality of the input video data. Hence, it is important that 3D face reconstruction algorithms take into account the statistics representing the quality of the video. Also, because of the structural similarities of most faces, it is natural that the performance of these algorithms can be improved by using a generic model of a face. Most of the existing work using this approach initializes the reconstruction algorithm with this generic model. The problem with this approach is that the algorithm can converge to a solution very close to this initial value, resulting in a reconstruction which resembles the generic model rather than the particular face in the video which needs to be modeled. In this paper, we propose a method of 3D reconstruction of a human face from video in which the 3D reconstruction algorithm and the generic model are handled separately. We show that it is possible to obtain a reasonably good 3D SfM estimate purely from the video sequence, provided the quality of the input video is statistically assessed and incorporated into the algorithm. The final 3D model is obtained after combining the SfM estimate and the generic model using an energy function that corrects for the errors in the estimate by comparing the local regions in the two models. The main advantage of our algorithm over others is that it is able to retain the specific features of the face in the video sequence even when these features are different from those of the generic model and it does so even as the quality of the input video varies. The evolution of the 3D model through the various stages of the algorithm and an analysis of its accuracy are presented.