New eyes for building models from video

Title: New eyes for building models from video
Publication Type: Journal Articles
Year of Publication: 2000
Authors: Fermüller C, Aloimonos Y, Brodský T
Journal: Computational Geometry
Pagination: 3-23
Date Published: 2000/02
ISBN Number: 0925-7721
Keywords: model building, shape reconstruction, structure from motion, video analysis

Models of real-world objects and actions for use in graphics, virtual and augmented reality and related fields can only be obtained through the use of visual data and particularly video. This paper examines the question of recovering shape models from video information. Given video of an object or a scene captured by a moving camera, a prerequisite for model building is to recover the three-dimensional (3D) motion of the camera, which consists of a rotation and a translation at each instant. It is shown here that a spherical eye (an eye or system of eyes providing panoramic vision) is superior to a camera-type eye (an eye with restricted field of view such as a common video camera) as regards the competence of 3D motion estimation. This result is derived from a geometric/statistical analysis of all the possible computational models that can be used for estimating 3D motion from an image sequence. Regardless of the estimation procedure for a camera-type eye, the parameters of the 3D rigid motion (translation and rotation) contain errors satisfying specific geometric constraints. Thus, translation is always confused with rotation, resulting in inaccurate results. This confusion does not happen for the case of panoramic vision. Insights obtained from this study point to new ways of constructing powerful imaging devices that suit particular tasks in visualization and virtual reality better than conventional cameras, thus leading to a new camera technology. Such new eyes are constructed by putting together multiple existing video cameras in specific ways, thus obtaining eyes from eyes. For a new eye of this kind we describe an implementation for deriving models of scenes from video data, while avoiding the correspondence problem in the video sequence.