Maryland CPU-GPU Cluster Infrastructure

Real-Time Computer Vision

Our work in real-time computer vision applications is based on distributed cameras. With distributed cameras, problems that confront a single camera -- such as occlusion, the disappearance and reappearance of objects, and the recovery of 3D motion trajectories of small objects -- can be addressed conveniently. The Maryland CPU-GPU cluster is well suited to the highly parallel task of video processing, and the tiled display wall enables novel means of visualizing the results.

FlexiView - A 4D Video Surveillance System

Faculty: Rama Chellappa, Larry Davis, and Amitabh Varshney
Corporate Collaborators: Art Pope and Tom Strat (SET Corporation)
Graduate Students: Aniruddha Kembhavi, Kaushik Mitra, Rob Patro, Aswin Sankaranarayanan, and Roman Stanchak
Undergraduate Student: John Dickerson

The 3D model and individual video feeds visualized on the tiled display

Video surveillance systems are currently limited by the location, orientation, and resolution of the cameras that comprise them. However, it may be possible to allow a remote observer to view the scene from any vantage point, regardless of where the cameras happen to have been placed. That is, each remote viewer could direct the video stream as if he had complete control of camera placement -- he could position a virtual camera and move it about at will, to obtain the vantage point that best allows him to observe what is most important to him.

FlexiView is a design to fulfill this vision of 4D visualization. The output of multiple video cameras is analyzed to produce a 4D model -- one that captures the full three-dimensional shape of all scene entities (people, buildings, vehicles, etc.) as well as the dynamics (movements and motions) of every object in the scene. With that 4D model, novel views of the scene can be synthesized that give the illusion of what would be seen had a video camera actually been placed at the virtual location. The remote observer can control this 4D visualization as if he had a 4D version of a TiVo DVR -- he can stop, reverse the action, and review in slow motion, but he can also move the camera position and orientation in 3-space to view the action from novel vantage points.
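To make that data structure concrete, the sketch below is a hypothetical Python outline (not the FlexiView implementation): each scene entity carries a time-indexed track of 3D poses, and a playback object decouples model time and the virtual camera from the recorded footage, which is what enables the DVR-style controls described above.

    from bisect import bisect_right
    from dataclasses import dataclass, field

    @dataclass
    class Pose:
        t: float                          # timestamp in seconds
        position: tuple                   # (x, y, z) in world coordinates

    @dataclass
    class Entity:
        label: str                                    # "person", "vehicle", ...
        track: list = field(default_factory=list)     # Poses sorted by t

        def pose_at(self, t):
            # Linearly interpolate the entity's 3D position at time t.
            times = [p.t for p in self.track]
            i = bisect_right(times, t)
            if i == 0:
                return self.track[0]
            if i == len(self.track):
                return self.track[-1]
            a, b = self.track[i - 1], self.track[i]
            w = (t - a.t) / (b.t - a.t)
            pos = tuple(pa + w * (pb - pa)
                        for pa, pb in zip(a.position, b.position))
            return Pose(t, pos)

    @dataclass
    class Playback:
        # TiVo-style control: model time and the virtual camera are
        # independent of how and when the video was captured.
        t: float = 0.0
        rate: float = 1.0            # 0 pauses, negative reverses, 0.25 is slow motion
        camera: Pose = None          # virtual camera pose, movable at will

        def step(self, dt):
            self.t += self.rate * dt   # advance (or rewind) model time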

The technology to support this fully does not yet exist, but significant research advances have been made that can be combined to construct a reasonable prototype. In particular, structure-from-motion, moving-object detection and tracking, and image-based rendering provide the foundation upon which to build the FlexiView 4D visualization system. The ultimate goal of this endeavor is to design and demonstrate a prototype of a fully automated system that can accept multiple unconstrained video streams (airborne, ground-based, and hand-held), build the required 4D models, and render the scene at the will of the user.
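As an illustration of one of these foundations, the sketch below performs moving-object detection with standard OpenCV background subtraction. It is a generic example rather than FlexiView's actual detector, and the input file name is a placeholder.

    import cv2

    cap = cv2.VideoCapture("camera0.avi")        # placeholder input stream
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)                         # foreground = moving pixels
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # suppress speckle noise
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        boxes = [cv2.boundingRect(c) for c in contours
                 if cv2.contourArea(c) > 200]                  # candidate moving objects
    cap.release()

The resulting boxes per frame are the raw detections that a tracker would link over time to produce the object trajectories the 4D model requires.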

The current prototype puts a number of important components of this ultimate goal in place. Using video captured from both fixed and hand-held cameras, we have been able to determine the 3D positions of up to 10 humans on a parking-lot ground plane. Activity analysis of the video is then used to classify each human's actions into one of eight categories. Finally, we automatically build an appearance model of each human based on the color and texture of their clothing. The resulting 4D spatio-temporal model is rendered using the OpenSG scene graph library, which seamlessly scales the visualization anywhere from a PDA screen to the tiled display wall of the CPU-GPU cluster.
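To make the ground-plane localization step concrete, the sketch below maps a detected person's foot point in the image to world coordinates on the parking-lot plane through a calibrated homography. This is an illustrative Python/OpenCV outline with placeholder values and names, not the prototype's actual code.

    import numpy as np
    import cv2

    # 3x3 image-to-ground homography for one camera, assumed already estimated
    # (e.g. with cv2.findHomography from four or more image/world correspondences).
    H = np.array([[0.020, 0.000,  -6.40],
                  [0.000, 0.021,  -4.80],
                  [0.000, 0.0001,  1.00]])    # placeholder values

    def ground_position(foot_pixel, H):
        # Map an image foot point (u, v) to (x, y, 0) on the ground plane.
        pts = np.array([[foot_pixel]], dtype=np.float32)   # shape (1, 1, 2)
        x, y = cv2.perspectiveTransform(pts, H)[0, 0]
        return (float(x), float(y), 0.0)

    print(ground_position((320.0, 460.0), H))   # a hypothetical detection

Because every person stands on the same plane, a single homography per camera suffices to place all detections in a common world frame, which is what lets detections from fixed and hand-held cameras be fused into one 4D model.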