Military significance: Visual and Acoustic Surveillance and Monitoring, University of Maryland


Our vision of battlefield surveillance systems involves a distributed suite of heterogeneous and relocatable sensors monitoring a large geographic area, in the context of a site model, for the entrances, exits, and activities of people and vehicles. Scarce human operators must monitor the outputs of the surveillance system, under constraints of limited bandwidth and possibly severe psychological pressure.

We envision a surveillance system for monitoring the urban battlefield, where the movements and actions of even a few individuals and pieces of equipment can lead to great loss of life, and in which one must rely on incomplete and qualitative site modeling to control and focus perception systems.

To monitor an area of any significant geographic extent, the surveillance system must employ a suite of sensor platforms that are, in general, relocatable, either to bring more surveillance power to bear on potentially interesting situations or simply to provide adequate coverage of a large surveillance site. However, to control the cost and complexity of the surveillance system, the number of sensor platforms must be limited. This suggests that the system should analyze the area at multiple levels. Our research considers two levels: a coarse level, in which a significant portion of the area is monitored at low resolution, and a fine level, in which a much smaller area is monitored at much higher resolution.
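The two-level scheme can be illustrated with a minimal sketch. It assumes, purely for illustration, that the coarse level produces an activity score per grid cell and that a limited number of fine-level sensors are pointed at the highest-scoring cells. The function name, the grid-cell keys, and the scoring itself are hypothetical and are not taken from the project.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, col) index of a coarse grid cell (hypothetical layout)


def assign_fine_sensors(coarse_scores: Dict[Cell, float],
                        num_fine_sensors: int) -> List[Cell]:
    """Choose which coarse cells receive high-resolution coverage.

    coarse_scores maps each cell to an activity score from the
    low-resolution pass (how that score is computed is left open here).
    The cells with the highest scores are returned, at most one per
    available fine-level sensor.
    """
    ranked = sorted(coarse_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [cell for cell, _ in ranked[:num_fine_sensors]]


if __name__ == "__main__":
    scores = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.4, (1, 1): 0.7}
    print(assign_fine_sensors(scores, num_fine_sensors=2))
```

A real system would also have to account for sensor relocation time and coverage constraints, which this sketch ignores.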

Decades of research on vision systems have demonstrated that if they are to be employed in visually complex scenes, then they must be armed with prior knowledge about the structures in those scenes. In RADIUS this was reflected in the construction and employment of geometrically precise site models, which delineated and described the 3-D geometry of buildings, road networks, etc. In ground-based surveillance of urban areas, both time to deployment and the dynamic nature of the area dictate that these surveillance systems utilize less precise models of the environments in which they must operate. We envision a surveillance system in which site models contain information about building footprints, entrances, and rooflines, and additionally include information about roads, park areas, etc. The models will be used to control the visual and acoustic focus of attention and to provide a spatial context for interpretation.
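One way to picture such an imprecise site model is as a small data structure holding footprints, entrances, optional roofline information, roads, and park areas, with a query that proposes where to focus attention. The sketch below is an assumption-laden illustration: the class names, field names, local coordinate frame, and the range-based query are all hypothetical, not the project's representation.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in metres in an assumed local site frame


@dataclass
class Building:
    name: str
    footprint: List[Point]                     # polygon vertices of the footprint
    entrances: List[Point]                     # approximate entrance locations
    roofline_height_m: Optional[float] = None  # may be unknown in a coarse model


@dataclass
class SiteModel:
    buildings: List[Building] = field(default_factory=list)
    roads: List[List[Point]] = field(default_factory=list)       # polylines
    park_areas: List[List[Point]] = field(default_factory=list)  # polygons

    def entrances_near(self, sensor_xy: Point,
                       max_range_m: float) -> List[Tuple[str, Point]]:
        """Return (building name, entrance) pairs within range, nearest first."""
        sx, sy = sensor_xy
        hits = []
        for building in self.buildings:
            for entrance in building.entrances:
                dist = hypot(entrance[0] - sx, entrance[1] - sy)
                if dist <= max_range_m:
                    hits.append((dist, building.name, entrance))
        hits.sort()  # sorts by distance first
        return [(name, entrance) for _, name, entrance in hits]


if __name__ == "__main__":
    site = SiteModel(buildings=[
        Building("A", [(0, 0), (10, 0), (10, 10), (0, 10)], [(5, 0)]),
        Building("B", [(8, -20), (100, -20), (100, 0), (8, 0)], [(100, 0), (8, 0)]),
    ])
    # Entrances a sensor at (5, -3) could attend to within 10 m.
    print(site.entrances_near((5, -3), max_range_m=10))
```

Here entrances serve as candidate focus-of-attention points; the same structure could supply spatial context, for example by labeling a detection as near a road or inside a park area.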

Finally, a battlefield surveillance system must be able to reasonably infer what people and vehicles are doing from its analysis of sensor data. This involves not only the detection and tracking of people and vehicles, but also some way to represent and interpret their individual and collective activities. This is clearly an enormous problem given the wide range of activities that people engage in. Our research focuses on several important classes of actions involving people, vehicles, and buildings: entering, leaving, carrying, exchanging, and acting suspiciously.