Microphone Arrays as Generalized Cameras for Integrated Audio Visual Processing

Title: Microphone Arrays as Generalized Cameras for Integrated Audio Visual Processing
Publication Type: Conference Papers
Year of Publication: 2007
Authors: O'Donovan A, Duraiswami R, Neumann J
Conference Name: Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on
Date Published: 2007/06
Keywords: algorithms;generalized, arrays;scene, arrays;sensor, audio, audio-visual, cameras;geometrical, fusion;, fusion;computer-vision, geometry;sound, information, information;geometry, inspired, location;acoustic, processing;array, processing;audio-visual, processing;microphone, sensors;integrated, signal, sources;source, systems;cameras;computer, vision;geometry;microphone, visual

Combinations of microphones and cameras allow joint audio-visual sensing of a scene. Such arrangements of sensors are common in biological organisms and in applications such as meeting recording and surveillance, where both modalities are necessary for scene understanding. Microphone arrays provide geometrical information on source location and allow the sound sources in the scene to be separated and the noise suppressed, while cameras allow the scene geometry and the location and motion of people and other objects to be estimated. In most previous work, the fusion of the audio-visual information occurs at a relatively late stage. In contrast, we take the viewpoint that both cameras and microphone arrays are geometry sensors, and treat the microphone arrays as generalized cameras. We employ computer-vision-inspired algorithms to treat the combined system of arrays and cameras. In particular, we consider the geometry introduced by a general microphone array and by spherical microphone arrays. The latter show a geometry that is very close to that of central projection cameras, and we show how standard vision-based calibration algorithms can be profitably applied to them. Experiments are presented that demonstrate the usefulness of the considered approach.
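
As an illustration of the central projection analogy in the abstract, the sketch below maps a direction of arrival, as a spherical array might report it, onto a virtual image plane in the same way a pinhole camera maps a viewing ray to a pixel. It is not the authors' implementation. The function names, the azimuth/elevation convention, the choice of the z axis as the optical axis, and the focal_length parameter are all assumptions made for this example.

```python
import math


def direction_to_pixel(azimuth, elevation, focal_length=1.0):
    """Project a direction of arrival (radians) onto a virtual image plane.

    Assumed convention: z is the optical axis, azimuth is measured from z
    toward x, and elevation is measured from the x-z plane toward y.
    """
    # Unit vector pointing from the array centre toward the source.
    x = math.cos(elevation) * math.sin(azimuth)
    y = math.sin(elevation)
    z = math.cos(elevation) * math.cos(azimuth)
    if z <= 0:
        raise ValueError("direction lies behind the virtual image plane")
    # Central (pinhole) projection onto the plane z = focal_length.
    return (focal_length * x / z, focal_length * y / z)


def pixel_to_direction(u, v, focal_length=1.0):
    """Inverse mapping: image-plane point back to (azimuth, elevation)."""
    norm = math.sqrt(u * u + v * v + focal_length * focal_length)
    x, y, z = u / norm, v / norm, focal_length / norm
    return (math.atan2(x, z), math.asin(y))
```

With directions expressed as image-plane points in this way, acoustic detections and camera detections share one representation, which is the kind of setting in which vision-style calibration methods become applicable.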