Sample Application: Vision-Guided Beamforming


We now use the epipolar geometry between a spherical microphone array and a camera in a meeting-room scenario. The microphone array detects the direction of sound sources in the scene, in this case the speaker in the room. That direction corresponds to an epipolar line in the camera image, so a simple face detector applied in the vicinity of this line can locate the exact position of the speaker in the image. In our system we use a face detector based on Haar wavelets, as implemented in OpenCV [1].
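The restriction of face detection to the neighbourhood of the epipolar line can be sketched as a filter on detector output. This is a minimal illustration, not the code used in our system. It assumes the epipolar line is given in homogeneous form (a, b, c), with a*x + b*y + c = 0 in pixel coordinates, and that candidate faces arrive as (x, y, width, height) rectangles, the form returned by OpenCV's cascade detector. The function names and the `max_dist` threshold are hypothetical.

```python
import math


def point_line_distance(point, line):
    """Perpendicular distance in pixels from point (x, y) to line (a, b, c)."""
    a, b, c = line
    x, y = point
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("degenerate line: a and b are both zero")
    return abs(a * x + b * y + c) / norm


def faces_near_epipolar_line(rects, line, max_dist):
    """Keep rectangles whose centre lies within max_dist pixels of the line.

    The result is sorted so that the rectangle closest to the line comes first.
    """
    kept = []
    for (x, y, w, h) in rects:
        centre = (x + w / 2.0, y + h / 2.0)
        dist = point_line_distance(centre, line)
        if dist <= max_dist:
            kept.append((dist, (x, y, w, h)))
    kept.sort(key=lambda item: item[0])
    return [rect for _, rect in kept]


# Hypothetical example: a horizontal epipolar line y = 240 and three candidates.
line = (0.0, 1.0, -240.0)
candidates = [(400, 20, 60, 60), (300, 250, 50, 50), (100, 200, 80, 80)]
near = faces_near_epipolar_line(candidates, line, max_dist=40.0)
```

In practice the same effect could be had by running the detector only on an image strip around the line, which also saves computation; filtering afterwards is simply the easier variant to show.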

In the following video we present an example in which extremely loud music interference was played from a location below and to the left of the subject, after the face had initially been detected as described above. Once the face rectangle was extracted, template matching was used to detect the mouth region. The epipolar line corresponding to this image region was then constructed on the sound-field image. The right panel of the image shows the resulting sound-field image, in which the distractor is extremely bright compared to the source. The location corresponding to the mouth was passed to the beamforming algorithms, and the sound from this location was extracted. The sound obtained by this process is also attached. A further refinement of the algorithm would be to place an explicit null at the location of the interfering source [2], though we have not done this yet.
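The null-placement refinement mentioned above can be illustrated with a two-constraint, minimum-norm beamformer in the spirit of [2]: unit gain toward the speaker and zero gain toward the interferer at a single frequency. This is a sketch under strong assumptions and not something we have implemented. It uses free-field plane-wave steering vectors and therefore ignores scattering from the array body, and the six-microphone layout, the directions, and all function names are hypothetical.

```python
import cmath
import math


def unit_vector(azimuth, elevation):
    """Unit direction vector for angles given in radians."""
    ce = math.cos(elevation)
    return (ce * math.cos(azimuth), ce * math.sin(azimuth), math.sin(elevation))


def steering_vector(mic_positions, direction, freq_hz, c=343.0):
    """Free-field plane-wave phases at each microphone (assumed model)."""
    k = 2.0 * math.pi * freq_hz / c  # wavenumber in rad/m
    return [cmath.exp(1j * k * sum(p * d for p, d in zip(pos, direction)))
            for pos in mic_positions]


def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))


def null_steering_weights(a_src, a_int):
    """Minimum-norm weights w with w^H a_src = 1 and w^H a_int = 0.

    Closed form of w = C (C^H C)^{-1} f for C = [a_src, a_int], f = [1, 0].
    """
    g11 = inner(a_src, a_src).real
    g22 = inner(a_int, a_int).real
    g21 = inner(a_int, a_src)
    det = g11 * g22 - abs(g21) ** 2
    if det <= 1e-9 * g11 * g22:
        raise ValueError("source and interferer steering vectors are too similar")
    return [(g22 * s - g21 * i) / det for s, i in zip(a_src, a_int)]


def response(weights, steering):
    """Complex array response w^H a toward the given steering vector."""
    return inner(weights, steering)


# Hypothetical layout: six microphones on the axes of a 10 cm radius sphere.
r = 0.10
mics = [(r, 0, 0), (-r, 0, 0), (0, r, 0), (0, -r, 0), (0, 0, r), (0, 0, -r)]
a_src = steering_vector(mics, unit_vector(0.3, 0.1), freq_hz=1000.0)
a_int = steering_vector(mics, unit_vector(-0.8, -0.4), freq_hz=1000.0)
w = null_steering_weights(a_src, a_int)
gain_src = abs(response(w, a_src))  # unity by construction
gain_int = abs(response(w, a_int))  # zero up to rounding error
```

For wideband speech the weights would have to be computed per frequency bin, and a rigid spherical array would need steering vectors that account for scattering from the sphere.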

[1] R. Lienhart, L. Liang, and A. Kuranov. “A detector tree of boosted classifiers for real-time object detection and tracking,” Proceedings IEEE ICME, 2:277–280, 2003.

[2] B. D. Van Veen and K. M. Buckley. “Beamforming: a versatile approach to spatial filtering,” IEEE Signal Processing Magazine, 5:4–24, 1988.
