Visual Scene Interpretation as a Dialogue between Vision and Language

Title	Visual Scene Interpretation as a Dialogue between Vision and Language
Publication Type	Conference Papers
Year of Publication	2011
Authors	Yu X, Fermüller C, Aloimonos Y
Conference Name	Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence
Date Published	2011/08/24
Abstract	We present a framework for semantic visual scene interpretation in a system with vision and language. In this framework the system consists of two modules, a language module and a vision module, that communicate with each other in the form of a dialogue to actively interpret the scene. The language module is responsible for obtaining domain knowledge from linguistic resources and reasoning on the basis of this knowledge and the visual input. It iteratively creates questions that amount to an attention mechanism for the vision module, which in turn shifts its focus to selected parts of the scene and applies selective segmentation and feature extraction. As a formalism for optimizing this dialogue we use information theory. We demonstrate the framework on the problem of recognizing a static scene from its objects and show preliminary results for the problem of human activity recognition from video. Experiments demonstrate the effectiveness of the active paradigm in introducing attention and additional constraints into the sensing process.
URL	http://www.aaai.org/ocs/index.php/WS/AAAIW11/paper/viewPaper/3989
