There has been significant interest in the computer vision community in utilizing visual
and contextual models for high-level semantic reasoning. There are many
weakly annotated images and videos available on the internet, along with
other rich sources of information such as dictionaries, which can be used to
learn visual and contextual models for recognition. The goal of this
workshop is to investigate how linguistic information available in the form
of captions and other sources can be used to aid in visual and contextual
learning. For example, captions can
help train object detectors, adjectives in captions can help train
material recognizers, and written descriptions of objects
can help train object recognizers.
This workshop aims to bring together researchers in the fields of contextual modeling in computer vision, machine learning, and natural language processing to explore a variety of perspectives on how these datasets can be employed to learn visual appearance and contextual models for recognition. Recent progress in machine learning on scalable learning and on modeling uncertainty in large-scale annotated data has also stimulated the use of web data whose annotations are obtained in less controlled settings.
The workshop program will consist of spotlights, posters, invited talks, and discussion panels. Possible topics include (but are not limited to) the following:
Webmaster: Abhinav Gupta