TY - CONF
T1 - A large-scale benchmark dataset for event recognition in surveillance video
T2 - Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on
Y1 - 2011
A1 - Oh, Sangmin
A1 - Hoogs, A.
A1 - Perera, A.
A1 - Cuntoor, N.
A1 - Chen, Chia-Chih
A1 - Lee, Jong Taek
A1 - Mukherjee, S.
A1 - Aggarwal, JK
A1 - Lee, Hyungtae
A1 - Davis, Larry S.
A1 - Swears, E.
A1 - Wang, Xioyang
A1 - Ji, Qiang
A1 - Reddy, K.
A1 - Shah, M.
A1 - Vondrick, C.
A1 - Pirsiavash, H.
A1 - Ramanan, D.
A1 - Yuen, J.
A1 - Torralba, A.
A1 - Song, Bi
A1 - Fong, A.
A1 - Roy-Chowdhury, A.
A1 - Desai, M.
KW - algorithm;evaluation
KW - CVER
KW - databases;
KW - databases;video
KW - dataset;moving
KW - event
KW - metrics;large-scale
KW - object
KW - recognition
KW - recognition;diverse
KW - recognition;video
KW - scenes;surveillance
KW - surveillance;visual
KW - tasks;computer
KW - tracks;outdoor
KW - video
KW - video;computer
KW - vision;continuous
KW - vision;image
KW - visual
AB - We introduce a new large-scale video dataset designed to assess the performance of diverse visual event recognition algorithms with a focus on continuous visual event recognition (CVER) in outdoor areas with wide coverage. Previous datasets for action recognition are unrealistic for real-world surveillance because they consist of short clips showing one action by one individual [15, 8]. Datasets have been developed for movies [11] and sports [12], but, these actions and scene conditions do not apply effectively to surveillance videos. Our dataset consists of many outdoor scenes with actions occurring naturally by non-actors in continuously captured videos of the real world. The dataset includes large numbers of instances for 23 event types distributed throughout 29 hours of video. This data is accompanied by detailed annotations which include both moving object tracks and event examples, which will provide solid basis for large-scale evaluation. Additionally, we propose different types of evaluation modes for visual recognition tasks and evaluation metrics along with our preliminary experimental results. We believe that this dataset will stimulate diverse aspects of computer vision research and help us to advance the CVER tasks in the years ahead.
JA - Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on
M3 - 10.1109/CVPR.2011.5995586
ER -

TY - CONF
T1 - Objects in Action: An Approach for Combining Action Understanding and Object Perception
T2 - Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on
Y1 - 2007
A1 - Gupta, A.
A1 - Davis, Larry S.
KW - analysis;image
KW - approach;action
KW - Bayesian
KW - classification;image
KW - classification;object
KW - framework;Bayes
KW - interactions;inference
KW - interpretation
KW - localization;object
KW - methods;gesture
KW - MOTION
KW - movements;human
KW - perception;
KW - perception;human-object
KW - perception;object
KW - process;object
KW - processing;visual
KW - recognition;image
KW - recognition;video
KW - segmentation;action
KW - segmentation;object
KW - signal
KW - understanding;human
AB - Analysis of videos of human-object interactions involves understanding human movements, locating and recognizing objects and observing the effects of human movements on those objects. While each of these can be conducted independently, recognition improves when interactions between these elements are considered. Motivated by psychological studies of human perception, we present a Bayesian approach which unifies the inference processes involved in object classification and localization, action understanding and perception of object reaction. Traditional approaches for object classification and action understanding have relied on shape features and movement analysis respectively. By placing object classification and localization in a video interpretation framework, we can localize and classify objects which are either hard to localize due to clutter or hard to recognize due to lack of discriminative features. Similarly, by applying context on human movements from the objects on which these movements impinge and the effects of these movements, we can segment and recognize actions which are either too subtle to perceive or too hard to recognize using motion features alone.
JA - Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on
M3 - 10.1109/CVPR.2007.383331
ER -