Recognizing Actions by Shape-Motion Prototype Trees

Zhe Lin, Zhuolin Jiang, Larry S. Davis

Abstract:

A prototype-based approach is introduced for action recognition. The approach represents an action as a sequence of prototypes for efficient and flexible action matching in long video sequences. During training, first, an action prototype tree is learned in a joint shape and motion space via hierarchical k-means clustering; then a lookup table of prototype-to-prototype distances is generated. During testing, based on a joint likelihood model of the actor location and action prototype, the actor is tracked while a frame-to-prototype correspondence is established by maximizing the joint likelihood, which is efficiently performed by searching the learned prototype tree; then actions are recognized using dynamic prototype sequence matching. Distance matrices used for sequence matching are rapidly obtained by look-up table indexing, which is an order of magnitude faster than brute-force computation of frame-to-frame distances. Our approach enables robust action matching in very challenging situations (such as moving cameras, dynamic backgrounds) and allows automatic alignment of action sequences. Experimental results demonstrate that our approach achieves recognition rates of 91.07% on a large gesture dataset (with dynamic backgrounds), 100% on the Weizmann action dataset and 95.77% on the KTH action dataset.
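The lookup-table idea in the abstract can be illustrated with a small sketch. It assumes each frame has already been assigned a prototype index, and it uses Euclidean distance and plain dynamic time warping as stand-ins for the paper's prototype distance and dynamic sequence matching. The function names and the toy prototypes are hypothetical and are not taken from the authors' code.

```python
import math

def build_lookup_table(prototypes):
    """Precompute distances between all prototype pairs (done once, at training time).

    Euclidean distance is an assumption made for this sketch.
    """
    k = len(prototypes)
    return [[math.dist(prototypes[i], prototypes[j]) for j in range(k)]
            for i in range(k)]

def distance_matrix(seq_a, seq_b, table):
    """Frame-to-frame distances obtained purely by indexing the table."""
    return [[table[a][b] for b in seq_b] for a in seq_a]

def dtw_cost(dist):
    """Cumulative cost of the best monotonic alignment (plain DTW, a stand-in)."""
    n, m = len(dist), len(dist[0])
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = dist[i - 1][j - 1] + best_prev
    return acc[n][m]

# Toy prototypes in a 2-D feature space (hypothetical values).
prototypes = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
table = build_lookup_table(prototypes)

# Two sequences of prototype indices showing the same action at different speeds.
query = [0, 0, 1, 2]
template = [0, 1, 1, 2]
cost = dtw_cost(distance_matrix(query, template, table))
```

Because the table is built once, comparing two sequences needs only index lookups rather than feature-space distance computations, which is the source of the speed-up the abstract describes.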

The Keck gesture dataset used for the ICCV 2009 paper is available for noncommercial research use only.

Dataset Description:

The gesture dataset consists of 14 different gesture classes, which are a subset of military signals. The following figure shows sample training frames of this dataset. The dataset was collected using a color camera with 640 × 480 resolution. Each of the 14 gestures is performed by three people, and in each sequence the same gesture is repeated three times by each person. Hence there are 3 × 3 × 14 = 126 video sequences for training, which are captured using a fixed camera with the person viewed against a simple, static background. There are 4 × 3 × 14 = 168 video sequences for testing, which are captured from a moving camera in the presence of background clutter and other moving objects.
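As a quick sanity check, the sequence counts above can be reproduced and compared with the file counts given under Dataset Downloads (42 training files and 56 testing files, each holding about three sequences). The variable names are illustrative, and the factor of 4 for the testing set is taken directly from the text without further interpretation.

```python
NUM_GESTURES = 14
REPETITIONS = 3          # each gesture is repeated three times per sequence file
TRAIN_FACTOR = 3         # three people in the training set
TEST_FACTOR = 4          # factor stated in the text for the testing set

train_sequences = TRAIN_FACTOR * REPETITIONS * NUM_GESTURES
test_sequences = TEST_FACTOR * REPETITIONS * NUM_GESTURES

# File counts from the download section; each file holds about three sequences.
train_files = TRAIN_FACTOR * NUM_GESTURES
test_files = TEST_FACTOR * NUM_GESTURES

print(train_sequences, test_sequences)  # 126 168
print(train_files, test_files)          # 42 56
```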

Gesture Classes:

Demo Video:

Dataset Downloads:

All sequences are stored in AVI file format (an MPEG V3-compressed version is available online; the uncompressed version is available on request). There are 42 training video files and 56 testing video files. Each file contains about three subsequences, each of which is used as one sequence in our experiments. The subdivision of each file into sequences, in terms of start frame and end frame, is given in:

If you happen to use the dataset or other files provided by this webpage, please cite one of the following papers:

  • Zhe Lin, Zhuolin Jiang, and Larry S. Davis, "Recognizing Actions by Shape-Motion Prototype Trees," IEEE 12th International Conference on Computer Vision (ICCV), pp. 444-451, 2009.
  • Zhuolin Jiang, Zhe Lin, and Larry S. Davis, "Recognizing Human Actions by Learning and Matching Shape-Motion Prototype Trees," IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), pp. 533-547, 2012.

If you have any inquiries or questions about this dataset, please contact:

Latest update 12-06-2011