TY - JOUR
T1 - Recognizing Human Actions by Learning and Matching Shape-Motion Prototype Trees
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
Y1 - 2012
A1 - Jiang, Zhuolin
A1 - Lin, Zhe
A1 - Davis, Larry S.
KW - action prototype
KW - actor location
KW - brute-force computation
KW - CMU action data set
KW - distance measures
KW - dynamic backgrounds
KW - dynamic prototype sequence matching
KW - flexible action matching
KW - frame-to-frame distances
KW - frame-to-prototype correspondence
KW - hierarchical k-means clustering
KW - human action recognition
KW - image matching
KW - image recognition
KW - image sequences
KW - joint probability model
KW - joint shape
KW - KTH action data set
KW - large gesture data set
KW - learning
KW - learning (artificial intelligence)
KW - look-up table indexing
KW - motion space
KW - moving cameras
KW - pattern clustering
KW - prototype-to-prototype distances
KW - shape-motion prototype-based approach
KW - table lookup
KW - training sequence
KW - UCF sports data set
KW - video sequences
KW - video signal processing
KW - Weizmann action data set
AB - A shape-motion prototype-based approach is introduced for action recognition. The approach represents an action as a sequence of prototypes for efficient and flexible action matching in long video sequences. During training, an action prototype tree is learned in a joint shape and motion space via hierarchical K-means clustering, and each training sequence is represented as a labeled prototype sequence; a look-up table of prototype-to-prototype distances is then generated. During testing, based on a joint probability model of the actor location and action prototype, the actor is tracked while a frame-to-prototype correspondence is established by maximizing the joint probability, which is performed efficiently by searching the learned prototype tree; actions are then recognized using dynamic prototype sequence matching. Distance measures used for sequence matching are obtained rapidly by look-up table indexing, which is an order of magnitude faster than brute-force computation of frame-to-frame distances. Our approach enables robust action matching in challenging situations (such as moving cameras and dynamic backgrounds) and allows automatic alignment of action sequences. Experimental results demonstrate that our approach achieves recognition rates of 92.86 percent on a large gesture data set (with dynamic backgrounds), 100 percent on the Weizmann action data set, 95.77 percent on the KTH action data set, 88 percent on the UCF sports data set, and 87.27 percent on the CMU action data set.
VL - 34
SN - 0162-8828
CP - 3
M3 - 10.1109/TPAMI.2011.147
ER -