TY - JOUR
T1 - Ordered trajectories for human action recognition with large number of classes
AU - Ramana-Murthy, O
AU - Goecke, Roland
N1 - Publisher Copyright: © 2015 Elsevier B.V. All rights reserved.
PY - 2015/10/26
Y1 - 2015/10/26
N2 - Recently, a video representation based on dense trajectories has been shown to outperform other human action recognition methods on several benchmark datasets. The trajectories capture the motion characteristics of different moving objects in the spatial and temporal dimensions. In dense trajectories, points are sampled at uniform intervals in space and time and then tracked using a dense optical flow field over a fixed length of L frames (optimally 15), with the tracks overlapping across the entire video. However, among these base (dense) trajectories, a few may continue for longer than duration L, capturing motion characteristics of objects that may be more valuable than the information from the base trajectories. Thus, we propose a technique that searches for trajectories with a longer duration and refer to these as 'ordered trajectories'. Experimental results show that ordered trajectories perform much better than the base trajectories, both standalone and when combined. Moreover, the uniform sampling of dense trajectories does not discriminate objects of interest from the background or other objects. Consequently, a lot of information is accumulated that may not actually be useful. This problem can escalate when there is more data due to an increase in the number of action classes. We observe that our proposed trajectories remove some background clutter, too. We use a Bag-of-Words framework to conduct experiments on the benchmark HMDB51, UCF50 and UCF101 datasets, which contain the largest number of action classes to date. Further, we also evaluate three state-of-the-art feature encoding techniques to study their performance on a common platform.
AB - Recently, a video representation based on dense trajectories has been shown to outperform other human action recognition methods on several benchmark datasets. The trajectories capture the motion characteristics of different moving objects in the spatial and temporal dimensions. In dense trajectories, points are sampled at uniform intervals in space and time and then tracked using a dense optical flow field over a fixed length of L frames (optimally 15), with the tracks overlapping across the entire video. However, among these base (dense) trajectories, a few may continue for longer than duration L, capturing motion characteristics of objects that may be more valuable than the information from the base trajectories. Thus, we propose a technique that searches for trajectories with a longer duration and refer to these as 'ordered trajectories'. Experimental results show that ordered trajectories perform much better than the base trajectories, both standalone and when combined. Moreover, the uniform sampling of dense trajectories does not discriminate objects of interest from the background or other objects. Consequently, a lot of information is accumulated that may not actually be useful. This problem can escalate when there is more data due to an increase in the number of action classes. We observe that our proposed trajectories remove some background clutter, too. We use a Bag-of-Words framework to conduct experiments on the benchmark HMDB51, UCF50 and UCF101 datasets, which contain the largest number of action classes to date. Further, we also evaluate three state-of-the-art feature encoding techniques to study their performance on a common platform.
KW - Human Action Recognition
KW - Ordered trajectories
KW - Action recognition
KW - Dense trajectories
KW - Large scale classification
KW - Bag-of-Words
KW - Fisher vector
KW - SVM
UR - http://www.scopus.com/inward/record.url?scp=84940187631&partnerID=8YFLogxK
U2 - 10.1016/j.imavis.2015.06.009
DO - 10.1016/j.imavis.2015.06.009
M3 - Article
SN - 0262-8856
VL - 42
SP - 22
EP - 34
JO - Image and Vision Computing
JF - Image and Vision Computing
ER -