Ordered trajectories for human action recognition with large number of classes

O. Ramana-Murthy, Roland Goecke

    Research output: Contribution to journal › Article

    11 Citations (Scopus)

    Abstract

    Recently, a video representation based on dense trajectories has been shown to outperform other human action recognition methods on several benchmark datasets. The trajectories capture the motion characteristics of different moving objects in the spatial and temporal dimensions. In dense trajectories, points are sampled at uniform intervals in space and time and then tracked using a dense optical flow field over a fixed length of L frames (optimally 15), with overlapping trajectories spread over the entire video. However, among these base (dense) trajectories, a few may continue for longer than the duration L, capturing motion characteristics of objects that may be more valuable than the information from the base trajectories. Thus, we propose a technique that searches for trajectories with a longer duration and refer to these as 'ordered trajectories'. Experimental results show that ordered trajectories perform much better than the base trajectories, both standalone and when combined. Moreover, the uniform sampling of dense trajectories does not discriminate objects of interest from the background or other objects. Consequently, a large amount of information that may not actually be useful is accumulated. This problem escalates as the amount of data grows with an increasing number of action classes. We observe that the proposed trajectories also remove some background clutter. We use a Bag-of-Words framework to conduct experiments on the benchmark HMDB51, UCF50 and UCF101 datasets, which contain the largest number of action classes to date. Further, we also evaluate three state-of-the-art feature encoding techniques to study their performance on a common platform.
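    To make the mechanism concrete, below is a minimal Python sketch of how fixed-length base trajectories could be chained into longer 'ordered trajectories'. The trajectory representation (a start frame plus an array of tracked points) and the endpoint-matching rule with pixel tolerance `tol` are illustrative assumptions; the exact linking criterion used in the paper may differ.

    import numpy as np

    L = 15  # fixed base trajectory length used by the dense trajectories baseline

    def order_trajectories(base_trajs, tol=1.0):
        """Chain temporally adjacent base trajectories into 'ordered trajectories'.

        base_trajs: list of (start_frame, points) pairs, where points is a
        NumPy array of shape (L + 1, 2) holding tracked (x, y) positions.
        Linking rule (an assumption for illustration): trajectory B extends
        trajectory A if B starts in the frame where A ends and B's first
        point lies within `tol` pixels of A's last point.
        """
        # Index trajectories by their start frame for fast successor lookup.
        by_start = {}
        for idx, (start, _) in enumerate(base_trajs):
            by_start.setdefault(start, []).append(idx)

        consumed = set()  # indices already absorbed into a longer chain
        ordered = []
        for idx, (start, pts) in enumerate(base_trajs):
            if idx in consumed:
                continue
            chain = pts
            end_frame = start + len(pts) - 1
            while True:
                # Look for an unconsumed trajectory continuing from the chain's end.
                match = None
                for j in by_start.get(end_frame, []):
                    cand_pts = base_trajs[j][1]
                    if j not in consumed and np.linalg.norm(cand_pts[0] - chain[-1]) <= tol:
                        match = j
                        break
                if match is None:
                    break
                m_start, m_pts = base_trajs[match]
                chain = np.vstack([chain, m_pts[1:]])  # skip the shared point
                end_frame = m_start + len(m_pts) - 1
                consumed.add(match)
            if len(chain) > L + 1:  # keep only chains that outlast the base length L
                ordered.append(chain)
        return ordered

    In the full pipeline described in the abstract, descriptors computed along the surviving trajectories (typically HOG, HOF and MBH in dense-trajectory methods) would then be quantized with a Bag-of-Words encoding and fed to a classifier.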
    Original language: English
    Pages (from-to): 22-34
    Number of pages: 13
    Journal: Image and Vision Computing
    Volume: 42
    DOI: 10.1016/j.imavis.2015.06.009
    Publication status: Published - 2015

    Cite this

    @article{49ad8cad29564bd6aeb5f0c70098964a,
      title     = "Ordered trajectories for human action recognition with large number of classes",
      author    = "O Ramana-Murthy and Roland Goecke",
      keywords  = "Human Action Recognition, Ordered trajectories",
      year      = "2015",
      doi       = "10.1016/j.imavis.2015.06.009",
      language  = "English",
      volume    = "42",
      pages     = "22--34",
      journal   = "Image and Vision Computing",
      issn      = "0262-8856",
      publisher = "Elsevier Limited",
    }
