Dimensionality reduction of Fisher vectors for human action recognition

Ramana ORUGANTI, Roland GOECKE

    Research output: Contribution to journal - Article

    Abstract

    Automatic analysis of human behaviour in large collections of videos is rapidly gaining interest, even more so with the advent of file-sharing sites such as YouTube. Over the last five years, the size of the feature vectors used for human action recognition from videos has grown enormously, to the order of 100K-500K dimensions. One possible reason is the growing number of action classes and videos, and hence the need for more discriminative features (which usually end up higher-dimensional for larger databases). In this paper, we review and investigate feature projection techniques for reducing the dimensionality of these feature vectors and show their effectiveness in terms of recognition performance. We hypothesize that dimensionality reduction techniques often unearth latent structures in the feature space and are effective in applications such as the fusion of high-dimensional features of different types and action recognition in untrimmed videos. For consistency, we conduct all our experiments using a Bag-of-Words framework and present results on large-class benchmark databases such as the HMDB51 and UCF101 datasets.
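The abstract does not say which projection methods the paper evaluates. As a rough illustration of what "feature projection" means here, the following is a minimal sketch that assumes plain PCA computed with NumPy's SVD, applied to synthetic vectors standing in for Fisher vectors. The function names `fit_pca` and `project`, the data sizes, and the choice of PCA itself are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit a PCA projection on the rows of X; return (mean, components)."""
    mean = X.mean(axis=0)
    # Thin SVD of the centred data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def project(X, mean, components):
    """Project the rows of X onto the fitted principal directions."""
    return (X - mean) @ components.T


# Synthetic stand-in for video-level descriptors (hypothetical sizes):
# 200 samples in 5000 dimensions, generated from 10 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 10))
mixing = rng.normal(size=(10, 5000))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5000))

mean, comps = fit_pca(X, n_components=10)
Z = project(X, mean, comps)

# Fraction of the total variance kept by the 10-dimensional projection.
retained = (Z ** 2).sum() / ((X - mean) ** 2).sum()
print(Z.shape, round(float(retained), 4))
```

In a real pipeline the projection would be fitted on training descriptors only and then applied unchanged to test descriptors before classification.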
    Original language: English
    Pages (from-to): 392-397
    Number of pages: 6
    Journal: IET Computer Vision
    Volume: 10
    Issue number: 5
    DOI: 10.1049/iet-cvi.2015.0091
    Publication status: Published - 2016

    Cite this

    @article{486ca7df7d3145168f3f211089e8a104,
    title = "Dimensionality reduction of Fisher vectors for human action recognition",
    keywords = "gesture recognition, video signal processing, Human Action Recognition",
    author = "Ramana ORUGANTI and Roland GOECKE",
    year = "2016",
    doi = "10.1049/iet-cvi.2015.0091",
    language = "English",
    volume = "10",
    pages = "392--397",
    journal = "IET Computer Vision",
    issn = "1751-9632",
    publisher = "Institution of Engineering and Technology",
    number = "5",

    }

    Dimensionality reduction of Fisher vectors for human action recognition. / ORUGANTI, Ramana; GOECKE, Roland.

    In: IET Computer Vision, Vol. 10, No. 5, 2016, p. 392-397.

    TY - JOUR
    T1 - Dimensionality reduction of Fisher vectors for human action recognition
    AU - ORUGANTI, Ramana
    AU - GOECKE, Roland
    PY - 2016
    Y1 - 2016
    KW - gesture recognition
    KW - video signal processing
    KW - Human Action Recognition
    U2 - 10.1049/iet-cvi.2015.0091
    DO - 10.1049/iet-cvi.2015.0091
    M3 - Article
    VL - 10
    SP - 392
    EP - 397
    JO - IET Computer Vision
    JF - IET Computer Vision
    SN - 1751-9632
    IS - 5
    ER -