A Spatial Layout and Scale Invariant Feature Representation for Indoor Scene Classification

Munawar Hayat, Salman Khan, Mohammed Bennamoun, Senjian An

    Research output: Contribution to journal › Article

    16 Citations (Scopus)
    2 Downloads (Pure)

    Abstract

    Unlike standard object classification, where the image to be classified contains one or multiple instances of the same object, indoor scene classification is quite different, since the image consists of multiple distinct objects. Furthermore, these objects can be of varying sizes and are present across numerous spatial locations in different layouts. For automatic indoor scene categorization, large-scale spatial layout deformations and scale variations are therefore two major challenges, and the design of rich feature descriptors which are robust to these challenges is still an open problem. This paper introduces a new learnable feature descriptor called “spatial layout and scale invariant convolutional activations” to deal with these challenges. For this purpose, a new convolutional neural network architecture is designed which incorporates a novel “spatially unstructured” layer to introduce robustness against spatial layout deformations. To achieve scale invariance, we present a pyramidal image representation. For feasible training of the proposed network on images of indoor scenes, this paper proposes a methodology which efficiently adapts a network model trained on large-scale data to our task with only a limited amount of available training data. The efficacy of the proposed approach is demonstrated through extensive experiments on a number of data sets, including the MIT-67, Scene-15, Sports-8, Graz-02, and NYU data sets.
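    The abstract mentions a pyramidal image representation to achieve scale invariance. As a rough illustration only (the paper's actual scale factors and resampling method are not given here, so the 2x average-pooling below is an assumption), a multi-scale pyramid can be sketched as:

```python
import numpy as np

def image_pyramid(img, levels=3):
    """Build a simple multi-scale pyramid by repeated 2x average pooling.

    Illustrative sketch only; the paper's exact number of scales and
    downsampling scheme are assumptions here.
    """
    pyramid = [img]
    for _ in range(levels - 1):
        h, w = img.shape[:2]
        # Crop to even dimensions, then average each 2x2 block.
        img = img[: h - h % 2, : w - w % 2]
        img = (img[0::2, 0::2] + img[1::2, 0::2]
               + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0
        pyramid.append(img)
    return pyramid

scales = image_pyramid(np.zeros((224, 224)), levels=3)
print([p.shape for p in scales])  # → [(224, 224), (112, 112), (56, 56)]
```

    Each level of such a pyramid would then be fed through the network, so that objects appearing at different sizes are seen at a comparable scale in at least one level.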
    Original language: English
    Pages (from-to): 1-15
    Number of pages: 15
    Journal: IEEE Transactions on Image Processing
    Volume: 1
    Issue number: 99
    DOIs: 10.1109/tip.2016.2599292
    Publication status: Published - 2016


    Cite this

    Hayat, Munawar; Khan, Salman; Bennamoun, Mohammed; An, Senjian. A Spatial Layout and Scale Invariant Feature Representation for Indoor Scene Classification. In: IEEE Transactions on Image Processing. 2016; Vol. 1, No. 99, pp. 1-15.
    @article{8ac1122f9f094b62aafb89d5870e7a43,
    title = "A Spatial Layout and Scale Invariant Feature Representation for Indoor Scene Classification",
    abstract = "Unlike standard object classification, where the image to be classified contains one or multiple instances of the same object, indoor scene classification is quite different, since the image consists of multiple distinct objects. Furthermore, these objects can be of varying sizes and are present across numerous spatial locations in different layouts. For automatic indoor scene categorization, large-scale spatial layout deformations and scale variations are therefore two major challenges, and the design of rich feature descriptors which are robust to these challenges is still an open problem. This paper introduces a new learnable feature descriptor called ``spatial layout and scale invariant convolutional activations'' to deal with these challenges. For this purpose, a new convolutional neural network architecture is designed which incorporates a novel ``spatially unstructured'' layer to introduce robustness against spatial layout deformations. To achieve scale invariance, we present a pyramidal image representation. For feasible training of the proposed network on images of indoor scenes, this paper proposes a methodology which efficiently adapts a network model trained on large-scale data to our task with only a limited amount of available training data. The efficacy of the proposed approach is demonstrated through extensive experiments on a number of data sets, including the MIT-67, Scene-15, Sports-8, Graz-02, and NYU data sets.",
    keywords = "indoor-scene-classification, convolutional-neural-networks, deep-learning",
    author = "Munawar Hayat and Salman Khan and Mohammed Bennamoun and Senjian An",
    year = "2016",
    doi = "10.1109/tip.2016.2599292",
    language = "English",
    volume = "1",
    pages = "1--15",
    journal = "IEEE Transactions on Image Processing",
    issn = "1057-7149",
    publisher = "IEEE, Institute of Electrical and Electronics Engineers",
    number = "99",
    }
