Global-first Training Strategy with Convolutional Neural Networks to Improve Scale Invariance

Dinesh Kumar, Dharmendra Sharma

Research output: A Conference proceeding or a Chapter in Book > Conference contribution > peer-review

Abstract

Modelled closely on the feedforward conical structure of the primate vision system, Convolutional Neural Networks (CNNs) learn by adopting a local-to-global feature extraction strategy. This makes them view-specific models whose learnt weights encode invariance poorly, so they struggle to identify objects whose appearance is altered by transformations such as rotation, translation, and scaling. Recent physiological studies reveal that the visual system first views the scene globally before subsequent processing in its ventral stream, leading to a global-first response strategy in its recognition function. Conventional CNNs generally use small filters and thus lose the global view of the image. A trainable module proposed by Kumar & Sharma [24], called Stacked Filters Convolution (SFC), models this approach by using a pyramid of large multi-scale filters to extract features from wider areas of the image, which are then fed to a standard CNN for training. The end-to-end model is referred to as Stacked Filter CNN (SFCNN). In addition to improved test results, SFCNN showed promising results on scale-invariant classification. Those experiments, however, were performed on low-resolution datasets with a small CNN as the backbone. In this paper, we extend this work and test SFC integrated with the VGG16 network on larger-resolution datasets for scale-invariant classification. Our results confirm that integrating SFC with a standard CNN also shows promising results on scale invariance on large-resolution datasets.
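
The idea of a pyramid of large multi-scale filters can be illustrated with a minimal sketch. This is not the authors' SFC implementation: the function names (`conv2d_same`, `box_kernel`, `multi_scale_stack`), the filter sizes, and the fixed averaging weights are all assumptions made for illustration, whereas a trainable module would learn its filter weights. The sketch only shows how responses from filters of several sizes can be stacked as channels for a downstream CNN.

```python
"""Illustrative multi-scale filter front end (hypothetical, not the paper's SFC)."""


def conv2d_same(image, kernel):
    """Cross-correlate a 2-D image with an odd-sized square kernel.

    Zero padding keeps the output the same size as the input.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def box_kernel(size):
    """Placeholder averaging filter (assumption); real filters would be learned."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


def multi_scale_stack(image, sizes=(3, 5, 7)):
    """Apply one filter per scale and stack the responses as channels.

    The sizes are assumed values; larger kernels cover wider image areas.
    """
    return [conv2d_same(image, box_kernel(s)) for s in sizes]


if __name__ == "__main__":
    image = [[float(x + y) for x in range(8)] for y in range(8)]
    stack = multi_scale_stack(image)
    # One response map per filter size, each the same size as the input.
    print(len(stack), len(stack[0]), len(stack[0][0]))  # 3 8 8
```

In a real pipeline the stacked responses would be the input to a backbone such as VGG16, and the filters would be trained jointly with it rather than fixed.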
Original language: English
Title of host publication: Computer Vision, Imaging and Computer Graphics Theory and Applications - 16th International Joint Conference, VISIGRAPP 2021, Revised Selected Papers
Editors: A. Augusto de Sousa, Vlastimil Havran, Alexis Paljic, Tabitha Peck, Christophe Hurter, Helen Purchase, Giovanni Maria Farinella, Petia Radeva, Kadi Bouatouch
Publisher: Springer
Pages: 259-278
Number of pages: 20
Edition: 1
ISBN (Electronic): 9783031254772
ISBN (Print): 9783031254765
Publication status: Published - 2023
Event: 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2021 - Virtual, Online
Duration: 8 Feb 2021 - 10 Feb 2021

Publication series

Name: Communications in Computer and Information Science
Volume: 1691 CCIS

Conference

Conference: 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2021
City: Virtual, Online
Period: 8/02/21 - 10/02/21
