TY - GEN
T1 - Global-first Training Strategy with Convolutional Neural Networks to Improve Scale Invariance
AU - Kumar, Dinesh
AU - Sharma, Dharmendra
N1 - Kumar, D., Sharma, D. (2023). Global-first Training Strategy with Convolutional Neural Networks to Improve Scale Invariance. In: de Sousa, A.A., et al. (eds) Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2021. Communications in Computer and Information Science, vol 1691. Springer, Cham. https://doi.org/10.1007/978-3-031-25477-2_12
Publisher Copyright:
© 2023, Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
AB - Modelled closely on the feedforward conical structure of the primate vision system, Convolutional Neural Networks (CNNs) learn by adopting a local-to-global feature extraction strategy. This makes them view-specific models and results in poor invariance encoding within their learnt weights, so they cannot adequately identify objects whose appearance is altered by transformations such as rotation, translation, and scaling. Recent physiological studies reveal that the visual system first views the scene globally before subsequent processing in its ventral stream, leading to a global-first response strategy in its recognition function. Conventional CNNs generally use small filters and thus lose the global view of the image. A trainable module proposed by Kumar & Sharma [24], called Stacked Filters Convolution (SFC), models this approach by using a pyramid of large multi-scale filters to extract features from wider areas of the image; these features are then processed by a standard CNN. The end-to-end model is referred to as Stacked Filter CNN (SFCNN). In addition to improved test results, SFCNN showed promising results on scale-invariance classification. The experiments, however, were performed on small-resolution datasets with a small CNN as the backbone. In this paper, we extend this work and test SFC integrated with the VGG16 network on larger-resolution datasets for scale-invariance classification. Our results confirm that integrating SFC with a standard CNN also shows promising results on scale invariance on large-resolution datasets.
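N1 - Illustration (not part of the published record): the abstract describes SFC as a pyramid of large multi-scale filters whose output is passed to a standard CNN such as VGG16. The PyTorch sketch below is a hedged reading of that description; the filter sizes (7, 11, 15), the eight channels per scale, and the 1x1 projection back to three channels are assumptions for illustration, not the configuration reported by Kumar & Sharma.

# Minimal sketch, assuming an SFC-like multi-scale filter pyramid in front of
# an unmodified torchvision VGG16 backbone.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class StackedFiltersConv(nn.Module):
    """Pyramid of large filters applied in parallel to capture wider image context."""
    def __init__(self, in_channels=3, channels_per_scale=8, kernel_sizes=(7, 11, 15)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, channels_per_scale, kernel_size=k, padding=k // 2)
            for k in kernel_sizes  # odd kernels + k//2 padding keep spatial size
        ])
        # Project concatenated multi-scale maps back to 3 channels so the
        # standard VGG16 stem can consume them (an assumption of this sketch).
        self.project = nn.Conv2d(channels_per_scale * len(kernel_sizes),
                                 in_channels, kernel_size=1)

    def forward(self, x):
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        return torch.relu(self.project(feats))

class SFCNN(nn.Module):
    """End-to-end model: SFC front end followed by a VGG16 classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.sfc = StackedFiltersConv()
        self.backbone = vgg16(weights=None, num_classes=num_classes)

    def forward(self, x):
        return self.backbone(self.sfc(x))

if __name__ == "__main__":
    model = SFCNN(num_classes=10)
    logits = model(torch.randn(1, 3, 224, 224))  # large-resolution input
    print(logits.shape)                          # torch.Size([1, 10])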
KW - CNN
KW - Feature map
KW - Filter Pyramid
KW - Global feature
KW - Scale invariance
KW - Visual system
KW - Convolutional neural network
UR - http://www.scopus.com/inward/record.url?scp=85149671975&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-25477-2_12
DO - 10.1007/978-3-031-25477-2_12
M3 - Conference contribution
SN - 9783031254765
T3 - Communications in Computer and Information Science
SP - 259
EP - 278
BT - Computer Vision, Imaging and Computer Graphics Theory and Applications - 16th International Joint Conference, VISIGRAPP 2021, Revised Selected Papers
A2 - de Sousa, A. Augusto
A2 - Havran, Vlastimil
A2 - Paljic, Alexis
A2 - Peck, Tabitha
A2 - Hurter, Christophe
A2 - Purchase, Helen
A2 - Farinella, Giovanni Maria
A2 - Radeva, Petia
A2 - Bouatouch, Kadi
PB - Springer
T2 - 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2021
Y2 - 8 February 2021 through 10 February 2021
ER -