TY - GEN
T1 - Multi-modal Information Extraction and Fusion with Convolutional Neural Networks
AU - Kumar, Dinesh
AU - Sharma, Dharmendra
PY - 2020/9/30
Y1 - 2020/9/30
N2 - Developing computational algorithms to model the biological vision system has challenged researchers in the computer vision field for several decades. As a result, state-of-the-art algorithms such as the Convolutional Neural Network (CNN) have emerged for image classification and recognition tasks with promising results. CNNs, however, remain view-specific, producing good results only when the variation between test and training data is small. Making CNNs learn invariant features to effectively recognise objects that undergo appearance changes caused by transformations such as scaling remains a technical challenge. Recent physiological studies of the visual system suggest new paradigms. Firstly, our visual system uses both local and global features in its recognition function. Secondly, cells tuned to global features respond quickly to visual stimuli when recognising objects. Thirdly, information from modalities that handle local features, global features and color is integrated in the brain to perform recognition tasks. While CNNs rely on the aggregation of local features for recognition, these theories suggest the potential of using global features to solve transformation-invariance problems in CNNs. In this paper we realise these paradigms in a computational model, named the global features improved CNN (GCNN), and test it on the classification of scaled images. We experiment with combining Histogram of Oriented Gradients (HOG) global features, CNN local features and color information, and test our technique on benchmark data sets. Our results show that GCNN outperforms a traditional CNN on the classification of scaled images, indicating the potential effectiveness of our model towards improving scale invariance in CNN-based networks.
AB - Developing computational algorithms to model the biological vision system has challenged researchers in the computer vision field for several decades. As a result, state-of-the-art algorithms such as the Convolutional Neural Network (CNN) have emerged for image classification and recognition tasks with promising results. CNNs, however, remain view-specific, producing good results only when the variation between test and training data is small. Making CNNs learn invariant features to effectively recognise objects that undergo appearance changes caused by transformations such as scaling remains a technical challenge. Recent physiological studies of the visual system suggest new paradigms. Firstly, our visual system uses both local and global features in its recognition function. Secondly, cells tuned to global features respond quickly to visual stimuli when recognising objects. Thirdly, information from modalities that handle local features, global features and color is integrated in the brain to perform recognition tasks. While CNNs rely on the aggregation of local features for recognition, these theories suggest the potential of using global features to solve transformation-invariance problems in CNNs. In this paper we realise these paradigms in a computational model, named the global features improved CNN (GCNN), and test it on the classification of scaled images. We experiment with combining Histogram of Oriented Gradients (HOG) global features, CNN local features and color information, and test our technique on benchmark data sets. Our results show that GCNN outperforms a traditional CNN on the classification of scaled images, indicating the potential effectiveness of our model towards improving scale invariance in CNN-based networks.
KW - CNN
KW - scale invariance
KW - invariant features
KW - global features
KW - local features
KW - histogram of oriented gradients
KW - color histogram
KW - convolutional neural network
UR - http://www.scopus.com/inward/record.url?scp=85093837245&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/db7116a7-130d-307f-977f-a50bad09d343/
UR - https://wcci2020.org/
U2 - 10.1109/IJCNN48605.2020.9206803
DO - 10.1109/IJCNN48605.2020.9206803
M3 - Conference contribution
SN - 9781728169262
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 1
EP - 9
BT - Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN)
A2 - Roy, Anish
PB - IEEE, Institute of Electrical and Electronics Engineers
CY - United States
T2 - 2020 International Joint Conference on Neural Networks (IJCNN)
Y2 - 19 July 2020 through 24 July 2020
ER -