Abstract
Developing computational algorithms to model the biological vision system has challenged researchers
in the computer vision field for several decades. As a result, state-of-the-art Deep Learning (DL) algorithms
such as the Convolutional Neural Network (CNN) have emerged for image classification and
recognition tasks with promising results. CNNs, however, remain view-specific, producing good results
when the variation between test and train data is small. Making CNNs learn invariant features to effectively
recognise objects that undergo appearance changes as a result of transformations such as scaling
remains a technical challenge. Recent bio-inspired studies of the visual system suggest three
new paradigms. Firstly, our visual system uses both local features and global features in its recognition
function. Secondly, cells tuned to detecting global features respond to visual stimuli before cells tuned
to local features, leading to quicker response times in recognising objects. Thirdly, information from
modalities that handle local features, global features and color is integrated in the brain for performing
recognition tasks. While CNNs rely on an aggregation of local features into global features for recognition,
these research outcomes motivate extracting global features and combining them with established local features to
improve the efficiency of CNN models and their application to transformation invariance problems.
The main goals of the current research include an investigation and development of relevant models
for classification of scaled images using both local and global features with CNNs. To improve the
performance of the current CNN model in classifying scaled images, this work has investigated
several techniques: (i) exploration of (global) high-level, low-resolution CNN feature map
augmentation, (ii) examination of fusion of CNN features with global features from non-trainable
global feature descriptors, (iii) color histograms as global features, (iv) examination of fusion of CNN
features with spatial features using large kernels in a multi-scale filter pyramid setting, (v) examination
of a brain-inspired distributed multi-modal information extraction and integration model, and (vi) development
of a zoom-in convolution algorithm.
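As an illustration of technique (iii), a color histogram can be computed as a non-trainable global descriptor. The following is a minimal sketch, not the thesis implementation: the function name `color_histogram`, the choice of 8 bins per channel, and the normalisation are all assumptions made for this example. It also shows one plausible reason such a descriptor suits scaled images: when an image is enlarged by pixel replication, the normalised histogram does not change.

```python
import numpy as np


def color_histogram(image, bins=8):
    """Normalised per-channel color histogram of an H x W x C uint8 image.

    Hypothetical helper for illustration; the bin count is an assumption.
    """
    per_channel = []
    for channel in range(image.shape[2]):
        # Count pixel intensities of this channel in `bins` equal-width bins.
        counts, _ = np.histogram(image[:, :, channel], bins=bins, range=(0, 256))
        per_channel.append(counts)
    hist = np.concatenate(per_channel).astype(np.float64)
    # Divide by the total count so the descriptor does not depend on image size.
    return hist / hist.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    small = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
    # Enlarge by a factor of 2 using pixel replication (nearest neighbour).
    large = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)
    print(np.allclose(color_histogram(small), color_histogram(large)))
```

With smoother interpolation methods the histograms would only be approximately equal, so this invariance should be read as a property of the toy setup rather than a general guarantee.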
For improving classification of scaled images, this thesis has proposed two specific techniques. The
first technique exploits the automatic feature extraction in CNN convolution layers and proposes augmentation
of (global) high-level low-resolution feature maps as a cheap and effective way to improve
classification of scaled images. The second technique proposes an architecture supported by physiological
evidence that allows multi-modal information extraction and fusion of DL models for combining global features and CNN local features. This architecture allows parallel extraction and processing of
CNN and global features from input image data. To extract global image features, both non-trainable and
trainable feature extraction methods are investigated. Global feature descriptors - Histogram of Oriented Gradients
(HOG) and color information - are used as non-trainable methods. A technique using multi-scale filter
banks containing large kernels is used as a trainable method to cover more spatial areas of the image. The
idea of using large kernels and multi-scale filter banks is extended to develop a new lightweight zoom-in
convolution technique that allows the model to capture more spatial areas in relation to the center of the
image, assuming the object of interest is generally centered in the image. This technique,
called DeepZoom, inspects multi-scale slices of an image, beginning with a set of center pixels and progressively
extending the area of each slice until the final slice covers the entire image. To fuse global,
local and color features, a simple feature map concatenation technique is compared with a brain-inspired
distributed information integration model. Four datasets, each consisting of images of different sizes, are
used to validate the models.
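The slicing step of the DeepZoom idea described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the number of slices, how their sizes grow, or how each slice is convolved, so the function name `center_slices`, the parameter `num_slices`, and the linear growth schedule are hypothetical choices.

```python
import numpy as np


def center_slices(image, num_slices=4):
    """Return centered crops of growing size; the last crop is the whole image.

    Hypothetical helper: the linear growth schedule is an assumption,
    not the schedule used in the thesis.
    """
    height, width = image.shape[:2]
    slices = []
    for k in range(1, num_slices + 1):
        # Crop size grows linearly with k and reaches the full size at k == num_slices.
        crop_h = max(1, (height * k) // num_slices)
        crop_w = max(1, (width * k) // num_slices)
        # Place the crop so that it is centered in the image.
        top = (height - crop_h) // 2
        left = (width - crop_w) // 2
        slices.append(image[top:top + crop_h, left:left + crop_w])
    return slices


if __name__ == "__main__":
    image = np.arange(64).reshape(8, 8)
    for crop in center_slices(image, num_slices=4):
        print(crop.shape)
```

In a full model each slice would presumably be resized or filtered before its features are combined with the others; that part is left out here because the abstract does not describe it.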
Experiments on a) (global) high-level low-resolution feature map augmentation, b) fusion of CNN local
features with global features from various non-trainable global feature descriptor methods, c) fusion
of CNN local features with spatial features obtained using large kernels, and d) adjusting the convolution
technique in DL models, have shown that, compared to CNN-only models, the developed models i)
obtained similar if not better training and test accuracies and ii) obtained higher classification
accuracies for scaled test images. Although the global feature extraction or manipulation methods differed,
in general the results are promising for classification of scaled images. In all cases, the developed
models are evaluated against established results from benchmark CNNs.
Finally, this thesis presents skin cancer classification as an application where handling scale is important.
It shows the application of the developed DL models to the detection of skin cancer using skin lesion images
on mobile phones. After investigating the different models, the thesis presents a suitable DL model for
classification of skin lesion images in real time and provides an implementation on mobile devices as an
early warning diagnosis tool for skin cancer.
The thesis concludes with a summary of research outcomes against each identified research question.
Several questions arising from the thesis research are also identified as future work to extend the research
presented.
|Date of Award||2020|
|Supervisor||Dharmendra Sharma AM PhD (Supervisor), Dat Tran (Supervisor) & Roland Goecke (Supervisor)|