TY - JOUR
T1 - An improved binary sparrow search algorithm for feature selection in data classification
AU - Gad, Ahmed G.
AU - Sallam, Karam M.
AU - Chakrabortty, Ripon K.
AU - Ryan, Michael J.
AU - Abohany, Amr A.
N1 - Publisher Copyright:
© 2022, The Author(s).
PY - 2022/9
Y1 - 2022/9
AB - Feature Selection (FS) is an important preprocessing step in machine learning and data mining tasks that prepares data (especially high-dimensional data) by eliminating irrelevant and redundant features, thereby mitigating the potential curse of dimensionality in large datasets. FS is a combinatorial NP-hard problem whose computational time grows exponentially with problem complexity, so a growing number of scholars have adopted meta-heuristic techniques to tackle it. The Sparrow Search Algorithm (SSA) is a recently proposed meta-heuristic, but it still performs poorly in exploratory behavior and in the exploration-exploitation trade-off, because it does not adequately steer the search within feasible regions and its exploitation process suffers noticeable stagnation. We therefore improve SSA by incorporating two mechanisms into the original algorithm's structure: i) a strategy for Random Re-positioning of Roaming Agents (3RA); and ii) a novel Local Search Algorithm (LSA). For the FS problem, the improved SSA is cast as a binary variant, the improved Binary SSA (iBSSA), which strives to select an optimal or near-optimal feature subset from a given dataset while maximizing classification accuracy. For binary conversion, iBSSA was first validated with nine common S-shaped and V-shaped Transfer Functions (TFs), producing nine iBSSA variants. To verify the robustness of these variants, three well-known classification techniques, k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Random Forest (RF), were adopted as fitness evaluators for the proposed iBSSA approach and many other competing algorithms on 18 multifaceted, multi-scale benchmark datasets from the University of California Irvine (UCI) data repository. The overall best-performing iBSSA variant for each of the three classifiers was then compared with binary variants of 12 well-known meta-heuristic algorithms: the original SSA (BSSA), Artificial Bee Colony (BABC), Particle Swarm Optimization (BPSO), Bat Algorithm (BBA), Grey Wolf Optimization (BGWO), Whale Optimization Algorithm (BWOA), Grasshopper Optimization Algorithm (BGOA), SailFish Optimizer (BSFO), Harris Hawks Optimization (BHHO), Bird Swarm Algorithm (BBSA), Atom Search Optimization (BASO), and Henry Gas Solubility Optimization (BHGSO). Based on Wilcoxon’s non-parametric statistical test (α=0.05), the superiority of iBSSA with the three classifiers over its counterparts was evident across the vast majority of the selected datasets, achieving a feature-size reduction of up to 92% along with up to 100% classification accuracy on some of those datasets.
KW - Binary optimization
KW - Classification
KW - Combinatorial optimization
KW - Data mining
KW - Feature selection
KW - Machine learning
KW - Meta-heuristics
KW - Pattern recognition
KW - Sparrow search algorithm
KW - Swarm-based optimization
UR - http://www.scopus.com/inward/record.url?scp=85128818804&partnerID=8YFLogxK
UR - https://doi.org/10.1007/s00521-022-07203-7
U2 - 10.1007/s00521-022-07203-7
DO - 10.1007/s00521-022-07203-7
M3 - Article
AN - SCOPUS:85128818804
SN - 0941-0643
VL - 34
SP - 15705
EP - 15752
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 18
ER -