Feature Selection (FS) is widely adopted as a critical preprocessing step in pattern recognition and data mining tasks. It mitigates the adverse impact of irrelevant and redundant features on the performance of the classification model under consideration. To tackle this problem, researchers have proposed various methods that extract relevant information and select the most significant features so as to improve the overall classification accuracy on a given dataset. In practice, for datasets with a large number of features, conventional methods usually struggle to find good solutions. Therefore, in this study, a meta-heuristic algorithm, Wind Driven Optimization (WDO), is enhanced and extended into a binary variant, the improved Binary Adaptive WDO (iBAWDO). The proposed iBAWDO selects the most relevant (near-optimal) features while reducing the computational cost and enhancing (or at least maintaining) the final classification accuracy. An evolutionary crossover technique and the Simulated Annealing (SA) algorithm are incorporated into the original WDO to enhance its exploration of feasible regions and its exploitation within those regions, respectively. To assess the relevance of the selected features, two popular classifiers, k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM), are adopted as fitness evaluators.
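The wrapper design described above — continuous WDO positions mapped to binary feature masks, a k-NN classifier acting as the fitness evaluator, and SA-style acceptance of candidate moves — can be sketched minimally in Python. Note that the sigmoid transfer function, the 1-NN evaluator, the `alpha = 0.99` weighting, and all function names here are illustrative assumptions, not the paper's exact configuration.

```python
import math
import random

# Minimal sketch of a wrapper-style FS evaluation loop. The transfer
# function, 1-NN evaluator, and alpha weighting are assumptions for
# illustration, not the exact settings used by iBAWDO.

def sigmoid_transfer(v):
    """S-shaped transfer function mapping a continuous WDO velocity
    component to the probability of the corresponding bit being 1."""
    return 1.0 / (1.0 + math.exp(-v))

def binarize(velocity, rng):
    """Turn a continuous velocity vector into a binary feature mask."""
    return [1 if rng.random() < sigmoid_transfer(v) else 0 for v in velocity]

def knn_accuracy(mask, X_train, y_train, X_test, y_test):
    """1-NN accuracy computed only on the features selected by `mask`."""
    idx = [i for i, m in enumerate(mask) if m]
    if not idx:  # an empty feature subset cannot classify anything
        return 0.0
    def dist(a, b):
        return sum((a[i] - b[i]) ** 2 for i in idx)
    hits = 0
    for x, y in zip(X_test, y_test):
        nearest = min(range(len(X_train)), key=lambda j: dist(x, X_train[j]))
        hits += int(y_train[nearest] == y)
    return hits / len(X_test)

def fitness(mask, data, alpha=0.99):
    """Common wrapper objective: weighted sum of classification error and
    the fraction of selected features (lower is better)."""
    err = 1.0 - knn_accuracy(mask, *data)
    return alpha * err + (1.0 - alpha) * (sum(mask) / len(mask))

def sa_accept(delta, temperature, rng):
    """Simulated-annealing acceptance: always keep improvements (delta < 0)
    and keep worse moves with probability exp(-delta / T)."""
    return delta < 0 or rng.random() < math.exp(-delta / max(temperature, 1e-12))
```

In an outer loop, each WDO air parcel would be binarized, scored with `fitness`, and a crossover-generated or SA-perturbed candidate kept or rejected via `sa_accept` as the temperature cools.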
The proposed iBAWDO algorithm was validated on 18 multi-scale benchmark datasets against binary versions of 11 well-known meta-heuristic approaches: the binary version of the original WDO (BWDO), Binary Particle Swarm Optimization (BPSO), Binary Bat Algorithm (BBA), Binary Grey Wolf Optimization (BGWO), Binary Whale Optimization Algorithm (BWOA), Binary Grasshopper Optimization Algorithm (BGOA), Binary Sailfish Optimizer (BSFO), Binary Harris Hawks Optimization (BHHO), Binary Bird Swarm Algorithm (BBSA), Binary Atom Search Optimization (BASO), and Binary Henry Gas Solubility Optimization (BHGSO). The Wilcoxon rank-sum non-parametric test was conducted at a 5% significance level to statistically confirm the competitiveness of the proposed method. Overall, the experimental results show that the proposed method is significantly effective on both low- and high-dimensional datasets.
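The Wilcoxon rank-sum comparison at the 5% level can be reproduced with a small self-contained routine. The pure-Python sketch below uses the normal approximation and, for brevity, omits the tie correction in the variance; the function name is an assumption, and in practice one would call `scipy.stats.ranksums` on the per-run fitness values of two algorithms.

```python
import math

def rank_sum_test(sample_a, sample_b):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). Ties receive average ranks; the tie correction to the
    variance is omitted for brevity (a simplifying assumption).
    """
    n1, n2 = len(sample_a), len(sample_b)
    # Pool the samples, tagging each value with its group (0 = sample_a).
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1  # extend over a run of tied values
        avg = (i + j) / 2 + 1  # average 1-based rank for the tied run
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    # Rank sum of the first sample, and its mean/variance under H0.
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p
```

Applied to, say, 30 independent-run fitness values per algorithm and dataset, a p-value below 0.05 indicates a statistically significant difference between iBAWDO and a competitor.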