This paper proposes improvements to the binary grey-wolf optimizer (BGWO) to solve the feature selection (FS) problem associated with high data dimensionality, irrelevant, noisy, and redundant data that will then allow machine learning algorithms to attain better classification/clustering accuracy in less training time. We propose three variants of BGWO in addition to the standard variant, applying different transfer functions to tackle the FS problem. Because BGWO generates continuous values and FS needs discrete values, a number of V-shaped, S-shaped, and U-shaped transfer functions were investigated for incorporation with BGWO to convert their continuous values to binary. After investigation, we note that the performance of BGWO is affected by the selection of the transfer function. Then, in the first variant, we look to reduce the local minima problem by integrating an exploration capability to update the position of the grey wolf randomly within the search space with a certain probability; this variant was abbreviated as IBGWO. Consequently, a novel mutation strategy is proposed to select a number of the worst grey wolves in the population which are updated toward the best solution and randomly within the search space based on a certain probability to determine if the update is either toward the best or randomly. The number of the worst grey wolf selected by this strategy is linearly increased with the iteration. Finally, this strategy is combined with IBGWO to produce the second variant of BGWO that was abbreviated as LIBGWO. In the last variant, simulated annealing (SA) was integrated with LIBGWO to search around the best-so-far solution at the end of each iteration in order to identify better solutions. The performance of the proposed variants was validated on 32 datasets taken from the UCI repository and compared with six wrapper feature selection methods. The experiments show the superiority of the proposed improved variants in producing better classification accuracy than the other selected wrapper feature selection algorithms.