FCFilter: Feature selection based on clustering and genetic algorithms

Charles Ferreira, Deborah de Medeiros, Fabiana SANTANA

Research output: A Conference proceeding or a Chapter in Book › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

The search for patterns in large amounts of textual data, or text mining, can be at once rewarding and challenging. The patterns can reveal tendencies, similarities and predictions, but the information is usually implicit and difficult to validate. Classification is one of the most relevant research areas in text mining, and it usually consists of predicting the class of a textual document based on a set of documents previously organized into different classes, such as author or topic. Choosing the words that compose the feature set is crucial to proper classification. A well-selected feature set can improve the performance of the classification method and clarify the interpretation of the classification model fitted to the data. This paper introduces the Feature Cluster Filter (FCFilter) method for feature selection. FCFilter eliminates the need to input or optimize the number of clusters by grouping the words into a sufficiently high number of clusters. Genetic algorithms are applied to optimize the combination of groups that will provide the final feature set. The method is based on the selection of features that are good predictors for text classification by clustering features and selecting only the suitable clusters. Experiments performed to evaluate FCFilter with the Reuters-21578, SCY-Genes and SCY-Clusters datasets showed a significant reduction in the feature-value table dimensionality with slight improvements in the classification accuracy when compared to the baselines. The results are very promising, indicating potential improvements in the research on feature selection for text mining.
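
The idea of letting a genetic algorithm choose which word clusters to keep can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering step, the fitness function, or the genetic operators, so everything below is an assumption made for illustration. The clusters are taken as already computed, `select_clusters` and `toy_fitness` are hypothetical names, the fitness is a toy score rather than classifier accuracy, and seeding the population with the "keep every cluster" mask is a design choice of this sketch.

```python
import random


def select_clusters(clusters, fitness, generations=30, pop_size=20,
                    mutation_rate=0.1, seed=0):
    """Evolve a bit mask over clusters (needs at least two clusters).

    Bit i set to 1 keeps every word in clusters[i]. `fitness` is any
    caller-supplied function that scores a list of words (higher is better).
    """
    rng = random.Random(seed)
    n = len(clusters)

    def features(mask):
        # Flatten the words of all kept clusters into one feature list.
        return [w for keep, cluster in zip(mask, clusters) if keep
                for w in cluster]

    def score(mask):
        return fitness(features(mask))

    # Assumption of this sketch: start from the "keep everything" mask plus
    # random masks, so the result is never worse than using all clusters.
    population = [[1] * n]
    population += [[rng.randint(0, 1) for _ in range(n)]
                   for _ in range(pop_size - 1)]

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]      # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)              # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = parents + children            # parents survive (elitism)

    best = max(population, key=score)
    return best, features(best)


# Toy data, invented for this example only.
clusters = [["oil", "barrel"], ["the", "of"], ["wheat", "grain"], ["said", "also"]]
informative = {"oil", "barrel", "wheat", "grain"}


def toy_fitness(words):
    # Reward words assumed to be informative, penalize all others.
    return sum(1 if w in informative else -1 for w in words)


mask, selected = select_clusters(clusters, toy_fitness)
print(mask, selected)
```

In a realistic setting the toy fitness would presumably be replaced by a measure such as cross-validated classification accuracy on the reduced feature set, which is far more expensive to evaluate per candidate mask.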
Original language: English
Title of host publication: 2016 IEEE Congress on Evolutionary Computation (CEC)
Editors: Yew Soon Ong
Place of Publication: Vancouver, Canada
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 2106-2113
Number of pages: 8
ISBN (Electronic): 9781509006229
ISBN (Print): 9781509006236
Publication status: Published - 2016
Event: 2016 IEEE Congress on Evolutionary Computation (CEC) - Vancouver, Canada
Duration: 24 Jul 2016 → 29 Jul 2016

Publication series

Name: 2016 IEEE Congress on Evolutionary Computation, CEC 2016

Conference

Conference: 2016 IEEE Congress on Evolutionary Computation (CEC)
Abbreviated title: CEC 2016
Country/Territory: Canada
City: Vancouver
Period: 24/07/16 → 29/07/16
