Self-adaptive parameter and strategy based particle swarm optimization for large-scale feature selection problems with multiple classifiers

Yu Xue* (Corresponding Author), Tao Tang, Wei Pang, Alex X. Liu


Research output: Contribution to journal › Article

Abstract

Feature selection has been widely used in classification to improve classification accuracy and reduce computational complexity. Recently, evolutionary computation (EC) has become an important approach for solving feature selection problems. However, as the datasets processed by classifiers become increasingly large and complex, more irrelevant and redundant features may exist, and more local optima may appear in the large-scale feature space. Traditional EC algorithms, which have only one candidate solution generation strategy (CSGS) with fixed parameter values, may therefore not perform well in searching for optimal feature subsets in large-scale feature selection problems. Moreover, many existing studies use only one classifier to evaluate feature subsets; to demonstrate the effectiveness of evolutionary algorithms for feature selection, more classifiers should be tested. Thus, to solve large-scale feature selection problems efficiently and to examine whether EC-based feature selection is effective across classifiers, a self-adaptive parameter and strategy based particle swarm optimization (SPS-PSO) algorithm with multiple classifiers is proposed in this paper. In SPS-PSO, a representation scheme for solutions and five CSGSs are used. To automatically adjust the CSGSs and their parameter values during the evolutionary process, a strategy self-adaptive mechanism and a parameter self-adaptive mechanism are employed within the framework of particle swarm optimization (PSO). Through these self-adaptive mechanisms, SPS-PSO can adjust both the CSGSs and their parameter values when solving different large-scale feature selection problems, giving it good global and local search ability on these problems.
Moreover, four classifiers, i.e., k-nearest neighbor (KNN), linear discriminant analysis (LDA), extreme learning machine (ELM), and support vector machine (SVM), are individually used as the evaluation functions for testing the effectiveness of the feature subsets generated by SPS-PSO. Nine datasets from the UCI Machine Learning Repository and the Causality Workbench are used in the experiments. All nine datasets have more than 600 dimensions, and two of them have more than 5,000 dimensions. The experimental results show that the strategy and parameter self-adaptive mechanisms improve the performance of the evolutionary algorithms, and that SPS-PSO achieves higher classification accuracy and obtains more concise solutions than the other algorithms on the large-scale feature selection problems considered in this research. In addition, feature selection improves classification accuracy and reduces computational time for various classifiers. Furthermore, KNN is a better surrogate model than the other classifiers used in these experiments.
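The strategy self-adaptive mechanism described above can be illustrated with a minimal sketch: candidate solution generation strategies are chosen by roulette-wheel selection, and their selection probabilities are updated from observed success rates. Everything here is an assumption for illustration only — the toy fitness function, the two simplified strategies, and all parameter values stand in for the paper's actual five CSGSs and classifier-based (e.g., KNN) evaluation.

```python
import random

random.seed(42)  # for reproducibility of this sketch

N_FEATURES = 20   # toy problem size; the paper targets 600+ dimensions
SWARM_SIZE = 10
ITERATIONS = 30

def fitness(mask):
    # Toy surrogate fitness (NOT a real classifier): reward selecting
    # "relevant" features (here, the first 5 by construction) and
    # penalize subset size, mimicking the accuracy-vs-size trade-off.
    relevant = sum(mask[:5])
    size_penalty = sum(mask) / N_FEATURES
    return relevant - 0.5 * size_penalty

def explore(mask, rate=0.1):
    # Illustrative strategy 1: random bit flips (global search).
    return [b ^ (random.random() < rate) for b in mask]

def exploit(mask, guide, rate=0.5):
    # Illustrative strategy 2: copy bits from the best-so-far solution
    # (local search around a guide, loosely PSO-like).
    return [g if random.random() < rate else b for b, g in zip(mask, guide)]

strategies = ["explore", "exploit"]
probs = {s: 0.5 for s in strategies}   # equal initial selection probabilities
success = {s: 1 for s in strategies}   # smoothed success counters
usage = {s: 2 for s in strategies}

swarm = [[random.randint(0, 1) for _ in range(N_FEATURES)]
         for _ in range(SWARM_SIZE)]
best = max(swarm, key=fitness)

for _ in range(ITERATIONS):
    for i, particle in enumerate(swarm):
        # Roulette-wheel strategy selection from current probabilities.
        s = random.choices(strategies,
                           weights=[probs[k] for k in strategies])[0]
        cand = explore(particle) if s == "explore" else exploit(particle, best)
        usage[s] += 1
        if fitness(cand) > fitness(particle):   # greedy acceptance
            swarm[i] = cand
            success[s] += 1
            if fitness(cand) > fitness(best):
                best = cand
    # Self-adaptive update: probabilities proportional to success rates,
    # so strategies that recently improved particles get chosen more often.
    rates = {s: success[s] / usage[s] for s in strategies}
    total = sum(rates.values())
    probs = {s: rates[s] / total for s in strategies}

print("selected features:", [i for i, b in enumerate(best) if b])
print("fitness:", round(fitness(best), 3))
```

In the actual algorithm the fitness of a feature subset would be the accuracy of a classifier (e.g., KNN) trained on only the selected features, and the parameter self-adaptive mechanism would additionally tune values such as the flip and copy rates used above.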
Original language: English
Article number: 106031
Journal: Applied Soft Computing
Volume: 88
Early online date: 23 Dec 2019
DOIs
Publication status: Published - Mar 2020

Keywords

  • particle swarm optimization
  • feature selection
  • large-scale problems
  • self-adaptive
  • classification
