Application of feature selection methods for automated clustering analysis: a review on synthetic datasets

Aliyu Usman Ahmad, Andrew Starkey

Research output: Contribution to journal > Article

Abstract

The effective modelling of high-dimensional data with hundreds to thousands of features remains a challenging task in the field of machine learning. The process is manually intensive and requires skilled data scientists to apply exploratory data analysis techniques and statistical methods when pre-processing datasets for meaningful analysis with machine learning methods. However, the massive growth of data has brought about the need for fully automated data analysis methods. One of the key challenges is the accurate selection of relevant features, which can be buried in high-dimensional data along with irrelevant noisy features: the aim is to choose a subset of the complete set of input features that predicts the output with accuracy comparable to, or higher than, that of the complete input set. Kohonen’s self-organising neural network map has been utilised in various ways for this task, such as with the weighted self-organising map (WSOM) approach, and this method is reviewed for its efficacy. The study demonstrates that the WSOM approach can produce different results on different runs on a given dataset because of the inappropriate use of the steepest descent optimisation method to minimise the weighted SOM’s cost function. An alternative feature weighting approach based on analysis of the SOM after training is presented. The proposed approach allows the SOM to converge before analysing the input relevance, unlike the WSOM, which applies weighting to the inputs during training; this distorts the SOM’s cost function and creates multiple local minima, so the SOM does not consistently converge to the same state. We demonstrate the superiority of the proposed method over the WSOM and a standard SOM in feature selection with improved clustering analysis.
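
As a rough illustration of the idea in the abstract (train a standard SOM first, then analyse input relevance from the converged map), the sketch below may help. It is not the authors' method: the relevance measure used here (normalised variance of each feature across the trained codebook vectors), the function names `train_som` and `feature_relevance`, the map size, the learning schedule and the synthetic data are all assumptions chosen for the example.

```python
import numpy as np


def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small standard (unweighted) SOM with online updates."""
    rng = np.random.default_rng(seed)
    n, _ = data.shape
    # Fixed 2-D grid position of every map node.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise codebook vectors from randomly chosen samples.
    weights = data[rng.integers(0, n, size=rows * cols)].copy()
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)               # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 0.5   # neighbourhood radius shrinks
            # Best-matching unit: node whose codebook vector is closest.
            bmu = np.argmin(((weights - data[i]) ** 2).sum(axis=1))
            # Gaussian neighbourhood around the BMU on the map grid.
            grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights


def feature_relevance(weights):
    """Assumed relevance proxy: share of codebook variance per feature."""
    spread = weights.var(axis=0)
    return spread / spread.sum()


# Synthetic data: features 0 and 1 separate two clusters, 2 and 3 are noise.
rng = np.random.default_rng(42)
labels = rng.integers(0, 2, size=200)
centres = np.where(labels == 0, 0.2, 0.8)
informative = centres[:, None] + rng.normal(0.0, 0.05, size=(200, 2))
noise = rng.normal(0.5, 0.1, size=(200, 2))
data = np.hstack([informative, noise])

weights = train_som(data)
relevance = feature_relevance(weights)
print(np.round(relevance, 3))
```

Because the relevance scores are computed only after training, the SOM's own cost function is left untouched, which is the property the abstract contrasts with weighting the inputs during training.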
Original language: English
Pages (from-to): 317-328
Number of pages: 12
Journal: Neural Computing and Applications
Volume: 29
Issue number: 7
Early online date: 22 Apr 2017
DOI: 10.1007/s00521-017-3005-9
Publication status: Published - Apr 2018

Fingerprint

Self organizing maps
Feature extraction
Cost functions
Learning systems
Statistical methods
Neural networks
Processing

Keywords

  • clustering
  • self-organising neural network map
  • feature selection
  • automation

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Application of feature selection methods for automated clustering analysis: a review on synthetic datasets. / Ahmad, Aliyu Usman; Starkey, Andrew.

In: Neural Computing and Applications, Vol. 29, No. 7, 04.2018, p. 317-328.

@article{358ab5b9b1d246c2bb92c6ca292e7240,
title = "Application of feature selection methods for automated clustering analysis: a review on synthetic datasets",
abstract = "The effective modelling of high-dimensional data with hundreds to thousands of features remains a challenging task in the field of machine learning. The process is manually intensive and requires skilled data scientists to apply exploratory data analysis techniques and statistical methods when pre-processing datasets for meaningful analysis with machine learning methods. However, the massive growth of data has brought about the need for fully automated data analysis methods. One of the key challenges is the accurate selection of relevant features, which can be buried in high-dimensional data along with irrelevant noisy features: the aim is to choose a subset of the complete set of input features that predicts the output with accuracy comparable to, or higher than, that of the complete input set. Kohonen’s self-organising neural network map has been utilised in various ways for this task, such as with the weighted self-organising map (WSOM) approach, and this method is reviewed for its efficacy. The study demonstrates that the WSOM approach can produce different results on different runs on a given dataset because of the inappropriate use of the steepest descent optimisation method to minimise the weighted SOM’s cost function. An alternative feature weighting approach based on analysis of the SOM after training is presented. The proposed approach allows the SOM to converge before analysing the input relevance, unlike the WSOM, which applies weighting to the inputs during training; this distorts the SOM’s cost function and creates multiple local minima, so the SOM does not consistently converge to the same state. We demonstrate the superiority of the proposed method over the WSOM and a standard SOM in feature selection with improved clustering analysis.",
keywords = "clustering, self-organising neural network map, feature selection, automation",
author = "Ahmad, Aliyu Usman and Starkey, Andrew",
note = "Open via Springer Compact Agreement",
year = "2018",
month = "4",
doi = "10.1007/s00521-017-3005-9",
language = "English",
volume = "29",
pages = "317--328",
journal = "Neural Computing and Applications",
issn = "0941-0643",
publisher = "Springer London",
number = "7",
}

TY - JOUR

T1 - Application of feature selection methods for automated clustering analysis

T2 - a review on synthetic datasets

AU - Ahmad, Aliyu Usman

AU - Starkey, Andrew

N1 - Open via Springer Compact Agreement

PY - 2018/4

Y1 - 2018/4

N2 - The effective modelling of high-dimensional data with hundreds to thousands of features remains a challenging task in the field of machine learning. The process is manually intensive and requires skilled data scientists to apply exploratory data analysis techniques and statistical methods when pre-processing datasets for meaningful analysis with machine learning methods. However, the massive growth of data has brought about the need for fully automated data analysis methods. One of the key challenges is the accurate selection of relevant features, which can be buried in high-dimensional data along with irrelevant noisy features: the aim is to choose a subset of the complete set of input features that predicts the output with accuracy comparable to, or higher than, that of the complete input set. Kohonen’s self-organising neural network map has been utilised in various ways for this task, such as with the weighted self-organising map (WSOM) approach, and this method is reviewed for its efficacy. The study demonstrates that the WSOM approach can produce different results on different runs on a given dataset because of the inappropriate use of the steepest descent optimisation method to minimise the weighted SOM’s cost function. An alternative feature weighting approach based on analysis of the SOM after training is presented. The proposed approach allows the SOM to converge before analysing the input relevance, unlike the WSOM, which applies weighting to the inputs during training; this distorts the SOM’s cost function and creates multiple local minima, so the SOM does not consistently converge to the same state. We demonstrate the superiority of the proposed method over the WSOM and a standard SOM in feature selection with improved clustering analysis.

KW - clustering

KW - self-organising neural network map

KW - feature selection

KW - automation

U2 - 10.1007/s00521-017-3005-9

DO - 10.1007/s00521-017-3005-9

M3 - Article

VL - 29

SP - 317

EP - 328

JO - Neural Computing and Applications

JF - Neural Computing and Applications

SN - 0941-0643

IS - 7

ER -