A novel cluster center initialization method for the k-prototypes algorithms using centrality and distance

Jinchao Ji; Wei Pang; Yanlin Zheng; Zhe Wang; Zhiqiang Ma; Libiao Zhang

doi:10.12785/amis/090621

Research output: Contribution to journal › Article › peer-review

19 Citations (Scopus)

Abstract

The k-prototypes algorithms are well known for their efficiency in clustering mixed numeric and categorical data. In k-prototypes-type algorithms, the initial cluster centers are often determined randomly. It is acknowledged that the initial placement of cluster centers has a direct impact on the performance of the k-prototypes algorithms. However, most existing initialization approaches are designed for the k-means or k-modes algorithms, which can deal only with either pure numeric or pure categorical data, but not a mixture of both. In this paper, we propose a novel cluster center initialization method for the k-prototypes algorithms to address this issue. In the proposed method, the centrality of data objects is introduced based on the concept of the neighbor-set, and then both centrality and distance are exploited together to determine the initial cluster centers. The performance of the proposed method is demonstrated by a series of experiments in comparison with the traditional random initialization method.
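
The abstract does not give the exact definitions of neighbor-set or centrality, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes the usual k-prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches). The `radius` parameter, the centrality formula (share of other objects within `radius`), and the selection rule (centrality multiplied by the distance to the nearest chosen center) are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

# A record is (numeric attributes, categorical attributes).
Record = Tuple[Sequence[float], Sequence[str]]


def mixed_distance(a: Record, b: Record, gamma: float = 1.0) -> float:
    """k-prototypes-style dissimilarity: squared Euclidean distance on the
    numeric part plus gamma times the count of categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(a[0], b[0]))
    categorical = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    return numeric + gamma * categorical


def centrality(data: List[Record], radius: float, gamma: float = 1.0) -> List[float]:
    """ASSUMED definition: the fraction of the other objects that lie within
    `radius` of each object, i.e. the normalised size of its neighbor-set."""
    n = len(data)
    scores = []
    for i, a in enumerate(data):
        neighbours = sum(
            1
            for j, b in enumerate(data)
            if j != i and mixed_distance(a, b, gamma) <= radius
        )
        scores.append(neighbours / max(n - 1, 1))
    return scores


def init_centers(
    data: List[Record], k: int, radius: float, gamma: float = 1.0
) -> List[int]:
    """Return the indices of k initial centers.

    ASSUMED rule: start from the most central object, then repeatedly add the
    object maximising centrality * distance to its nearest chosen center."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of objects")
    cent = centrality(data, radius, gamma)
    chosen = [max(range(len(data)), key=lambda i: cent[i])]
    while len(chosen) < k:
        candidates = [i for i in range(len(data)) if i not in chosen]
        best = max(
            candidates,
            key=lambda i: cent[i]
            * min(mixed_distance(data[i], data[c], gamma) for c in chosen),
        )
        chosen.append(best)
    return chosen
```

Multiplying centrality by distance is one simple way to favour objects that are both in dense regions and far from the centers already picked; other combinations are equally plausible given only the abstract.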

Original language	English
Pages (from-to)	2933-2942
Number of pages	10
Journal	Applied Mathematics & Information Sciences
Volume	9
Issue number	6
Early online date	1 Nov 2015
DOIs	https://doi.org/10.12785/amis/090621
Publication status	Published - 2015

Bibliographical note

Acknowledgements
This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. (21127010, 61202309), China Postdoctoral Science Foundation under Grant No. 2013M530956, Science and Technology Development Plan of Jilin province under Grant No. 20140520068JH, Fundamental Research Funds for the Central Universities under No. 14QNJJ028, the open project program of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University under Grant No. 93K172014K07, the 2014 Industrial Technology Research and Development Special Project of Jilin Province, and the 2015 Department of Education 12th Five-Year Science and Technology Research Planning Projects of Jilin Province. The authors are grateful to the anonymous referee for a careful checking of the details and for helpful comments that improved this paper.

Keywords

clustering
data mining
mixed numeric and categorical data
cluster center initialization

Access to Document

10.12785/amis/090621 (Licence: Unspecified)

http://naturalspublishing.com/Article.asp?ArtcID=10026

Cite this

@article{b3411dc09fcb43fd94baaa267fa5c46c,

title = "A novel cluster center initialization method for the k-prototypes algorithms using centrality and distance",

abstract = "The k-prototypes algorithms are well known for their efficiency in clustering mixed numeric and categorical data. In k-prototypes-type algorithms, the initial cluster centers are often determined randomly. It is acknowledged that the initial placement of cluster centers has a direct impact on the performance of the k-prototypes algorithms. However, most existing initialization approaches are designed for the k-means or k-modes algorithms, which can deal only with either pure numeric or pure categorical data, but not a mixture of both. In this paper, we propose a novel cluster center initialization method for the k-prototypes algorithms to address this issue. In the proposed method, the centrality of data objects is introduced based on the concept of the neighbor-set, and then both centrality and distance are exploited together to determine the initial cluster centers. The performance of the proposed method is demonstrated by a series of experiments in comparison with the traditional random initialization method.",

keywords = "clustering, data mining, mixed numeric and categorical data, cluster center initialization",

author = "Jinchao Ji and Wei Pang and Yanlin Zheng and Zhe Wang and Zhiqiang Ma and Libiao Zhang",

note = "Acknowledgements This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. (21127010, 61202309), China Postdoctoral Science Foundation under Grant No. 2013M530956, Science and Technology Development Plan of Jilin province under Grant No. 20140520068JH, Fundamental Research Funds for the Central Universities under No. 14QNJJ028, the open project program of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University under Grant No. 93K172014K07, the 2014 Industrial Technology Research and Development Special Project of Jilin Province, the 2015 Department of Education 12th Five-Year Science and Technology Research Planning Projects of Jilin Province. The authors are grateful to the anonymous referee for a careful checking of the details and for helpful comments that improved this paper ",

year = "2015",

doi = "10.12785/amis/090621",

language = "English",

volume = "9",

pages = "2933--2942",

journal = "Applied Mathematics \& Information Sciences",

issn = "1935-0090",

publisher = "Natural Sciences Publishing Corporation",

number = "6",

}

TY - JOUR

T1 - A novel cluster center initialization method for the k-prototypes algorithms using centrality and distance

AU - Ji, Jinchao

AU - Pang, Wei

AU - Zheng, Yanlin

AU - Wang, Zhe

AU - Ma, Zhiqiang

AU - Zhang, Libiao

N1 - Acknowledgements This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. (21127010, 61202309), China Postdoctoral Science Foundation under Grant No. 2013M530956, Science and Technology Development Plan of Jilin province under Grant No. 20140520068JH, Fundamental Research Funds for the Central Universities under No. 14QNJJ028, the open project program of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University under Grant No. 93K172014K07, the 2014 Industrial Technology Research and Development Special Project of Jilin Province, the 2015 Department of Education 12th Five-Year Science and Technology Research Planning Projects of Jilin Province. The authors are grateful to the anonymous referee for a careful checking of the details and for helpful comments that improved this paper

PY - 2015

Y1 - 2015

AB - The k-prototypes algorithms are well known for their efficiency in clustering mixed numeric and categorical data. In k-prototypes-type algorithms, the initial cluster centers are often determined randomly. It is acknowledged that the initial placement of cluster centers has a direct impact on the performance of the k-prototypes algorithms. However, most existing initialization approaches are designed for the k-means or k-modes algorithms, which can deal only with either pure numeric or pure categorical data, but not a mixture of both. In this paper, we propose a novel cluster center initialization method for the k-prototypes algorithms to address this issue. In the proposed method, the centrality of data objects is introduced based on the concept of the neighbor-set, and then both centrality and distance are exploited together to determine the initial cluster centers. The performance of the proposed method is demonstrated by a series of experiments in comparison with the traditional random initialization method.

KW - clustering

KW - data mining

KW - mixed numeric and categorical data

KW - cluster center initialization

U2 - 10.12785/amis/090621

DO - 10.12785/amis/090621

M3 - Article

SN - 1935-0090

VL - 9

SP - 2933

EP - 2942

JO - Applied Mathematics & Information Sciences

JF - Applied Mathematics & Information Sciences

IS - 6

ER -
