TY - JOUR
T1 - A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data
AU - Ji, Jinchao
AU - Pang, Wei
AU - Han, Xiao
AU - Zhou, Chunguang
AU - Wang, Zhe
PY - 2012/06//
Y1 - 2012/06//
N2 - In many applications, data objects are described by both numeric and categorical features. The k-prototype algorithm is one of the most important algorithms for clustering this type of data. However, this method performs a hard partition, which may lead to misclassification of data objects at the boundaries of regions, and its dissimilarity measure uses only a user-given parameter to adjust the significance of attributes. In this paper, we first combine the mean and the fuzzy centroid to represent the prototype of a cluster, and employ a new measure based on the co-occurrence of values to evaluate the dissimilarity between data objects and cluster prototypes. This measure also takes into account the significance of different attributes to the clustering process. We then present our algorithm for clustering mixed data. Finally, the performance of the proposed method is demonstrated by a series of experiments on four real-world datasets, in comparison with that of traditional clustering algorithms.
KW - fuzzy clustering
KW - data mining
KW - mixed data
KW - dissimilarity measure
KW - attribute significance
UR - https://doi.org/10.1016/j.knosys.2012.01.006
U2 - 10.1016/j.knosys.2012.01.006
DO - 10.1016/j.knosys.2012.01.006
M3 - Article
VL - 30
SP - 129
EP - 135
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
SN - 0950-7051
ER -