A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data

Jinchao Ji, Wei Pang, Xiao Han, Chunguang Zhou, Zhe Wang

Research output: Contribution to journal › Article

87 Citations (Scopus)

Abstract

In many applications, data objects are described by both numeric and categorical features. The k-prototype algorithm is one of the most important algorithms for clustering this type of data. However, this method performs a hard partition, which may lead to misclassification of data objects on the boundaries of regions, and its dissimilarity measure relies only on a user-given parameter to adjust the significance of attributes. In this paper, we first combine the mean and the fuzzy centroid to represent the prototype of a cluster, and employ a new measure based on the co-occurrence of values to evaluate the dissimilarity between data objects and cluster prototypes. This measure also takes into account the significance of different attributes to the clustering process. We then present our algorithm for clustering mixed data. Finally, the performance of the proposed method is demonstrated by a series of experiments on four real-world datasets, in comparison with that of traditional clustering algorithms.
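
The ideas named in the abstract (a prototype made of a numeric mean plus a fuzzy centroid, and soft rather than hard assignment) can be illustrated with a minimal Python sketch. This is not the authors' method. The paper's co-occurrence-based dissimilarity and attribute-significance weighting are not specified in this record, so the sketch assumes a simpler measure instead: squared Euclidean distance on numeric attributes plus a user-given weight gamma times the categorical mismatch against the fuzzy centroid. The membership update is the standard fuzzy c-means rule. All function names and parameters (mixed_dissimilarity, fuzzy_memberships, update_prototype, gamma, m) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

FuzzyCentroid = List[Dict[str, float]]  # one {category value: weight} map per attribute


def mixed_dissimilarity(x_num: Sequence[float], x_cat: Sequence[str],
                        proto_mean: Sequence[float], proto_fuzzy: FuzzyCentroid,
                        gamma: float = 1.0) -> float:
    """Assumed measure: squared Euclidean part plus gamma * categorical mismatch."""
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_mean))
    # Mismatch per attribute is 1 minus the centroid's weight for the observed value.
    categorical_part = sum(1.0 - weights.get(value, 0.0)
                           for value, weights in zip(x_cat, proto_fuzzy))
    return numeric_part + gamma * categorical_part


def fuzzy_memberships(dissimilarities: Sequence[float], m: float = 2.0) -> List[float]:
    """Standard fuzzy c-means membership update for one object (requires m > 1)."""
    zeros = [i for i, d in enumerate(dissimilarities) if d == 0.0]
    if zeros:
        # The object coincides with one or more prototypes: share membership among them.
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(dissimilarities))]
    exponent = 1.0 / (m - 1.0)
    inverse = [(1.0 / d) ** exponent for d in dissimilarities]
    total = sum(inverse)
    return [v / total for v in inverse]


def update_prototype(data_num: Sequence[Sequence[float]],
                     data_cat: Sequence[Sequence[str]],
                     u: Sequence[float], m: float = 2.0) -> Tuple[List[float], FuzzyCentroid]:
    """Recompute one cluster prototype from the memberships u of all objects."""
    w = [ui ** m for ui in u]
    total = sum(w)
    # Numeric part: membership-weighted mean of each numeric attribute.
    mean = [sum(wi * x[j] for wi, x in zip(w, data_num)) / total
            for j in range(len(data_num[0]))]
    # Categorical part: normalised membership mass of each observed category value.
    fuzzy: FuzzyCentroid = []
    for j in range(len(data_cat[0])):
        weights: Dict[str, float] = {}
        for wi, x in zip(w, data_cat):
            weights[x[j]] = weights.get(x[j], 0.0) + wi / total
        fuzzy.append(weights)
    return mean, fuzzy
```

For example, an object with numeric value 1.0 and category "red", compared against prototypes ([0.0], [{"red": 1.0}]) and ([2.0], [{"blue": 1.0}]) with gamma = 1, has dissimilarities 1.0 and 2.0, so fuzzy_memberships returns 2/3 and 1/3 for m = 2. A hard partition would assign the object entirely to the first cluster.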
Original language: English
Pages (from-to): 129-135
Number of pages: 7
Journal: Knowledge-Based Systems
Volume: 30
Early online date: 21 Jan 2012
DOI: 10.1016/j.knosys.2012.01.006
Publication status: Published - Jun 2012

Keywords

  • fuzzy clustering
  • data mining
  • mixed data
  • dissimilarity measure
  • attribute significance

Cite this

A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data. / Ji, Jinchao; Pang, Wei; Han, Xiao; Zhou, Chunguang; Wang, Zhe.

In: Knowledge-Based Systems, Vol. 30, 06.2012, p. 129-135.

@article{aededf6768c948fe94de816152d5ab57,
title = "A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data",
abstract = "In many applications, data objects are described by both numeric and categorical features. The k-prototype algorithm is one of the most important algorithms for clustering this type of data. However, this method performs hard partition, which may lead to misclassification for the data objects in the boundaries of regions, and the dissimilarity measure only uses the user-given parameter for adjusting the significance of attribute. In this paper, first, we combine mean and fuzzy centroid to represent the prototype of a cluster, and employ a new measure based on co-occurrence of values to evaluate the dissimilarity between data objects and prototypes of clusters. This measure also takes into account the significance of different attributes towards the clustering process. Then we present our algorithm for clustering mixed data. Finally, the performance of the proposed method is demonstrated by a series of experiments on four real world datasets in comparison with that of traditional clustering algorithms.",
keywords = "fuzzy clustering, data mining, mixed data, dissimilarity measure, attribute significance",
author = "Jinchao Ji and Wei Pang and Xiao Han and Chunguang Zhou and Zhe Wang",
note = "A paid open access option is available for this journal. Voluntary deposit by author of pre-print allowed on institution's open scholarly website and pre-print servers. Voluntary deposit by author of author's post-print allowed on institution's open scholarly website, including institutional repository. Deposit due to funding body, institutional and governmental mandate only allowed where separate agreement between repository and publisher exists. Set statement to accompany deposit. Published source must be acknowledged. Must link to journal home page or article's DOI. Publisher's version/PDF cannot be used. Articles in some journals can be made open access on payment of additional charge. NIH authors' articles will be submitted to PubMed Central after 12 months. Authors who are required to deposit in subject-based repositories may also use Sponsorship Option.",
year = "2012",
month = "6",
doi = "10.1016/j.knosys.2012.01.006",
language = "English",
volume = "30",
pages = "129--135",
journal = "Knowledge-Based Systems",
issn = "0950-7051",
publisher = "Elsevier",

}

TY - JOUR

T1 - A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data

AU - Ji, Jinchao

AU - Pang, Wei

AU - Han, Xiao

AU - Zhou, Chunguang

AU - Wang, Zhe

PY - 2012/6

Y1 - 2012/6

AB - In many applications, data objects are described by both numeric and categorical features. The k-prototype algorithm is one of the most important algorithms for clustering this type of data. However, this method performs hard partition, which may lead to misclassification for the data objects in the boundaries of regions, and the dissimilarity measure only uses the user-given parameter for adjusting the significance of attribute. In this paper, first, we combine mean and fuzzy centroid to represent the prototype of a cluster, and employ a new measure based on co-occurrence of values to evaluate the dissimilarity between data objects and prototypes of clusters. This measure also takes into account the significance of different attributes towards the clustering process. Then we present our algorithm for clustering mixed data. Finally, the performance of the proposed method is demonstrated by a series of experiments on four real world datasets in comparison with that of traditional clustering algorithms.

KW - fuzzy clustering

KW - data mining

KW - mixed data

KW - dissimilarity measure

KW - attribute significance

UR - https://doi.org/10.1016/j.knosys.2012.01.006

U2 - 10.1016/j.knosys.2012.01.006

DO - 10.1016/j.knosys.2012.01.006

M3 - Article

VL - 30

SP - 129

EP - 135

JO - Knowledge-Based Systems

JF - Knowledge-Based Systems

SN - 0950-7051

ER -