A Study on Two-Stage Mixed Attribute Data Clustering Based on Density Peaks
Shihua Liu, Hao Zhang, and Xianghua Liu
Department of Information Technology, Wenzhou Polytechnic, China
Abstract: A two-stage clustering framework and a clustering algorithm for mixed attribute data based on density peaks and Goodall distance are proposed. First, the subset of numerical attributes of the dataset is clustered, and the result is mapped into a one-dimensional categorical attribute and added to the subset of categorical attributes. Finally, the new dataset is clustered by the density peaks clustering algorithm to obtain the final result. Experiments on three commonly used UCI datasets show that this algorithm can effectively realize mixed attribute clustering and produces better clustering results than the traditional K-prototypes algorithm does. The clustering accuracy on the Acute, Heart, and Credit datasets is 17%, 24%, and 21% higher on average, respectively, than that of K-prototypes.
Keywords: Mixed data clustering, density peaks, k-prototypes algorithm, validity index.
Received July 4, 2019; accepted September 27, 2020
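
The two-stage procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: stage one is assumed to use density peaks clustering with Euclidean distance, the categorical distance is a simplified Goodall-style measure in which a match on a rare value counts more than a match on a common one, and the cluster counts and cutoff distances are supplied by the caller. The function names `density_peaks`, `goodall_distance_matrix`, and `two_stage_cluster` are hypothetical.

```python
import math
from collections import Counter


def density_peaks(dist, k, dc):
    """Minimal density peaks clustering on a precomputed distance matrix.

    dist: n x n list of lists, k: number of clusters, dc: cutoff distance.
    """
    n = len(dist)
    # Local density: number of other points closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]
    # Densest first; ties are broken by index because sorted() is stable.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    parent = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            # The densest point gets the largest distance by convention.
            delta[i] = max(dist[i])
            continue
        # Nearest point among those ranked as denser.
        j = min(order[:pos], key=lambda q: dist[i][q])
        delta[i], parent[i] = dist[i][j], j
    # Cluster centers: the k points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:k]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Remaining points inherit the label of their nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]] if parent[i] >= 0 else 0
    return labels


def goodall_distance_matrix(rows):
    """Simplified Goodall-style distance for categorical rows (assumption).

    A match on value x of an attribute scores 1 - f(f-1)/(n(n-1)), where f
    is the frequency of x; a mismatch scores 0. Distance = 1 - mean score.
    """
    n, m = len(rows), len(rows[0])
    freq = [Counter(r[a] for r in rows) for a in range(m)]

    def sim(x, y):
        s = 0.0
        for a in range(m):
            if x[a] == y[a]:
                f = freq[a][x[a]]
                s += 1.0 - f * (f - 1) / (n * (n - 1))
        return s / m

    return [[0.0 if i == j else 1.0 - sim(rows[i], rows[j])
             for j in range(n)] for i in range(n)]


def two_stage_cluster(numeric, categorical, k_num, k_final, dc_num, dc_cat):
    """Stage 1: cluster numeric part. Stage 2: cluster augmented categorical part."""
    euclid = [[math.dist(p, q) for q in numeric] for p in numeric]
    stage1 = density_peaks(euclid, k_num, dc_num)
    # Map each stage-1 label to one extra categorical attribute.
    augmented = [tuple(c) + (f"n{lab}",)
                 for c, lab in zip(categorical, stage1)]
    return density_peaks(goodall_distance_matrix(augmented), k_final, dc_cat)
```

In this sketch the numerical attributes influence the final result only through the single categorical attribute derived from stage one, which is the mapping step the abstract describes. The cutoff distances `dc_num` and `dc_cat` strongly affect the density estimates and would need tuning for any real dataset.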