Comparison of Dimension Reduction Techniques on High Dimensional Datasets
Kazim Yildiz¹, Yilmaz Camurcu², and Buket Dogan¹
¹Department of Computer Engineering, Marmara University, Turkey
²Department of Computer Engineering, Fatih Sultan Mehmet Waqf University, Turkey
Abstract: High dimensional data has become very common with the rapid growth of data
stored in databases and other information repositories. Clustering such data has
therefore become an urgent problem. Well-known clustering algorithms are not
adequate for high dimensional spaces because of the problem known as the curse of
dimensionality, so dimensionality reduction techniques have been used to obtain
accurate clustering results and to reduce clustering time in high dimensional
space. In this work, different dimensionality reduction techniques were combined
with the Fuzzy C-Means clustering algorithm, with the aim of reducing the
complexity of high dimensional datasets and generating more accurate clustering
results. The results were compared in terms of cluster purity, cluster entropy
and mutual information. The dimension reduction techniques were also compared in
terms of current Central Processing Unit (CPU) usage, current memory usage and
elapsed CPU time. The experiments showed that the proposed work produces
promising results in high dimensional space.
Keywords: High dimensional data, clustering, dimensionality reduction, data mining.
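The abstract does not name the specific reduction techniques or give exact metric formulas, so the following is only an illustrative sketch of the kind of pipeline it describes, not the authors' implementation. It assumes principal component analysis (PCA) as one example reduction step, a standard Fuzzy C-Means (FCM) loop with fuzzifier m = 2, and commonly used definitions of cluster purity, cluster entropy and mutual information computed with base-2 logarithms on hardened (argmax) memberships. All function names (`pca_reduce`, `fuzzy_c_means`, `purity`, `cluster_entropy`, `mutual_information`) are hypothetical helpers introduced for illustration.

```python
import numpy as np


def pca_reduce(X, k):
    """Hypothetical helper: project X (n_samples x n_features) onto its top-k principal components.
    PCA is used here only as an example reduction technique (assumption)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes of the centred data, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard FCM sketch; returns (centers, U) where U[i, j] is the membership of sample i in cluster j."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each sample sum to 1
    for _ in range(max_iter):
        Um = U ** m
        # Centers are membership-weighted means: v_j = sum_i u_ij^m x_i / sum_i u_ij^m.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against a sample sitting exactly on a center
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), written in normalised form.
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def contingency(labels_true, labels_pred):
    """Counts n[k, c]: rows are predicted clusters, columns are true classes."""
    labels_true, labels_pred = np.asarray(labels_true), np.asarray(labels_pred)
    classes, clusters = np.unique(labels_true), np.unique(labels_pred)
    return np.array([[np.sum((labels_pred == k) & (labels_true == c)) for c in classes]
                     for k in clusters])


# The metric definitions below are common textbook forms (assumption); the paper's may differ.
def purity(labels_true, labels_pred):
    n = contingency(labels_true, labels_pred)
    return float(n.max(axis=1).sum() / n.sum())


def cluster_entropy(labels_true, labels_pred):
    """Size-weighted average of each cluster's class entropy (bits); lower is better."""
    n = contingency(labels_true, labels_pred).astype(float)
    sizes = n.sum(axis=1)
    p = n / sizes[:, None]
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -np.where(p > 0, p * np.log2(p), 0.0).sum(axis=1)
    return float((sizes / n.sum() * h).sum())


def mutual_information(labels_true, labels_pred):
    """Mutual information (bits) between cluster assignments and true classes."""
    n = contingency(labels_true, labels_pred).astype(float)
    pkc = n / n.sum()
    pk = pkc.sum(axis=1, keepdims=True)
    pc = pkc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(pkc > 0, pkc * np.log2(pkc / (pk * pc)), 0.0)
    return float(terms.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two synthetic Gaussian blobs in 10 dimensions (illustrative data only).
    X = np.vstack([rng.normal(0.0, 1.0, (50, 10)), rng.normal(10.0, 1.0, (50, 10))])
    y = np.repeat([0, 1], 50)
    Z = pca_reduce(X, 2)
    _, U = fuzzy_c_means(Z, c=2)
    pred = U.argmax(axis=1)
    print(purity(y, pred), cluster_entropy(y, pred), mutual_information(y, pred))
```

Hardening the fuzzy memberships with `argmax` before scoring is a simplifying choice made for this sketch; evaluation schemes that work on the membership matrix directly are also possible. The synthetic blobs are only a smoke test and do not reproduce any result reported in this work.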