An Improved Clustering Algorithm for Text Mining: Multi-cluster Spherical K-means

Volkan Tunali¹, Turgay Bilgin¹ and Ali Camurcu²
¹ Department of Software Engineering, Maltepe University, Turkey
² Department of Computer Engineering, Fatih Sultan Mehmet Waqf University, Turkey

  Abstract: Advances in information and communication technologies have brought a marked increase in the amount of information produced, particularly in the form of text documents. To deal effectively with this “information explosion” problem and to make use of the huge volume of text databases, efficient and scalable tools and techniques are indispensable. This study addresses text clustering, one of the most important text mining techniques, which aims to extract useful information by processing data in textual form. An improved variant of the Spherical K-means algorithm, named multi-cluster spherical K-means, is developed for clustering high-dimensional document collections with high performance and efficiency. Experiments on several document data sets show that the new algorithm provides a significant increase in clustering quality without a considerable difference in CPU time usage compared to the Spherical K-means algorithm.

Keywords: Data mining, text mining, document clustering, spherical k-means.

Received February 10, 2013; accepted March 17, 2014
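
For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal sketch of standard Spherical K-means only, not the multi-cluster variant, whose details the abstract does not give. It assumes dense term-weight vectors and random initialization from the documents; the function names (`normalize`, `dot`, `spherical_kmeans`), the parameters, and the toy data are all illustrative assumptions.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product; for unit vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(docs, k, max_iter=100, seed=0):
    """Baseline spherical k-means (hypothetical helper, not the paper's variant).

    Returns (labels, centroids), where every centroid has unit length.
    """
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    # Assumption: initial centroids are k documents picked at random.
    centroids = [list(d) for d in rng.sample(docs, k)]
    labels = [-1] * len(docs)
    for _ in range(max_iter):
        # Assignment step: each document joins the most similar centroid.
        new_labels = [
            max(range(k), key=lambda j: dot(d, centroids[j])) for d in docs
        ]
        if new_labels == labels:
            break  # no document changed cluster, so the partition is stable
        labels = new_labels
        # Update step: sum the members, then project back onto the unit sphere.
        for j in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


# Toy term-frequency vectors: two documents per topic, disjoint vocabularies.
docs = [
    [3, 1, 0, 0],
    [6, 2, 0, 0],
    [0, 0, 2, 5],
    [0, 0, 1, 3],
]
labels, centroids = spherical_kmeans(docs, k=2)
print(labels)
```

Because documents and centroids are kept at unit length, the assignment step needs only dot products, which is what makes the method attractive for high-dimensional, sparse document vectors. A practical implementation would likely use sparse matrices rather than Python lists.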
