Evaluation of Text Clustering Methods Using WordNet


Abdelmalek Amine (1,2), Zakaria Elberrichi (1), and Michel Simonet (3)
(1) EEDIS Laboratory, Department of Computer Science, UDL University, Algeria
(2) Department of Computer Science, UTMS University, Algeria
(3) IN3S, Joseph Fourier University, France



Abstract: The growing number of digitized texts now available, notably on the Web, has created an acute need for text mining techniques. Clustering systems are increasingly used in text mining, especially to analyze texts and extract the knowledge they contain. Given the vast number of clustering algorithms and techniques available, it is difficult for users to choose the algorithm that best suits their target dataset. Indeed, it is very hard to determine which algorithms work best, since results depend considerably on the application and on the kind of data at hand. In this paper, we propose, study, and compare three text clustering methods: an ascending hierarchical clustering method, a SOM-based clustering method, and an ant-based clustering method, all of which use WordNet synsets as the terms representing textual documents. The effects of these methods are examined in several experiments using three similarity measures: the cosine distance, the Euclidean distance, and the Manhattan distance. The Reuters-21578 corpus is used for evaluation, and clustering quality is assessed with the F-measure. The results show that the SOM-based clustering method using the cosine distance performs best.

Keywords: Text clustering, similarity, WordNet, Reuters-21578, F-measure.
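
As an illustration of the three measures named in the abstract, the following minimal Python sketch computes them on small term-frequency vectors. It is not taken from the paper: the function names and the toy vectors are hypothetical, and "cosine distance" is assumed here to mean one minus the cosine similarity, which the abstract does not specify.

```python
import math


def cosine_distance(u, v):
    """Assumed definition: 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: an empty document is maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """City-block (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Hypothetical documents: each position counts occurrences of one synset.
doc_a = [3, 0, 4]
doc_b = [0, 5, 0]

for name, measure in [
    ("cosine", cosine_distance),
    ("euclidean", euclidean_distance),
    ("manhattan", manhattan_distance),
]:
    print(name, measure(doc_a, doc_b))
```

The cosine distance depends only on the direction of the vectors, so it is insensitive to document length, whereas the Euclidean and Manhattan distances grow with the raw term counts.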

Received February 24, 2009; accepted August 3, 2009
