Evaluation of Text Clustering Methods Using WordNet

Abdelmalek Amine (1,2), Zakaria Elberrichi (1), and Michel Simonet (3)
(1) EEDIS Laboratory, Department of Computer Science, UDL University, Algeria
(2) Department of Computer Science, UTMS University, Algeria
(3) IN3S, Joseph Fourier University, France



Abstract: The growing number of digitized texts now available, notably on the Web, has created an acute need for text mining techniques. Clustering systems are increasingly used in text mining, especially to analyze texts and extract the knowledge they contain. Given the large number of clustering algorithms and techniques available, it is difficult for users to choose the algorithm that best suits their target dataset. Indeed, it is very hard to determine which algorithms work best, since results depend considerably on the application and on the kind of data at hand. In this paper, we propose, study, and compare three text clustering methods: an ascending hierarchical clustering method, a SOM-based clustering method, and an ant-based clustering method, all of which use WordNet synsets as the terms for representing textual documents. The effects of these methods are examined in several experiments using three similarity measures: the cosine distance, the Euclidean distance, and the Manhattan distance. The Reuters-21578 corpus is used for evaluation, which is carried out using the F-measure. The results show that the SOM-based clustering method using the cosine distance provides the best results.

Keywords: Text clustering, similarity, WordNet, Reuters-21578, F-measure.

Received February 24, 2009; accepted August 3, 2009
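
The three measures named in the abstract, and an F-measure for scoring a clustering, can be illustrated with a minimal Python sketch. It assumes each document is already a fixed-length vector of synset frequencies; the function names are hypothetical and are not taken from the paper. The abstract does not say which F-measure variant the authors use, so the class-weighted best-match form below is an assumption, chosen because it is a common choice in text clustering evaluation.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted best-match F-measure (assumed variant, see lead-in).

    For each true class, take the best F1 over all clusters, then average
    these scores weighted by class size.
    """
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


# Hypothetical synset-frequency vectors for two documents.
doc_a = [2, 0, 1, 0]
doc_b = [1, 1, 0, 0]
print(cosine_distance(doc_a, doc_b))
print(euclidean_distance(doc_a, doc_b))
print(manhattan_distance(doc_a, doc_b))
```

The cosine distance ignores vector length, so it is less sensitive to document size than the Euclidean and Manhattan distances, which compare raw frequencies directly.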
