Preceding Document Clustering by Graph Mining Based Maximal Frequent Termsets Preservation

Syed Shah and Mohammad Amjad

Department of Computer Engineering, Jamia Millia Islamia, India

Abstract: This paper presents an approach to document clustering. It introduces a novel graph-mining-based algorithm for finding the frequent termsets present in a document set. The document set is first mapped onto a bipartite graph. Based on the output of the algorithm, the document set is then modified to reduce its dimensionality, and the bisecting K-means algorithm is run on the modified document set to obtain a set of meaningful clusters. The proposed approach, Clustering preceded by Graph Mining based Maximal Frequent Termsets Preservation (CGFTP), is shown to produce better-quality clusters than some classical document clustering algorithms, and the resulting clusters are shown to be easily interpretable. Cluster quality is measured in terms of F-measure.
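
The preprocessing idea in the abstract can be illustrated with a small sketch. This is not the authors' CGFTP algorithm, whose details are not given here: it assumes whitespace tokenization, stores the bipartite graph as a term-to-documents mapping, and swaps in a plain level-wise (Apriori-style) search for the paper's graph mining step. All function names and the min_support parameter are hypothetical.

```python
from itertools import combinations


def build_bipartite(docs):
    """Bipartite graph as a mapping: term node -> set of document nodes."""
    term_to_docs = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):  # assumption: whitespace tokens
            term_to_docs.setdefault(term, set()).add(doc_id)
    return term_to_docs


def frequent_termsets(term_to_docs, min_support):
    """Level-wise search (stand-in for the paper's graph mining step).

    Returns {frozenset of terms: ids of documents containing all of them}.
    """
    level = {frozenset([t]): d for t, d in term_to_docs.items()
             if len(d) >= min_support}
    frequent = {}
    while level:
        frequent.update(level)
        next_level = {}
        for (a, docs_a), (b, docs_b) in combinations(level.items(), 2):
            union = a | b
            if len(union) == len(a) + 1:  # join sets differing in one term
                shared = docs_a & docs_b
                if len(shared) >= min_support:
                    next_level[union] = shared
        level = next_level
    return frequent


def maximal_termsets(frequent):
    """Keep only termsets with no frequent proper superset."""
    return {s: d for s, d in frequent.items()
            if not any(s < other for other in frequent)}


def reduce_documents(docs, maximal_sets):
    """Drop every term that does not occur in some maximal frequent termset."""
    keep = set().union(*maximal_sets)
    return [[t for t in text.lower().split() if t in keep] for text in docs]


docs = [
    "graph mining finds frequent termsets",
    "graph mining clusters documents",
    "bisecting kmeans clusters documents",
    "graph mining kmeans",
]
graph = build_bipartite(docs)
maximal = maximal_termsets(frequent_termsets(graph, min_support=2))
reduced = reduce_documents(docs, maximal)
```

The reduced documents could then be vectorized and passed to any bisecting K-means implementation, for example scikit-learn's BisectingKMeans; that step is left out to keep the sketch dependency-free.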

Keywords: Bipartite graph, graph mining, frequent termsets mining, bisecting K-means.

Received June 18, 2016; accepted June 29, 2017
 