A Novel Approach of Clustering Documents: Minimizing Computational Complexities in Accessing Database Systems

  • Written by Ghadeer
  • Update: 30/06/2022

Mohammed Alghobiri

Department of Management Information Systems

King Khalid University, Saudi Arabia

Khalid Mohiuddin

Department of Management Information Systems

King Khalid University, Saudi Arabia

Mohammed Abdul Khaleel

Department of Computer Science

King Khalid University, Saudi Arabia

Mohammad Islam

Department of Management Information Systems

King Khalid University, Saudi Arabia

Samreen Shahwar

Department of Information Systems

King Khalid University, Saudi Arabia

Osman Nasr

Department of Management Information Systems

King Khalid University, Saudi Arabia

Abstract: This study addresses the real-time issue of managing an academic program's documents in a university environment. In practice, classifying documents from a corpus is challenging when the dataset is large, and the complexity increases when specific document management requirements must be met. This study presents a practical approach to grouping documents based on a content similarity measure. The approach analyzes the performance of state-of-the-art clustering algorithms and considers Hamiltonian graph properties and a distance function. The distance function measures (1) the content similarity between documents and (2) the distances between the produced clusters. The proposed algorithm improves cluster quality by applying Hamiltonian graph properties. One significant characteristic of the proposed function is that it determines the document types from the corpus itself. Hence, the number of clusters does not have to be assumed before the algorithm is executed. This approach avoids the arbitrary initial choice of k centroids required by the k-means algorithm, reduces computational complexity, and overcomes some limitations of commonly used clustering algorithms. The proposed approach offers information systems developers an effective way to organize documents when designing document management systems.
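To make the idea of clustering without a preset cluster count concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a bag-of-words term-frequency representation, cosine distance as the content similarity measure, and a hypothetical `threshold` parameter that decides when a document opens a new cluster. The function names are illustrative, and the Hamiltonian graph refinement described in the abstract is not modeled here.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector for one document (assumed representation)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def cluster_by_threshold(docs, threshold=0.5):
    """Group documents without fixing the number of clusters in advance.

    Each document joins the first cluster whose representative (its first
    member) lies within `threshold`; otherwise it starts a new cluster.
    `threshold` is a hypothetical tuning parameter, not taken from the paper.
    """
    clusters = []  # list of (representative_vector, member_indices)
    for i, text in enumerate(docs):
        vec = term_vector(text)
        for rep, members in clusters:
            if cosine_distance(vec, rep) <= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    sample = [
        "course syllabus for database systems",
        "database systems course syllabus draft",
        "exam schedule spring semester",
        "spring semester exam schedule final",
    ]
    print(cluster_by_threshold(sample))
```

In this sketch the number of groups emerges from the data and the distance threshold rather than from an initial k, which is the property the abstract emphasizes. The result depends on document order and on the chosen threshold, so it should be read as an illustration only.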

Keywords: Clustering algorithms, document categorization, document clustering, Hamiltonian graph, similarity measure.

Received July 11, 2020; accepted February 21, 2021
https://doi.org/10.34028/iajit/19/4/6
