Direct Text Classifier for Thematic Arabic Discourse Documents

Khalid Nahar1, Ra’ed Al-Khatib1, Moy'awiah Al-Shannaq1, Mohammad Daradkeh2, and Rami Malkawi3

1Department of Computer Sciences, Yarmouk University, Jordan

2Department of Management Information System, Yarmouk University, Jordan

3Department of Computer Information System, Yarmouk University, Jordan

Abstract: Maintaining topical coherence while writing a discourse is a major challenge for novice and experienced writers alike. The challenge is even greater in Arabic discourse because of the language's complex morphology and its widespread use of synonyms. In this research, we present a framework for directly classifying an Arabic discourse document while it is being written. The proposed prescriptive framework consists of the following stages: data collection, pre-processing, construction of a Language Model (LM), topic identification, topic classification, and topic notification. To demonstrate the proposed framework, we designed a system and applied it to a corpus of 2800 Arabic discourse documents synthesized into four predefined topics: Culture, Economy, Sport, and Religion. System performance was analysed in terms of accuracy, recall, precision, and F-measure. The results demonstrate that the proposed topic-modeling-based decision framework can classify topics while a discourse is being written, with an accuracy of 91.0%.
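The abstract names the framework's stages but not their internals. The following is a minimal sketch of how an n-gram language model can drive topic classification and notification while writing, under several assumptions that do not come from the paper: a word-bigram model per topic with add-one (Laplace) smoothing, a light hypothetical Arabic normalization (removing diacritics and tatweel, unifying alef forms), and macro-averaged precision, recall, and F-measure. All names (preprocess, BigramLM, TopicClassifier, notify_if_off_topic, evaluate) are hypothetical illustrations, not the authors' system.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical light normalization (an assumption, not the paper's pipeline):
# drop harakat (U+064B..U+0652) and tatweel (U+0640), and map the
# madda/hamza alef forms (U+0622, U+0623, U+0625) to bare alef (U+0627).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def preprocess(text):
    """Normalize Arabic text and split it into word tokens."""
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)
    return re.findall(r"\w+", text)


class BigramLM:
    """Word-bigram language model with add-one smoothing for a single topic."""

    def __init__(self, token_lists):
        self.history = Counter()  # times each token occurs as the first word of a bigram
        self.bigrams = Counter()
        self.vocab = set()
        for tokens in token_lists:
            padded = ["<s>"] + tokens  # <s> marks the start of a document
            self.vocab.update(tokens)
            self.history.update(padded[:-1])
            self.bigrams.update(zip(padded, padded[1:]))

    def log_prob(self, tokens, vocab_size):
        """Smoothed log-likelihood of a token sequence under this topic's model."""
        padded = ["<s>"] + tokens
        return sum(
            math.log((self.bigrams[(prev, word)] + 1) / (self.history[prev] + vocab_size))
            for prev, word in zip(padded, padded[1:])
        )


class TopicClassifier:
    """Assigns a text to the topic whose language model scores it highest."""

    def __init__(self, labeled_texts):
        grouped = defaultdict(list)
        for text, topic in labeled_texts:
            grouped[topic].append(preprocess(text))
        self.models = {topic: BigramLM(docs) for topic, docs in grouped.items()}
        shared_vocab = set().union(*(m.vocab for m in self.models.values()))
        self.vocab_size = len(shared_vocab) + 1  # +1 reserves probability mass for unseen words

    def classify(self, text):
        tokens = preprocess(text)
        return max(self.models, key=lambda t: self.models[t].log_prob(tokens, self.vocab_size))

    def notify_if_off_topic(self, draft, intended_topic):
        """Return a warning for a partial draft, or None if it stays on topic."""
        predicted = self.classify(draft)
        if predicted != intended_topic:
            return f"Draft reads as '{predicted}', not '{intended_topic}'"
        return None


def evaluate(gold, predicted):
    """Accuracy plus macro-averaged precision, recall, and F-measure."""
    labels = sorted(set(gold) | set(predicted))
    pairs = list(zip(gold, predicted))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precisions, recalls, f_scores = [], [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f_scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(labels)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f_measure": sum(f_scores) / n}
```

In use, a writing tool would build a TopicClassifier from labelled documents (for example, pairs such as ("اللاعب سجل هدفا في المباراة", "Sport")) and call notify_if_off_topic on the growing draft after each sentence, so the writer is alerted as soon as the text drifts toward another topic. Scoring with summed log-probabilities avoids numerical underflow on long drafts; a higher-order n-gram or a different smoothing scheme would slot into BigramLM without changing the classifier.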

Keywords: Text mining, Arabic discourse, text classification, topic modeling, n-gram language model, topical coherence.

Received February 24, 2018; accepted August 13, 2018
https://doi.org/10.34028/iajit/17/3/13