Direct Text Classifier for Thematic Arabic Discourse Documents

Khalid Nahar1, Ra’ed Al-Khatib1, Moy'awiah Al-Shannaq1, Mohammad Daradkeh2, and Rami Malkawi3

1Department of Computer Sciences, Yarmouk University, Jordan

2Department of Management Information System, Yarmouk University, Jordan

3Department of Computer Information System, Yarmouk University, Jordan

Abstract: Maintaining topical coherence while writing a discourse is a major challenge for novice and experienced writers alike. The challenge is greater for Arabic discourse because of the complex morphology of Arabic and its abundance of synonyms. In this research, we present a method for directly classifying an Arabic discourse document while it is being written. The proposed prescriptive framework consists of the following stages: data collection, pre-processing, construction of a Language Model (LM), topic identification, topic classification, and topic notification. To demonstrate the framework, we designed a system and applied it to a corpus of 2800 Arabic discourse documents synthesized into four predefined topics: Culture, Economy, Sport, and Religion. System performance was analysed in terms of accuracy, recall, precision, and F-measure. The results show that the proposed topic modeling-based decision framework can classify topics while a discourse is being written, with an accuracy of 91.0%.
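
The classification and notification stages named in the abstract can be illustrated with a minimal sketch. The abstract does not state the n-gram order, the smoothing method, or the decision rule, so the bigram order, add-one smoothing, maximum log-likelihood rule, the English placeholder tokens, and all names below (`TopicBigramModel`, `classify`, `off_topic`) are assumptions made for illustration only, not the authors' implementation.

```python
import math
from collections import Counter

START = "<s>"  # assumed sentence-start marker


class TopicBigramModel:
    """Bigram counts for one topic (hypothetical helper, not from the paper)."""

    def __init__(self, documents):
        self.bigram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = {START}
        for tokens in documents:
            padded = [START] + tokens
            self.vocab.update(tokens)
            for prev, word in zip(padded, padded[1:]):
                self.bigram_counts[(prev, word)] += 1
                self.context_counts[prev] += 1

    def log_prob(self, tokens, vocab_size):
        """Add-one smoothed log-likelihood of a token sequence."""
        padded = [START] + tokens
        total = 0.0
        for prev, word in zip(padded, padded[1:]):
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.context_counts[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total


def classify(tokens, models):
    """Return the topic whose model gives the text the highest likelihood."""
    vocab_size = len(set().union(*(m.vocab for m in models.values())))
    return max(models, key=lambda topic: models[topic].log_prob(tokens, vocab_size))


def off_topic(tokens, intended_topic, models):
    """Notification stage: True when the text drifts from the intended topic."""
    return classify(tokens, models) != intended_topic


# Toy training data; English placeholders stand in for pre-processed Arabic tokens.
training = {
    "Culture": [["museum", "art", "festival"], ["art", "heritage", "festival"]],
    "Economy": [["market", "bank", "trade"], ["bank", "inflation", "market"]],
    "Sport": [["match", "team", "goal"], ["team", "goal", "league"]],
    "Religion": [["prayer", "mosque", "fasting"], ["mosque", "prayer", "pilgrimage"]],
}
models = {topic: TopicBigramModel(docs) for topic, docs in training.items()}

text_so_far = ["team", "goal"]
predicted = classify(text_so_far, models)
drifted = off_topic(text_so_far, "Economy", models)
```

In this sketch, `classify` can be called again each time the writer adds text, and `off_topic` flags the point at which the most likely topic no longer matches the one the writer intended.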

Keywords: Text mining, Arabic discourse, text classification, topic modeling, n-gram language model, topical coherence.

Received February 24, 2018; accepted August 13, 2018
https://doi.org/10.34028/iajit/17/3/13