An Enhanced Corpus for Arabic Newspapers Comments
LIRE Laboratory, University of Constantine 2, Algeria
Abstract: In this paper, we propose an enhanced approach to creating a dedicated
corpus of comments from Algerian Arabic newspapers. The approach improves on
an existing one by enriching the available corpus and by adding an annotation
step that follows the Model, Annotate, Train, Test, Evaluate, Revise (MATTER)
methodology. The corpus is built by collecting comments from the websites of
three well-known Algerian newspapers. Three classifiers, support vector
machines, naïve Bayes, and k-nearest neighbors, were used to classify the
comments into positive and negative classes. To identify the influence of
stemming on the results, the classification was tested with and without
stemming. The results show that stemming does not considerably improve the
classification, owing to the nature of Algerian comments, which are closely
tied to the Algerian Arabic dialect. These promising results motivate us to
improve our approach, especially in dealing with non-Arabic sentences,
particularly dialectal and French ones.
Keywords: Opinion mining, sentiment analysis, K-Nearest Neighbours,
Naïve Bayes, Support Vector Machines, Arabic, comment.
Received December 22, 2017; accepted June 18, 2019
https://doi.org/10.34028/iajit/17/5/12