Arabic Text Categorization

Rehab Duwairi

Department of Computer Information Systems, Jordan University of Science and Technology, Jordan

Abstract: In this paper, we compare the performance of three classifiers for Arabic text categorization: the naïve Bayes, k-nearest-neighbors (knn), and distance-based classifiers. Unclassified documents were preprocessed by removing punctuation marks and stopwords. Each document was then represented as a vector of words (or of words and their frequencies, in the case of the naïve Bayes classifier). Stemming was used to reduce the dimensionality of the documents' feature vectors. The accuracy of the classifiers is compared using recall, precision, error rate, and fallout. The results of experiments carried out on an in-house collection of Arabic text show that the naïve Bayes classifier outperforms the other two.

Keywords: Text categorization, naïve Bayes, knn, distance-based classifier, Arabic language. 

Received October 10, 2005; accepted March 1, 2006
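
The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard per-category definitions based on true/false positive and negative counts; the paper's exact formulation may differ. The `Counts` class, the function names, and the example numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Per-category confusion counts (hypothetical container, not from the paper)."""
    tp: int  # documents correctly assigned to the category
    fp: int  # documents wrongly assigned to the category
    fn: int  # documents of the category that were missed
    tn: int  # documents correctly left out of the category


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return numerator / denominator if denominator else 0.0


def precision(c: Counts) -> float:
    # Share of documents assigned to the category that truly belong to it.
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: Counts) -> float:
    # Share of the category's documents that the classifier found.
    return _ratio(c.tp, c.tp + c.fn)


def fallout(c: Counts) -> float:
    # Share of documents outside the category that were wrongly assigned to it.
    return _ratio(c.fp, c.fp + c.tn)


def error_rate(c: Counts) -> float:
    # Share of all decisions that were wrong.
    return _ratio(c.fp + c.fn, c.tp + c.fp + c.fn + c.tn)


# Made-up counts for a single category, for illustration only.
example = Counts(tp=40, fp=10, fn=20, tn=130)
print(precision(example), recall(example), fallout(example), error_rate(example))
```

Higher precision and recall are better, while lower fallout and error rate are better, so a comparison of classifiers usually reports all four together.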