Arabic Text Categorization: A Comparative Study of Different Representation Modes

Zakaria Elberrichi and Karima Abidi 
EEDIS Laboratory, Department of Computer Science, Algeria
 
 
Abstract: The quantity of information accessible on the Internet is phenomenal, and its categorization remains one of the most important problems. Much current work focuses on English, rightly so, since it is the dominant language of the Web. However, other languages also need attention, because the Web becomes more multilingual every day, and the need is even more pressing for Arabic. Our research addresses the categorization of Arabic texts; its originality lies in the use of a conceptual representation of the text. For this purpose, we use Arabic WordNet (AWN) as a lexical and semantic resource. To assess its effect, we include it in a comparative study with the other usual representation modes (bag of words and N-grams), and we use the k-NN learning scheme with different similarity measures. The results show the benefits and advantages of this representation compared to the more conventional methods, and demonstrate that adding the semantic dimension is one of the most promising approaches to the automatic categorization of Arabic texts.
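To make the three representation modes and the k-NN scheme more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. Tokens are simply whitespace-separated, with no Arabic stemming, normalization, or stop-word removal. `concept_map` is a hypothetical dictionary standing in for an Arabic WordNet lookup (word to synset ID). Cosine and Jaccard are shown only as two example similarity measures, because the abstract does not say which measures or which value of k the study used.

```python
from collections import Counter
import math

# --- Representation modes (simplified: no Arabic stemming or normalization) ---

def bag_of_words(text):
    """Bag of words: term frequencies of whitespace-separated tokens."""
    return Counter(text.split())

def char_ngrams(text, n=3):
    """Character N-grams: frequencies of overlapping n-character substrings."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def to_concepts(text, concept_map):
    """Conceptual representation: replace each token by a concept ID.

    concept_map is a HYPOTHETICAL stand-in for an Arabic WordNet lookup
    (word -> synset ID); tokens without an entry are kept unchanged.
    """
    return Counter(concept_map.get(tok, tok) for tok in text.split())

# --- Example similarity measures (assumed, not taken from the study) ---

def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(a, b):
    """Jaccard similarity between the sets of features present in a and b."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

# --- k-NN classifier ---

def knn_classify(query, training, k=3, sim=cosine):
    """Label query by majority vote among its k most similar training vectors.

    training is a list of (vector, label) pairs. On a tied vote, the label
    that appears first among the neighbours (the most similar) wins.
    """
    neighbours = sorted(training, key=lambda pair: sim(query, pair[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy usage with made-up Arabic snippets (sport vs. economy).
train_texts = [
    ("مباراة كرة القدم فريق", "sport"),
    ("فريق لاعب هدف مباراة", "sport"),
    ("سوق الأسهم اقتصاد بنك", "economy"),
    ("بنك اقتصاد تضخم سوق", "economy"),
]
training = [(bag_of_words(text), label) for text, label in train_texts]
query = bag_of_words("فريق مباراة هدف")
print(knn_classify(query, training, k=3))                # sport
print(knn_classify(query, training, k=3, sim=jaccard))   # sport
```

Swapping `bag_of_words` for `char_ngrams` or `to_concepts` when building both `training` and `query` changes only the feature space. The classifier and similarity functions stay the same, which mirrors the idea of comparing representation modes under a single k-NN scheme.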




Keywords:
Categorization, Arabic texts, Arabic WordNet, bag of words, N-grams, concepts.


Received May 27, 2010; accepted August 10, 2010
