A Markovian Approach for Arabic Root Extraction

Abderrahim Boudlal1, Rachid Belahbib2, Abdelhak Lakhouaja3, Azzeddine Mazroui3,
Abdelouafi Meziane3, and Mohamed Bebah3
1Faculty of Letters and Human Sciences, University Mohamed I, Morocco
2College of Arts and Sciences, Qatar University, Qatar
3Department of Mathematics and Computer Sciences, University Mohamed I, Morocco

Abstract: In this paper, we present an Arabic morphological analysis system that assigns to each word of an unvoweled Arabic sentence a unique root depending on the context. The proposed system is composed of two modules. The first performs an out-of-context analysis: we segment each word of the sentence into its elementary morphological units, adopting a three-part segmentation (prefix, stem, suffix), in order to identify its possible roots. The second module uses the context to identify the correct root among all the possible roots of the word. For this purpose, we use a hidden Markov model approach in which the observations are the words and the possible roots are the hidden states. We validate the approach using the NEMLAR Arabic written corpus, which consists of 500,000 words. The system gives the correct root for more than 98% of the words in the training set and for almost 94% of the words in the testing set.
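
To make the second module concrete, the following is a minimal Python sketch of Viterbi decoding over candidate roots, with words as observations and roots as hidden states. It is an illustration under stated assumptions, not the authors' implementation: the function name viterbi_roots, the dictionary-based probability tables, the floor value used for unseen events, and the placeholder tokens (w1, w2, r_a, r_b, r_c) are all hypothetical. In particular, the abstract does not say how the probabilities are estimated or smoothed.

```python
import math


def viterbi_roots(words, candidates, start_p, trans_p, emit_p, floor=1e-6):
    """Choose one root per word with Viterbi decoding (illustrative sketch).

    words      -- list of observed words (the sentence)
    candidates -- dict: word -> list of possible roots (output of module 1)
    start_p    -- dict: root -> probability of starting a sentence
    trans_p    -- dict: (previous_root, root) -> transition probability
    emit_p     -- dict: (root, word) -> emission probability
    floor      -- probability given to unseen events (an assumption of
                  this sketch; a real system would use proper smoothing)
    """
    if not words:
        return []

    def log_p(table, key):
        return math.log(table.get(key, floor))

    # best[i][r] = (log-probability of the best path ending in root r
    #               at position i, root chosen at position i - 1)
    first = words[0]
    best = [{r: (log_p(start_p, r) + log_p(emit_p, (r, first)), None)
             for r in candidates[first]}]

    for i in range(1, len(words)):
        word = words[i]
        column = {}
        for r in candidates[word]:
            # Best predecessor among the candidate roots of the previous word.
            score, prev = max(
                (best[i - 1][p][0] + log_p(trans_p, (p, r)), p)
                for p in best[i - 1]
            )
            column[r] = (score + log_p(emit_p, (r, word)), prev)
        best.append(column)

    # Backtrack from the highest-scoring final root.
    root = max(best[-1], key=lambda r: best[-1][r][0])
    path = [root]
    for i in range(len(words) - 1, 0, -1):
        root = best[i][root][1]
        path.append(root)
    path.reverse()
    return path


# Toy data with placeholder tokens (not real Arabic words or roots).
toy_words = ["w1", "w2"]
toy_candidates = {"w1": ["r_a", "r_b"], "w2": ["r_c"]}
toy_start = {"r_a": 0.5, "r_b": 0.5}
toy_trans = {("r_a", "r_c"): 0.1, ("r_b", "r_c"): 0.9}
toy_emit = {("r_a", "w1"): 0.5, ("r_b", "w1"): 0.5, ("r_c", "w2"): 1.0}
toy_path = viterbi_roots(toy_words, toy_candidates,
                         toy_start, toy_trans, toy_emit)
```

In the toy data, the first word is ambiguous between two roots that are equally likely out of context. The transition into the only candidate root of the second word favors r_b, so the decoder returns ["r_b", "r_c"]. Restricting each position to the candidate roots produced by the first module keeps the search small, since only roots compatible with a valid segmentation are ever scored.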

Keywords: Arabic NLP, morphological analysis, root extraction, hidden Markov models, Viterbi algorithm.

Received February 21, 2009; accepted August 3, 2009

 
