Parallel HMM-Based Approach for Arabic Part of
Speech Tagging
Ayoub Kadim and Azzeddine Lazrek
Department of Computer Science, Faculty of
Science, Cadi Ayyad University, Morocco
Abstract: In this paper, we try to go beyond the classical
use of the Hidden Markov Model for part-of-speech tagging, particularly for the
Arabic language. In fact, most available Arabic tagging systems and tagsets are
derived from English and do not make use of the linguistic richness of Arabic.
Our proposed tagging system consists of two Hidden Markov Models
working in parallel: in addition to the main model, a second model is added to
serve as a reference for low-probability tags. A dual corpus is
required to train both models. To this end, we restructure the Nemlar Arabic
corpus and extract a new tagset from diacritics and grammatical rules. The
approach is implemented in Java, and several
experiments are conducted to evaluate it. The results of this approach,
which are promising, as well as its limitations, are discussed in depth, and
possible future enhancements are also highlighted. This work opens the door
to new research perspectives, particularly for Arabic language
processing, and more generally for applications of Hidden Markov Models.
Keywords: Part of speech tagging, hidden Markov model, Viterbi
algorithm, natural language processing, corpus, Arabic language.