Exploring the Potential of Schemes in Building NLP Tools for Arabic Language


Mohamed Ben Mohamed, Souheyl Mallat, Mohamed Nahdi and Mounir Zrigui

LaTICE Laboratory, Faculty of Sciences of Monastir, Tunisia

Abstract: Arabic is known for its data sparseness, which largely explains why it is difficult to process automatically. The Arabic language is based on schemes: lemmas are produced by derivation from roots and schemes. This characteristic offers two major advantages. First, this "hidden side" of the Arabic language, made up of schemes, suffers much less from sparseness because it forms a finite set. Second, schemes preserve a large number of the language's features within a much smaller vocabulary. Schemes therefore have great potential for building accurate natural language processing (NLP) tools for Arabic. In this work, we explore this potential by building NLP tools that rely entirely on schemes, for two tasks: text classification and Probabilistic Context-Free Grammar (PCFG) parsing.
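
The root-and-scheme derivation described above can be illustrated with a small toy sketch. It is not the authors' implementation: it assumes a simple Latin transliteration with one character per root consonant, a hypothetical three-scheme inventory (SCHEMES) in which the digits 1, 2 and 3 mark the root slots, and that each word's root is already known (a real system would need a morphological analyzer for that step). The helper names derive and scheme_of are illustrative only. The demo shows six distinct word types collapsing onto three scheme types, a miniature version of the vocabulary reduction the abstract refers to.

```python
from typing import Optional

# Hypothetical scheme inventory in a simple Latin transliteration.
# Digits 1, 2, 3 mark the slots filled by the three root consonants.
SCHEMES = {
    "1a2a3a": "perfect verb (he did X)",
    "1aa2i3": "active participle (doer of X)",
    "ma12uu3": "passive participle (X-ed)",
}


def derive(root: str, scheme: str) -> str:
    """Interdigitate a triliteral root (e.g. "ktb") into a scheme."""
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    out = []
    for ch in scheme:
        if ch in "123":
            # Slot digit: insert the corresponding root consonant.
            out.append(root[int(ch) - 1])
        else:
            # Fixed vowel or affix letter belonging to the scheme itself.
            out.append(ch)
    return "".join(out)


def scheme_of(word: str, root: str) -> Optional[str]:
    """Return the scheme in SCHEMES that derives `word` from `root`, or None.

    Checking each candidate by re-deriving it avoids mistaking a scheme
    letter for a root letter (e.g. the prefix "ma" of "mamluuk", root mlk).
    """
    for scheme in SCHEMES:
        if derive(root, scheme) == word:
            return scheme
    return None


if __name__ == "__main__":
    # Toy corpus of (word, known root) pairs; roots are assumed given.
    corpus = [
        ("kataba", "ktb"), ("kaatib", "ktb"), ("maktuub", "ktb"),
        ("darasa", "drs"), ("daaris", "drs"), ("madruus", "drs"),
    ]
    words = {w for w, _ in corpus}
    schemes = {scheme_of(w, r) for w, r in corpus}
    print(len(words), "word types ->", len(schemes), "scheme types")
```

Running the script prints "6 word types -> 3 scheme types". Matching by re-derivation rather than by scanning the word for root letters is a deliberate choice: it stays correct when a scheme's own letters coincide with a root consonant.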

Keywords: Arabic language, schemes, roots, derivation, text classification, PCFG, parsing

Received August 18, 2013; accepted May 10, 2014
