Exploring the Potential of Schemes in Building NLP Tools for Arabic Language

Mohamed Ben Mohamed, Souheyl Mallat, Mohamed Nahdi and Mounir Zrigui

LaTICE Laboratory, Faculty of Sciences of Monastir, Tunisia

 

Abstract: Arabic is known for its sparseness, which makes its automatic processing difficult. The Arabic language is built on schemes: lemmas are derived by combining roots with schemes. This property offers two major advantages. First, this “hidden side” of Arabic suffers much less from sparseness, since the schemes form a finite set. Second, schemes retain a large number of the language's features in a much smaller vocabulary. Schemes therefore have great potential for building accurate natural language processing (NLP) tools for Arabic. In this work, we explore this potential by building NLP tools that rely entirely on schemes, covering text classification and Probabilistic Context-Free Grammar (PCFG) parsing.
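To illustrate the root-and-scheme derivation described in the abstract, here is a minimal Python sketch. It is not the authors' implementation: the function name apply_scheme, the Latin transliteration, and the digit notation for scheme slots (1, 2, 3 standing for the first, second, and third root consonant) are all assumptions made for this example. It shows how a small, fixed set of schemes can account for a larger set of lemmas.

```python
def apply_scheme(root, scheme):
    """Fill the digit slots 1, 2, 3 of a scheme with the root's consonants.

    Hypothetical notation: every other character of the scheme is copied as-is.
    """
    slots = {str(i + 1): consonant for i, consonant in enumerate(root)}
    return "".join(slots.get(ch, ch) for ch in scheme)


# Toy data in Latin transliteration (illustrative only, not from the paper).
ROOTS = [("k", "t", "b"), ("d", "r", "s")]   # writing, studying
SCHEMES = ["1a2a3a", "1aa2i3", "ma12uu3"]    # verb, active participle, passive participle

# Map every derived lemma back to the scheme that produced it.
lemmas = {apply_scheme(root, scheme): scheme
          for root in ROOTS for scheme in SCHEMES}

for lemma, scheme in lemmas.items():
    print(f"{lemma:10s} <- scheme {scheme}")

# The lemma vocabulary grows with every new root; the scheme vocabulary does not.
print(len(lemmas), "lemmas,", len(set(lemmas.values())), "schemes")
```

In this toy setting, adding a third root would add three lemmas while the scheme set stays at three, which is the sense in which a scheme-level representation is less sparse.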

Keywords: Arabic language, schemes, roots, derivation, text classification, PCFG, parsing

Received August 18, 2013; accepted May 10, 2014

 

