
Using Language Independent and Language Specific Features to Enhance Arabic Named Entity Recognition

Yassine Benajiba², Mona Diab², and Paolo Rosso¹
¹Natural Language Engineering Laboratory, ELiRF, Universidad Politécnica Valencia, Spain
²Center of Computational Learning Systems, Columbia University, USA


Abstract: The named entity recognition (NER) task has been garnering significant attention because it has been shown to help improve the performance of many natural language processing applications. More recently, there has been a surge in the development of NER systems for languages other than English. Given the relative abundance of resources for Arabic and a certain degree of maturity in the state of the art for Arabic processing, it is natural to see interest in developing NER systems for the language. In this paper, we investigate the impact of using different sets of features, both language independent and language specific, in a discriminative machine learning framework, namely Support Vector Machines (SVMs). We explore lexical, contextual, and morphological features on nine data sets of different genres and annotations. We systematically measure the impact of the different features, both in isolation and in combination. We achieve the highest performance, F1 = 82.71, using a combination of all features. Essentially, combining language-independent features with language-specific ones yields the best performance on all the text genres we investigate. However, at the class level, we observe that the different named entity classes benefit differently from the morphological features employed.
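The feature-based setup summarized above can be illustrated with a minimal Python sketch. It assumes a token-level formulation in which each token is mapped to a dictionary of lexical features (the word and its character prefixes and suffixes) and contextual features (the neighboring words in a fixed window). The helper names `token_features` and `entity_f1`, the `window` parameter, and the `<PAD>` boundary symbol are hypothetical illustrations, not the authors' implementation. The language-specific morphological features would require an Arabic morphological analyzer and are not modeled here. The scoring function assumes exact-match, entity-level F1 in the style of the CoNLL shared tasks, since the evaluation procedure is not described in this abstract.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical entity span representation: (start_token, end_token, class).
Span = Tuple[int, int, str]


def token_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Build a feature dict for tokens[i] (illustrative sketch only).

    Lexical features: the word itself plus its character prefixes and
    suffixes of length 1 to 3, which apply to any language.
    Contextual features: the words within +/- `window` positions.
    Morphological (language-specific) features are NOT modeled here.
    """
    word = tokens[i]
    feats: Dict[str, str] = {"word": word}
    for n in range(1, 4):
        feats[f"prefix{n}"] = word[:n]
        feats[f"suffix{n}"] = word[-n:]
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        # Positions outside the sentence get a hypothetical boundary symbol.
        feats[f"word[{offset:+d}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match entity-level F1: harmonic mean of precision and recall."""
    tp = len(gold & predicted)  # spans matching in boundaries and class
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_features(["visited", "Cairo", "yesterday"], 1, window=1))
    gold = {(0, 2, "PER"), (3, 4, "LOC")}
    pred = {(0, 2, "PER"), (3, 4, "ORG")}  # second entity has the wrong class
    print(entity_f1(gold, pred))  # 0.5
```

In practice, feature dictionaries like these could be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to a linear SVM classifier; that pairing is likewise an assumption rather than the configuration reported in the paper.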

Keywords: Arabic natural language processing, classification, information extraction, named entity recognition.

Received December 18, 2008; accepted June 21, 2009
