Exploiting Multilingual Wikipedia to Improve Arabic Named Entity Resources

Mariam Biltawi, Arafat Awajan, Sara Tedmori, and Akram Al-Kouz

King Hussein Faculty of Computing Sciences, Princess Sumaya University for Technology, Jordan

Abstract: This paper focuses on the creation of Arabic named entity gazetteers by exploiting Wikipedia and using a Naïve Bayes classifier to classify named entities into three main categories: person, location, and organization. Building the gazetteer starts with the automatic creation of the datasets. The training dataset is constructed from Arabic text only, whereas the testing dataset is derived from English text using the Stanford Named Entity Recognizer. Each English named entity is then checked against Wikipedia page titles. If the named entity exists as a Wikipedia page title, a check for a parallel Arabic page is conducted. Finally, the Naïve Bayes classifier is applied to verify the named entity tag of the Arabic named entity or to assign a new one. Due to the lack of available resources, the proposed system is evaluated manually by calculating accuracy, recall, and precision. Results show an accuracy of 53%.

Keywords: Arabic named entity resources; Naïve Bayes classifier; Wikipedia.

Received February 7, 2017; accepted May 10, 2017
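
The pipeline outlined in the abstract (title existence check, Arabic parallel page check, Naïve Bayes tagging) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function and variable names (`build_gazetteer`, `wiki_titles`, `ar_links`, `ar_context`) are hypothetical, the Wikipedia lookups are replaced by small in-memory dictionaries, the classifier is a plain multinomial Naïve Bayes with add-one smoothing over context tokens (the paper's actual features are not stated in the abstract), and the toy data is invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def build_gazetteer(english_entities, wiki_titles, ar_links, ar_context, clf):
    """Map Arabic page titles to entity tags.

    english_entities: (name, tag) pairs, assumed to come from an English NER tool.
    wiki_titles:      set of English Wikipedia page titles (stand-in for a lookup).
    ar_links:         English title -> parallel Arabic title (stand-in for a lookup).
    ar_context:       Arabic title -> list of Arabic tokens describing the page.
    """
    gazetteer = {}
    for name, _ner_tag in english_entities:
        if name not in wiki_titles:        # step 1: title existence check
            continue
        ar_title = ar_links.get(name)      # step 2: Arabic parallel page check
        if ar_title is None:
            continue
        # step 3: the classifier's tag is kept; it either confirms the
        # English NER tag or replaces it (an assumption of this sketch)
        gazetteer[ar_title] = clf.predict(ar_context.get(ar_title, []))
    return gazetteer


# Toy training data, invented for illustration (Arabic-only, as in the abstract).
train_docs = [
    ["ولد", "كاتب"],      # "born", "writer"
    ["مدينة", "عاصمة"],   # "city", "capital"
    ["شركة", "تأسست"],    # "company", "founded"
]
train_labels = ["PERSON", "LOCATION", "ORGANIZATION"]
clf = NaiveBayes().fit(train_docs, train_labels)

english_entities = [("Amman", "LOCATION"), ("Unknown Name", "PERSON")]
wiki_titles = {"Amman"}
ar_links = {"Amman": "عمان"}
ar_context = {"عمان": ["مدينة", "عاصمة"]}

gazetteer = build_gazetteer(english_entities, wiki_titles, ar_links, ar_context, clf)
print(gazetteer)
```

In this sketch, an entity with no matching page title or no Arabic parallel page is simply dropped, so the gazetteer only contains Arabic titles that passed both checks.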
