Arabic Quran Verses Authentication Using Deep Learning and Word Embeddings
Zineb Touati-Hamad, Laboratory of Mathematics, Informatics and Systems, University Larbi Tebessi, Algeria
Mohamed Ridda Laouar, Laboratory of Mathematics, Informatics and Systems, University Larbi Tebessi, Algeria
Issam Bendib, Laboratory of Mathematics, Informatics and Systems, University Larbi Tebessi, Algeria
Saqib Hakak, Faculty of Computer Science, University of New Brunswick, Canada
Abstract: With the growth of the Internet, algorithms have come to govern nearly every aspect of digital content. Yet Arabic Quranic content has so far benefited little from computational linguistics, even with the advent of artificial intelligence algorithms. The rapid spread of Islamic websites and applications has led to a proliferation of digital Quranic content. Unfortunately, such content lacks oversight and is rarely verified against reliable sources. It is therefore difficult, especially for non-native speakers of Arabic, to distinguish and authenticate Quranic verses from non-Quranic Arabic text. Text processing techniques from outside the field of Natural Language Processing (NLP) yield poorer results, especially on Arabic text. To address this problem, we propose to explore Word Embeddings (WE) combined with Deep Learning (DL) techniques to identify Quranic verses in Arabic textual content. The proposed work is evaluated using twelve different word embedding models with two popular classifiers for binary classification, namely the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). The experimental results show that the proposed approach outperforms traditional methods in distinguishing Quranic verses from other Arabic text, reaching an accuracy of 98.33%.
Keywords: Arabic text, Quranic verse, Authentication, NLP, Word Embeddings, Word2vec, DL, CNN, LSTM.
Received February 3, 2021; accepted October 10, 2021
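As a rough illustration of the kind of pipeline the abstract describes (word embeddings feeding a convolutional binary classifier), the following minimal sketch runs a forward pass in plain Python. It is not the authors' architecture: the vocabulary, embedding dimension, window size, filter count, and the function name `predict_quranic_probability` are all assumptions made for illustration, and the weights are random and untrained, so the output probability carries no meaning beyond showing the data flow.

```python
import math
import random

EMBED_DIM = 4      # assumed toy dimension; pretrained embeddings are usually much larger
WINDOW = 2         # assumed convolution window (consecutive tokens per filter)
NUM_FILTERS = 3    # assumed number of convolution filters

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def random_vector(n):
    """Return n random weights in [-1, 1]; stands in for trained parameters."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# Hypothetical embedding table; a real system would load a pretrained model.
VOCAB = ["بسم", "الله", "الرحمن", "الرحيم", "<unk>"]
EMBEDDINGS = {word: random_vector(EMBED_DIM) for word in VOCAB}

# Untrained convolution filters and dense-layer weights (illustrative only).
FILTERS = [random_vector(WINDOW * EMBED_DIM) for _ in range(NUM_FILTERS)]
DENSE_W = random_vector(NUM_FILTERS)
DENSE_B = 0.0


def embed(tokens):
    """Map each token to its vector, falling back to the <unk> vector."""
    return [EMBEDDINGS.get(tok, EMBEDDINGS["<unk>"]) for tok in tokens]


def conv_max_pool(vectors):
    """Apply each filter over sliding windows, then ReLU and global max pooling."""
    # Zero-pad short inputs so that at least one window exists.
    while len(vectors) < WINDOW:
        vectors = vectors + [[0.0] * EMBED_DIM]
    pooled = []
    for filt in FILTERS:
        best = 0.0  # starting at zero applies the ReLU floor before pooling
        for i in range(len(vectors) - WINDOW + 1):
            window = [x for vec in vectors[i:i + WINDOW] for x in vec]
            activation = sum(w * x for w, x in zip(filt, window))
            best = max(best, activation)
        pooled.append(best)
    return pooled


def predict_quranic_probability(text):
    """Return a sigmoid score in (0, 1); meaningful only after training."""
    tokens = text.split()  # naive whitespace tokenisation (an assumption)
    features = conv_max_pool(embed(tokens))
    z = sum(w * x for w, x in zip(DENSE_W, features)) + DENSE_B
    return 1.0 / (1.0 + math.exp(-z))


score = predict_quranic_probability("بسم الله الرحمن الرحيم")
```

In a trained system the embedding table would come from one of the pretrained word embedding models and the filter and dense weights would be learned from labelled Quranic and non-Quranic text; an LSTM variant would replace the convolution and pooling step with a recurrent layer.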