New Language Models for Spelling Correction
Saida Laaroussi, IT, Logistics and Mathematics, Ibn Tofail University, Morocco
Si Lhoussain Aouragh, IT and Decision Support System, Mohamed V University, Morocco
Abdellah Yousfi, Department of Economics and Management, Mohamed V University, Morocco
Mohamed Nejja, Department of Software Engineering, Mohamed V University, Morocco
Hicham Geddah, Department of Computer Science, Mohamed V University, Morocco
Said Ouatik El Alaoui, IT, Logistics and Mathematics, Ibn Tofail University, Morocco
Abstract: Context-based spelling correction is a significant problem in Natural Language Processing (NLP) applications. Most work that introduces context into the spelling correction process relies on n-gram language models. However, these models fail in several cases to assign adequate probabilities to the suggested corrections of a misspelled word in a given context. To resolve this issue, we propose two new language models inspired by stochastic language models and combined with edit distance. The first phase finds the words of the lexicon that are orthographically close to the erroneous word, and the second phase ranks and limits these suggestions. We applied the new approach to the Arabic language, taking into account its specificity of having strong contextual connections between distant words in a sentence. To evaluate our approach, we developed textual data processing applications, namely for the extraction of distant transition dictionaries. The correction accuracy obtained exceeds 98% for the first 10 suggestions. Compared to n-gram language models, our approach simplifies the parameters to be estimated while achieving higher correction accuracy, which motivates its use.
Keywords: Spelling correction, contextual correction, n-gram language models, edit distance, NLP.
Received January 1, 2021; accepted January 19, 2022
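The two-phase process described in the abstract (finding lexicon words close to the erroneous word, then ranking and limiting the suggestions using context) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `max_dist` and `top_k` parameters, and the `cooc` table of (context word, candidate) counts are all hypothetical, and the simple count-based score only stands in for the language models proposed in the paper.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def candidates(word: str, lexicon: List[str], max_dist: int = 2) -> List[str]:
    """Phase 1: keep lexicon words within max_dist edits of the erroneous word."""
    return [w for w in lexicon if edit_distance(word, w) <= max_dist]


def rank(word: str,
         context: List[str],
         cands: List[str],
         cooc: Dict[Tuple[str, str], int],
         top_k: int = 10) -> List[str]:
    """Phase 2: rank candidates by a context score and keep the top_k.

    The score is a sum of hypothetical (context word, candidate) counts,
    used here only as a placeholder for a real language model. Ties are
    broken by edit distance, then alphabetically.
    """
    def sort_key(c: str) -> Tuple[int, int, str]:
        score = sum(cooc.get((ctx, c), 0) for ctx in context)
        return (-score, edit_distance(word, c), c)

    return sorted(cands, key=sort_key)[:top_k]
```

For example, with the toy lexicon `["cat", "car", "cart", "dog"]`, the erroneous word `"cag"` and `max_dist=1`, phase 1 keeps `"cat"` and `"car"`. Given the context `["drive", "the"]` and counts that favour `("drive", "car")`, phase 2 places `"car"` first. Because the context words may sit anywhere in the sentence, the score is not restricted to adjacent words, which loosely mirrors the distant contextual connections mentioned in the abstract.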