New Language Models for Spelling Correction

  • Written by Ghadeer
  • Update: 03/11/2022

Saida Laaroussi

IT, Logistics and Mathematics, Ibn Tofail University, Morocco

Si Lhoussain Aouragh

IT and Decision Support System, Mohamed V University, Morocco

Abdellah Yousfi

Department of Economics and Management, Mohamed V University, Morocco

Mohamed Nejja

Department of Software Engineering, Mohamed V University, Morocco

Hicham Geddah

Department of Computer Science, Mohamed V University, Morocco

Said Ouatik El Alaoui

IT, Logistics and Mathematics, Ibn Tofail University, Morocco

Abstract: Context-based correction of spelling errors is a significant problem in Natural Language Processing (NLP) applications. Most work that introduces context into the spelling correction process relies on n-gram language models. However, in several cases these models fail to assign adequate probabilities to the suggested corrections of a misspelled word in a given context. To resolve this issue, we propose two new language models inspired by stochastic language models and combined with edit distance. The first phase finds the words of the lexicon that are orthographically close to the erroneous word; the second phase ranks and limits these suggestions. We applied the new approach to the Arabic language, taking into account its characteristic strong contextual connections between distant words in a sentence. To evaluate our approach, we developed textual data processing applications, notably for the extraction of distant transition dictionaries. The correction accuracy obtained exceeds 98% for the first 10 suggestions. Compared with n-gram language models, our approach simplifies the parameters to be estimated while achieving higher correction accuracy, which motivates its use.

Keywords: Spelling correction, contextual correction, n-gram language models, edit distance, NLP.
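
The two-phase scheme described in the abstract (find lexicon words close to the misspelled word, then rank and limit the suggestions using context) can be illustrated with a minimal sketch. This is not the authors' models: the co-occurrence window, the scoring rule, and the names `edit_distance`, `candidates`, `distant_pair_counts`, and `rank` are all assumptions made here for illustration, and the toy corpus is English rather than Arabic.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def candidates(word, lexicon, max_dist=2):
    """Phase 1: lexicon words within max_dist edits of the misspelled word."""
    return [w for w in lexicon if edit_distance(word, w) <= max_dist]


def distant_pair_counts(sentences, window=3):
    """Count ordered word pairs (w, v) where v follows w within `window` positions.

    Assumption: a crude stand-in for context between non-adjacent words,
    not the transition dictionaries used in the paper.
    """
    counts = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            for v in sent[i + 1:i + 1 + window]:
                counts[(w, v)] += 1
    return counts


def rank(word, left_context, lexicon, counts, top_k=10, max_dist=2):
    """Phase 2: rank candidates by context score, then by edit distance."""
    scored = []
    for cand in candidates(word, lexicon, max_dist):
        context_score = sum(counts[(w, cand)] for w in left_context)
        scored.append((cand, context_score, edit_distance(word, cand)))
    # Higher context score first; ties broken by smaller distance, then alphabetically.
    scored.sort(key=lambda t: (-t[1], t[2], t[0]))
    return [cand for cand, _, _ in scored[:top_k]]


corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]
lexicon = sorted({w for sent in corpus for w in sent})
counts = distant_pair_counts(corpus, window=3)
suggestions = rank("mxt", ["sat", "on", "the"], lexicon, counts)
print(suggestions)
```

Here `top_k=10` mirrors the "first 10 suggestions" cut-off mentioned in the abstract; a real system would replace the raw counts with properly estimated probabilities.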

Received January 1, 2021; accepted January 19, 2022

https://doi.org/10.34028/iajit/19/6/12
