GovdeTurk: A Novel Turkish Natural Language Processing
Tool for Stemming, Morphological Labelling and Verb Negation
Sait Yucebas1 and Rabia Tintin2
1Computer Engineering Department, Canakkale Onsekiz Mart University, Turkey
2Department of Student Affairs, Canakkale Onsekiz Mart University, Turkey
Abstract: GovdeTurk is a tool for stemming, morphological labeling, and verb
negation for the Turkish language. We designed comprehensive finite automata to
represent Turkish
grammar rules. Based on these automata, GovdeTurk finds the stem of a word by
removing its inflectional suffixes using a longest-match strategy. Levenshtein
Distance is used to correct spelling errors that may occur during suffix
removal. Morphological labeling identifies the functionality of a given token.
Nine dictionaries are constructed, one for each word type. These dictionaries
are used in both stemming and morphological labeling. A verb negation module is
developed for lexicon-based sentiment analysis. GovdeTurk is tested
on a dataset of one million words. The results are compared with Zemberek and
the Turkish Snowball algorithm. In stemming, GovdeTurk achieves an accuracy of
97.3%, whereas its closest competitor, Zemberek, reaches 80%. The morphological
labeling accuracy of GovdeTurk is 93.6%. With these results, GovdeTurk
outperforms its competitors.
Keywords: Natural language processing, stemming,
morphological analysis, Turkish language.
Received June 18, 2019; accepted April 18, 2020
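
The stemming idea summarized in the abstract (longest-match suffix removal followed by Levenshtein-based correction against a dictionary) can be illustrated with a minimal sketch. This is not the GovdeTurk implementation: the paper uses finite automata and nine dictionaries, whereas the sketch assumes a flat suffix list and a single toy stem set. The names SUFFIXES, STEMS, strip_longest_suffix, correct, stem, and the max_distance threshold are all hypothetical and chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical toy data; a real system would use full suffix rules and lexicons.
SUFFIXES = ["lerden", "lardan", "ler", "lar", "den", "dan", "im", "ım"]
STEMS = {"kitap", "ev", "göz"}


def strip_longest_suffix(word: str, suffixes=SUFFIXES) -> str:
    """Remove the longest suffix that matches the end of the word."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def correct(candidate: str, stems=STEMS, max_distance: int = 1) -> str:
    """Map a candidate to the nearest dictionary stem if it is close enough."""
    if candidate in stems:
        return candidate
    best = min(sorted(stems), key=lambda s: levenshtein(candidate, s))
    return best if levenshtein(candidate, best) <= max_distance else candidate


def stem(word: str) -> str:
    """Longest-match suffix removal, then dictionary-based correction."""
    return correct(strip_longest_suffix(word))


if __name__ == "__main__":
    for w in ["evlerden", "kitabım", "gözler"]:
        print(w, "->", stem(w))
```

In this sketch, "kitabım" is first reduced to "kitab" by removing the suffix "ım"; because "kitab" is not in the toy dictionary, the nearest stem within one edit, "kitap", is returned. This mirrors the kind of spelling change during suffix removal that the abstract says Levenshtein Distance is used to correct.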