A New Approach to Automatically Find and Fix
Erroneous Labels in Dependency Parsing Treebanks
Metin Bilgin
Department of Computer Engineering, Bursa Uludağ University, Turkey
Abstract: Dependency Parsing (DP) is the task of determining, for each
sentence in a text, the dependent/head (sub-term/upper-term) relations between
the words that make up that sentence. DP serves to produce meaningful
information for high-level applications. Correct labeling of the text corpus
used in DP studies is very important. Studies performed with a wrongly labeled
text corpus will produce erroneous results. Whether a text corpus is labeled
manually by human annotators or automatically, faulty cases will occur. As a
result of human factors or of the annotations used for labeling, faulty labels
end up in treebanks. In order to
prevent these errors, the detection and correction of possible faulty labels is
very important for increasing the accuracy of subsequent studies. Manual
correction of possible faulty labels requires great effort and
time. The purpose of this study is to create a model that automatically finds
possible faulty labels and suggests new labels for them. With
the help of the proposed model, the aim is to detect and correct possible
faulty labels in a text corpus and to increase consistency among text corpora
of the same language. The new labels that the developed model suggests for
faulty labels will be a great convenience for the language expert. Another
advantage of the model is its language-independent structure. It has succeeded in
finding and correcting potentially faulty labels in experimental studies for
Turkish. An increase in accuracy has also been observed in studies carried out
for languages other than Turkish. To assess the accuracy of the results
obtained by the system, they were analyzed with the help of 10 different
language experts.
Keywords: Natural language processing, dependency parsing,
universal dependency, error detection, treebank consistency.
Received July 27, 2020; accepted January 19, 2021