Machine Translation Infrastructure for Turkic Languages (MT-Turk)
Emel Alkım and Yalçın Çebi
Department of Computer Engineering, Dokuz Eylul University,
Turkey
Abstract: This study presents “MT-Turk”, a multilingual, extensible machine
translation infrastructure for grammatically similar Turkic languages.
The MT-Turk infrastructure has multi-word support and is designed
using a combined rule-based translation approach that unites the strengths of
the interlingual and transfer approaches. This design makes the system easy to extend
with new Turkic languages. A newly added language can be used both as a
destination and as a source language, which gives two-way extensibility. In addition,
the infrastructure can learn from previous
translations and use the suggestions of previous users for disambiguation. Finally,
the success of MT-Turk for three Turkic languages (Turkish, Kirghiz and Kazan)
is evaluated using the BiLingual Evaluation Understudy (BLEU) metric, and the
suggestion system is seen to improve the success by 43.66% on average. Although
the lack of linguistic resources affected the success of the system negatively,
this study introduces an extensible infrastructure that can
learn from previous translations.
Keywords: Rule-based machine translation, Turkic
languages, semi-language-specific interlingua, disambiguation by suggestions.