Lossless Text Compression Technique Using Syllable Based Morphology

Ibrahim Akman¹, Hakan Bayindir¹, Serkan Ozleme², Zehra Akin³, and Sanjay Misra¹
¹Computer Engineering Department, Atilim University, Turkey
²Parana Vision Image Processing Technologies and Solutions Consultancy Corporation, Turkey
³Meteksan Systems and Computer Technologies Corporation, Turkey


Abstract: In this paper, we present a new lossless text compression technique that exploits the syllable-based morphology of multi-syllabic languages. The proposed algorithm partitions words into their syllables and then produces shorter bit representations of those syllables for compression. The method has six main components: the source file, filtering unit, syllable unit, compression unit, dictionary file, and target file. The number of bits used to code each syllable depends on the number of entries in the dictionary file. The proposed algorithm was implemented and tested on 20 texts of varying lengths collected from different fields. The results indicated compression of up to 43%.
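The abstract names the pipeline (split words into syllables, give each syllable a dictionary entry, emit codes whose bit width depends on the dictionary size) but not its details. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes Turkish input; a simplified syllabification rule (each syllable holds one vowel, and the consonant directly before a vowel opens a new syllable); fixed-width codes of ceil(log2 N) bits for a dictionary of N entries; and that non-word runs such as spaces and punctuation are stored as dictionary entries so decoding stays lossless. The names `syllabify`, `compress`, and `decompress` are hypothetical.

```python
import math
import re

# Assumption: Turkish vowels, lower- and upper-case (includes dotless ı / dotted İ).
VOWELS = set("aeıioöuüAEIİOÖUÜ")


def syllabify(word):
    """Simplified Turkish-style rule (assumption, not the paper's exact rules):
    one vowel per syllable; with consonants between two vowels, the last one
    starts the next syllable; adjacent vowels are split between them."""
    vowel_pos = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(vowel_pos) < 2:
        return [word]  # zero or one vowel: keep the token whole
    cuts = [nxt - 1 if nxt - prev > 1 else nxt
            for prev, nxt in zip(vowel_pos, vowel_pos[1:])]
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces  # "".join(pieces) == word, so nothing is lost


def compress(text):
    """Return (bit string, dictionary, code width in bits)."""
    units = []
    # \w+ and \W+ together cover every character, so the split is lossless.
    for token in re.findall(r"\w+|\W+", text):
        units.extend(syllabify(token))
    if not units:
        return "", [], 1
    dictionary = list(dict.fromkeys(units))  # unique units, first-seen order
    index = {unit: i for i, unit in enumerate(dictionary)}
    # Code width grows with the number of dictionary entries.
    width = max(1, math.ceil(math.log2(len(dictionary))))
    bits = "".join(format(index[u], f"0{width}b") for u in units)
    return bits, dictionary, width


def decompress(bits, dictionary, width):
    """Read fixed-width codes back and look each one up in the dictionary."""
    return "".join(dictionary[int(bits[i:i + width], 2)]
                   for i in range(0, len(bits), width))


if __name__ == "__main__":
    bits, dictionary, width = compress("kitap ve kitaplar")
    print(dictionary, width, len(bits))  # prints ['ki', 'tap', ' ', 've', 'lar'] 3 24
    assert decompress(bits, dictionary, width) == "kitap ve kitaplar"
```

Here 5 dictionary entries need 3-bit codes, so the 8 units of "kitap ve kitaplar" become 24 bits. The dictionary has to be kept with the bit stream, which is the role of the dictionary file in the abstract, so the net saving depends on how often syllables repeat across the text.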

Keywords: Algorithm, text compression technique, syllable, multi-syllabic languages.

Received December 15, 2008; accepted August 3, 2010
