Detection and Compensation of Undesirable Discontinuities within the Farsi/Arabic Subwords
Majid Ziaratban and Karim Faez
Electrical Engineering Department, Amirkabir University of Technology, Iran
Electrical Engineering Department, Amirkabir University of Technology, Iran
Abstract: This paper introduces a previously unexplored problem in the preprocessing of Farsi/Arabic handwritten words. Subwords play a vital role in many applications, such as cheque amount recognition, text recognition, lexicon reduction, and subword-based word recognition, so correcting faults that occur in subwords improves the overall performance of these applications. A subword is a connected component in the main body of a word. A discontinuity within a subword divides it into two isolated parts, which are then incorrectly detected as two separate subwords. In our algorithm, before these faults are corrected, the baseline of each subword is corrected using the proposed baseline correction method. Then, to limit the search area in the matching stage, the dots are removed. Undesirable discontinuities in subwords are detected using a template matching algorithm, and the disconnected parts of a subword are joined using three different methods. Experiments show that the cubic polynomial-based compensation method gives the best results, with a 2.87% improvement in the subword recognition rate.
Keywords: Detection, compensation, Farsi/Arabic subword, cubic polynomial curve fitting.
Received May 12, 2009; accepted November 5, 2009
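As an informal illustration of the problem stated in the abstract, the following minimal sketch treats a subword as a connected component of foreground pixels and shows how a single-pixel discontinuity turns one subword into two. It is not the authors' implementation: the function name `count_components`, the use of 8-connectivity, and the toy binary images are assumptions made only for this example.

```python
from collections import deque


def count_components(img):
    """Count 8-connected components of foreground (1) pixels in a binary image.

    `img` is a list of equal-length rows of 0/1 values. The choice of
    8-connectivity is an assumption made for this illustration.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Unvisited foreground pixel: start a new component and flood it.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# Toy "subword": a single horizontal stroke.
intact = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# The same stroke with a one-pixel discontinuity in the middle.
broken = [row[:] for row in intact]
broken[1][3] = 0

print(count_components(intact), count_components(broken))
```

In this toy case the intact stroke is counted as one component and the broken stroke as two, which is the kind of incorrect subword count that the detection and compensation stages are intended to repair.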