International Journal on Advanced Science, Engineering and Information Technology, Vol. 8 (2018) No. 4-2: Special Issue on Empowering the Nation via 4IR (The Fourth Industrial Revolution), pages 1528-1533. Chief Editor: Khairuddin Omar | Editorial Boards: Shahnorbanun Sahran Hassan, Nor Samsiah Sani, Heuiseok Lim & Danial Hoosyar. DOI:10.18517/ijaseit.8.4-2.6959

Multi-Classifier Jawi Handwritten Sub-Word Recognition

Anton Heryanto Hasan, Khairuddin Omar, Muhammad Faidzul Nasrudin

Abstract

The problems and challenges in Jawi handwriting recognition are inherited from Arabic script: its cursive nature, a large variety of writing styles owing to its rich morphology, ligatures, overlapping characters, dialects, and the low quality of manuscript images. Word segmentation is difficult because of sub-words: a space appears within a word whenever the word contains non-connecting characters. The performance of previous Jawi handwriting recognizers is still considered sub-par. Previous approaches have three main problems. First, the recognizer consists of multiple independent components, so a performance improvement in one component is not shared across the system. Second, feature extraction by feature engineering only works on specific subsets of the training data and is less capable of handling the broader variations found in testing data. Finally, the classifier uses implicit segmentation in which the target class is a sub-word drawn from a limited lexicon. To address the first problem, this paper proposes a deep learning approach in which training is conducted end-to-end from the input to the class output, so that improvements in each component improve overall performance. Second, a convolutional network is used to learn features, optimizing the data representation through end-to-end training of the parameters from raw input data to the target class. Finally, a multi-classifier that implicitly segments the sub-word into a sequence of characters is proposed. The classifiers consist of one sub-word length classifier and seven character classifiers. This approach is lexicon-free, addressing the absence of lexicon data. Experiments conducted on a standard Jawi handwriting dataset showed an accuracy of up to 92.20% and suggest that the approach is superior to state-of-the-art methods of Jawi handwriting recognition.
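The abstract describes the output stage only at a high level: one sub-word length classifier and seven character classifiers whose predictions together spell a sub-word without consulting a lexicon. The following is a minimal Python sketch of how such heads might be decoded and trained jointly, not the authors' implementation. It assumes each head emits raw scores over its classes, that length class k means k + 1 characters, and that the training loss sums the length cross-entropy with the cross-entropies of only the first L character heads (a scheme similar to the length-plus-digit heads used in multi-digit street-number recognition; the paper may differ). The names `decode_subword`, `joint_nll`, `MAX_CHARS`, and the toy `alphabet` are hypothetical, and the convolutional feature extractor that would produce the scores is omitted.

```python
import math
from typing import List, Sequence

# The abstract specifies seven character classifiers; we assume a sub-word
# therefore has between 1 and 7 characters (hypothetical convention).
MAX_CHARS = 7


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one classifier's raw scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def decode_subword(length_scores: Sequence[float],
                   char_scores: Sequence[Sequence[float]],
                   alphabet: Sequence[str]) -> List[str]:
    """Lexicon-free decoding: predict the length, then read that many character heads."""
    if len(char_scores) != MAX_CHARS:
        raise ValueError(f"expected {MAX_CHARS} character heads, got {len(char_scores)}")
    # Assumed convention: length class k means k + 1 characters.
    length = argmax(length_scores) + 1
    return [alphabet[argmax(s)] for s in char_scores[:length]]


def joint_nll(length_scores: Sequence[float],
              char_scores: Sequence[Sequence[float]],
              target_length: int,
              target_chars: Sequence[int]) -> float:
    """Assumed training loss: cross-entropy of the length head plus the
    cross-entropy of the first `target_length` character heads.
    Character heads beyond the true length contribute nothing."""
    loss = -log_softmax(length_scores)[target_length - 1]
    for i in range(target_length):
        loss -= log_softmax(char_scores[i])[target_chars[i]]
    return loss


if __name__ == "__main__":
    alphabet = ["alif", "ba", "ta", "jim"]  # hypothetical toy label set
    length_scores = [0.1, 2.0, 0.3, 0.0, 0.0, 0.0, 0.0]  # class 1 -> 2 characters
    char_scores = [[0.0, 3.0, 0.0, 0.0], [1.0, 0.0, 0.0, 4.0]]
    char_scores += [[0.0] * 4 for _ in range(5)]  # unused heads
    print(decode_subword(length_scores, char_scores, alphabet))  # prints ['ba', 'jim']
```

Because the length head decides how many character heads are read, the output is not restricted to sub-words seen in a lexicon, which is the property the abstract emphasizes.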

Keywords:

jawi; handwritten recognition; sub-word; end-to-end learning; learning features; convolutional network.

