Research and Development of Feature Extraction from Myanmar Palm Leaf Manuscripts for the Myanmar Character Recognition System

Nwe Nwe Soe (1), Win Htay (2)
(1) Faculty of Computer Science Department, University of Computer Studies (Thaton), Thaton, 12043, Myanmar
(2) Principal, University of Computer Studies (Thaton), Thaton, 12043, Myanmar
How to cite (IJASEIT):
Soe, Nwe Nwe, and Win Htay. “Research and Development of Feature Extraction from Myanmar Palm Leaf Manuscripts for the Myanmar Character Recognition System”. International Journal on Advanced Science, Engineering and Information Technology, vol. 9, no. 6, Dec. 2019, pp. 2216-22, doi:10.18517/ijaseit.9.6.9001.
This paper proposes a handwriting OCR system for Myanmar palm-leaf manuscripts. Each text area in a manuscript is segmented, and the segmented character images must then be recognized and converted to Myanmar characters, which carry precious and invaluable historical information. The work covers two essential steps: preprocessing and feature extraction. Preprocessing automatically extracts the palm-leaf region of interest from images taken by a camera and supplies enhanced images to the subsequent stages of Myanmar character recognition. A one-dimensional segmentation approach crops the leaf area from the high-resolution image. Line count analysis is then used to extract a region that contains a sufficient number of text lines. After that, line segmentation is carried out using an Object Frequency Histogram computed along the horizontal lines, which finds the optimal cut points between lines. The same technique, applied vertically, isolates each character or the smallest group of characters. In total, 18 features are extracted to recognize the Myanmar palm-leaf manuscript characters. Although the experimental results are good, some difficulties related to connected components still need to be addressed.
