An Efficient Phase-Based Binarization Method for Degraded Historical Documents

Alaa Sulaiman, Khairuddin Omar, Mohammad F. Nasrudin
Pattern Recognition Research Group, Centre for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), 43600, Bangi, Selangor, Malaysia
How to cite (IJASEIT):
Sulaiman, Alaa, et al. “An Efficient Phase-Based Binarization Method for Degraded Historical Documents”. International Journal on Advanced Science, Engineering and Information Technology, vol. 9, no. 6, Dec. 2019, pp. 2207-15, doi:10.18517/ijaseit.9.6.7774.
Document image binarization is the first essential step in digitizing document images and a key technique in both document image analysis and optical character recognition (OCR). Binarization converts the original image into a binary image, which is the appropriate representation for image segmentation, recognition, and restoration; several studies underline that the subsequent steps of document image analysis depend on the binarization result. However, old and historical document images suffer from several types of degradation, such as bleed-through, blur, and uneven illumination, which make binarization a difficult task. Extracting the foreground from a degraded background therefore depends on the type of degradation, as well as on the type of paper used and the age of the document. Improved binarization methods are needed to reduce the impact of background degradation. To address this difficulty, this paper proposes an effective, enhanced binarization technique for degraded and historical document images. The proposed method enhances an existing binarization method by modifying its parameters and adding a post-processing stage, which improves the resulting binary images. The technique is also robust, as it requires no parameter tuning. An evaluation on the Document Image Binarization Contest (DIBCO) datasets shows that the proposed method is promising, producing better results than some of the DIBCO winners.
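The abstract describes the pipeline only in general terms: threshold a grayscale image into ink and background, clean the result in a post-processing stage, and score it against DIBCO ground truth. As a rough, non-authoritative illustration of those three stages, the sketch below uses Otsu's global threshold as a stand-in baseline; it is not the phase-based method proposed in the paper. The helper names (`otsu_threshold`, `binarize`, `remove_small_components`, `f_measure`), the assumption of dark ink on light paper, and the small-component filter with its `min_size` cut-off are all hypothetical choices made for illustration. The F-measure follows the usual DIBCO definition, 2PR/(P+R) over ink pixels.

```python
from collections import deque


def otsu_threshold(gray):
    """Otsu's global threshold for a 2-D list of 8-bit gray levels (0-255)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w0 = sum0 = 0            # pixel count and intensity sum of the class <= t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue         # one class is empty: no valid split at this t
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2   # unnormalized between-class variance
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(gray, t):
    """Assumes dark ink on light paper: 1 = ink (gray <= t), 0 = background."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


def remove_small_components(binary, min_size):
    """Hypothetical post-processing: drop 4-connected ink blobs smaller than min_size."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            component, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:         # breadth-first flood fill of one ink blob
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out


def f_measure(pred, gt):
    """DIBCO-style F-measure over ink pixels: 2PR / (P + R)."""
    pairs = [(p, g) for prow, grow in zip(pred, gt) for p, g in zip(prow, grow)]
    tp = sum(1 for p, g in pairs if p == 1 and g == 1)
    fp = sum(1 for p, g in pairs if p == 1 and g == 0)
    fn = sum(1 for p, g in pairs if p == 0 and g == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 4x5 "page": a 2x2 ink stroke plus one isolated dark speck (noise).
gray = [
    [200, 200, 200, 200, 200],
    [200, 30, 30, 200, 200],
    [200, 30, 30, 200, 40],
    [200, 200, 200, 200, 200],
]
gt = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
t = otsu_threshold(gray)
raw = binarize(gray, t)
cleaned = remove_small_components(raw, min_size=2)
print(t, round(f_measure(raw, gt), 3), f_measure(cleaned, gt))  # 40 0.889 1.0
```

On this toy input the global threshold also keeps the isolated speck, so precision drops to 0.8; the post-processing step removes it and the F-measure rises to 1.0. Plain Python lists keep the sketch dependency-free; a practical pipeline would more likely operate on NumPy arrays or use an image-processing library.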

