Intelligent Deep Learning Empowered Text Detection Model from Natural Scene Images

S. Kiruthika Devi (1), Subalalitha CN (2)
(1) Department of Computer Science and Engineering, SRM Institute of Science and Technology, Kattankulathur, Chennai, 603203, India
(2) Department of Computer Science and Engineering, SRM Institute of Science and Technology, Kattankulathur, Chennai, 603203, India
How to cite (IJASEIT) :
Devi, S. Kiruthika, and Subalalitha CN. “Intelligent Deep Learning Empowered Text Detection Model from Natural Scene Images”. International Journal on Advanced Science, Engineering and Information Technology, vol. 12, no. 3, June 2022, pp. 1263-8, doi:10.18517/ijaseit.12.3.15771.
Scene text recognition has become an active research topic and remains a challenging task owing to complicated backgrounds and variations in light intensity, color, font style, and size. Text extraction from natural scene images encompasses two main processes: text detection and text recognition. Recent advances in Machine Learning (ML) and Deep Learning (DL) can effectively automate both processes, provided the model is trained properly. In this view, this paper presents an Automated DL empowered Text Detection model from Natural Scene Images (ADLTD-NSI). The ADLTD-NSI technique includes two important processes: text detection and text recognition. First, a single shot detector (SSD) with Inception-v2 as a baseline model is employed for text detection; the SSD is an object detector based on the VGG-16 framework for feature map extraction, followed by six convolution layers. Second, a Convolutional Recurrent Neural Network (CRNN) is used for text recognition. The recurrent layers in the CRNN model use long short-term memory (LSTM) to encode the sequence of feature vectors. Lastly, Connectionist Temporal Classification (CTC) loss is applied to predict the text labels corresponding to the sequences from the recurrent layers. A wide range of experiments was carried out on benchmark COCO datasets, and the results are examined in several aspects. The experimental outcomes show that the ADLTD-NSI technique outperforms the other compared methods, with a maximum accuracy of 96.78%.
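
To illustrate how a CTC-trained recognizer turns the per-frame outputs of its recurrent layers into a text label, the following is a minimal sketch of greedy (best-path) CTC decoding. It is not the authors' implementation and it covers only inference-time decoding, not the CTC loss itself. The function name `ctc_greedy_decode`, the helper `one_hot`, the toy alphabet, and the convention that class index 0 is the blank symbol are all assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Best-path CTC decoding for one sequence of per-frame class scores.

    frame_scores: list of score lists, one per time step; index 0 is blank
                  and index i >= 1 corresponds to alphabet[i - 1].
    """
    # Step 1: take the highest-scoring class at every time step.
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    # Step 2: merge runs of identical consecutive indices into one.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # Step 3: drop blanks and map the remaining indices to characters.
    return "".join(alphabet[idx - 1] for idx in collapsed if idx != blank)


def one_hot(index, size):
    """Hypothetical helper that fakes a confident per-frame score vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy alphabet: class 1 -> 'c', class 2 -> 'a', class 3 -> 't'.
alphabet = "cat"
path = [1, 1, 0, 2, 0, 3, 3]  # frames: c c - a - t t
frames = [one_hot(i, len(alphabet) + 1) for i in path]
decoded = ctc_greedy_decode(frames, alphabet)
print(decoded)
```

The blank symbol is what lets a decoder of this kind keep genuinely repeated characters: a blank between two identical indices prevents them from being merged, whereas adjacent repeats without a blank collapse into a single character.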
