Journal of Graphics, 2023, Vol. 44, Issue (3): 473-481. DOI: 10.11996/JG.j.2095-302X.2023030473

Natural scene text detection based on attention mechanism and deep multi-scale feature fusion

LI Yu1, YAN Tian-tian1, ZHOU Dong-sheng1,2, WEI Xiao-peng2

  1. National and Local Joint Engineering Laboratory of Computer Aided Design, School of Software Engineering, Dalian University, Dalian Liaoning 116622, China
  2. School of Computer Science and Technology, Dalian University of Technology, Dalian Liaoning 116024, China
  • Received: 2022-10-27 Accepted: 2023-01-12 Online: 2023-06-30 Published: 2023-06-30
  • Contact: ZHOU Dong-sheng (1978-), professor, Ph.D. His main research interests include computer graphics and vision, human-robot interaction, etc. E-mail: zhouds@dlu.edu.cn
  • About author:

    LI Yu (1997-), master's student. Her main research interest is computer vision. E-mail: y18337275282@163.com

  • Supported by:
    Key Program of National Natural Science Foundation of China (U21A20491); Special Project of Central Government Guiding Local Science and Technology Development (2021JH6/10500140); Program for Innovative Research Team in University of Liaoning Province (LT2020015); Support Plan for Key Field Innovation Team of Dalian (2021RT06)

Abstract:

To address the problem that existing scene text detection methods cannot deeply mine and fully fuse discriminative multi-scale features of text instances, a scene text detection method based on an attention mechanism and deep multi-scale feature fusion is proposed. First, an attention-enhanced ResNeSt50 network serves as the backbone to extract more discriminative text-instance features at different scales. Next, a deep multi-scale feature fusion module is designed to exchange information among feature maps of different scales. The module adaptively learns a weight matrix for the feature maps of each scale and uses these weights to further mine and fuse the discriminative text-instance information they contain, yielding a robust multi-scale fused feature map. Finally, an adaptive binarization post-processing module generates more accurate bounding boxes for text regions. Extensive experiments on the ICDAR2015, ICDAR2013, and CTW1500 datasets show that the proposed method achieves detection results competitive with other advanced detection methods and exhibits excellent robustness and generalization ability.
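
The two ideas named in the abstract, adaptively weighted multi-scale fusion and adaptive binarization, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fuse_scales` and `soft_binarize` are hypothetical, the per-pixel softmax weighting is an assumed form of "adaptively learned weight matrix", and the binarization step assumes the sigmoid-style differentiable binarization popularized by DBNet, B = 1 / (1 + exp(-k (P - T))), which the abstract does not confirm. The weight scores would normally come from a learned layer; here they are passed in directly.

```python
import math


def softmax(values):
    """Numerically stable softmax over a short list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(feature_maps, weight_logits):
    """Fuse S same-sized 2D maps with per-pixel softmax weights.

    feature_maps:  list of S maps (H x W nested lists), assumed to be
                   already resized to a common resolution.
    weight_logits: list of S maps (H x W) of unnormalized scores, one per
                   scale (hypothetical stand-in for learned weights).
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Normalize the scores so the scale weights sum to 1 per pixel.
            weights = softmax([logits[y][x] for logits in weight_logits])
            fused[y][x] = sum(
                w * fmap[y][x] for w, fmap in zip(weights, feature_maps)
            )
    return fused


def soft_binarize(prob_map, thresh_map, k=50.0):
    """Sigmoid-style binarization with a per-pixel threshold map.

    Assumed form: B = 1 / (1 + exp(-k * (P - T))); k controls steepness.
    Inputs are expected to lie in [0, 1].
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


if __name__ == "__main__":
    coarse = [[1.0, 2.0]]
    fine = [[3.0, 4.0]]
    equal_scores = [[[0.0, 0.0]], [[0.0, 0.0]]]
    print(fuse_scales([coarse, fine], equal_scores))
    print(soft_binarize([[0.9, 0.1]], [[0.3, 0.3]]))
```

With equal scores the fusion reduces to a plain average of the scales; unequal scores let one scale dominate at a given pixel, which is the intuition behind weighting feature maps adaptively rather than summing them uniformly.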

Key words: natural scene text detection, attention mechanism, multi-scale feature fusion, binarization, adaptive
