
Journal of Graphics, 2024, Vol. 45, Issue (1): 56-64. DOI: 10.11996/JG.j.2095-302X.2024010056

• Image Processing and Computer Vision •

Multi-directional text detection based on the fusion of enhanced feature extraction network and semantic feature

LV Ling1, LI Hua1, WANG Wu2

  1. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun Jilin 130000, China
    2. North Navigation Control Technology Co., Ltd, Beijing 100000, China
  • Received: 2023-07-18 Accepted: 2023-10-25 Online: 2024-02-29 Published: 2024-02-29
  • Contact: LI Hua (1977-), professor, Ph.D. Her main research interests cover computer vision and virtual reality. E-mail: lihua@cust.edu.cn
  • About author:

    LV Ling (1999-), master's student. Her main research interest covers computer vision. E-mail: 15382351657@163.com

  • Supported by:
    Jilin Natural Science Foundation (20210101412JC)

Abstract:

A text detection method based on an enhanced feature extraction network and semantic feature fusion was proposed to address challenges of scene text such as variable length and oblique angles. First, an enhanced dilated residual module (EDRM) was designed by combining deformable convolution with atrous convolution and applied to the conv4_x and conv5_x layers of ResNet18. The resulting network served as the backbone, strengthening feature extraction while increasing feature map resolution and reducing the loss of spatial information. Second, to address the inadequacy of existing algorithms in extracting text semantic features, bi-directional long short-term memory (BiLSTM) was applied in the feature fusion stage, enhancing the fused feature map's representation of scene text, the correlation among feature sequences, and the model's text localization ability. The model was evaluated on the multi-directional text dataset ICDAR2015 and the long-text dataset MSRA-TD500. Compared with the efficient DBNet algorithm, the F-measure of the proposed algorithm increased by 1.8% and 3.3% on the two datasets, respectively, demonstrating strong competitiveness.
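
The role of atrous (dilated) convolution in keeping feature map resolution while widening the receptive field can be illustrated with a minimal sketch. This is not the authors' EDRM: it is a 1-D, pure-Python toy without deformable offsets or residual connections, and the names `dilated_conv1d` and `receptive_field` are hypothetical helpers introduced only for illustration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1-D 'same' cross-correlation with zero padding and a dilation rate.

    Illustrative assumption: odd kernel length, stride 1, no bias.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel length")
    # Zero padding on each side chosen so the output keeps the input length.
    half = (k // 2) * dilation
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            acc += w * padded[i + j * dilation]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 1.0, 1.0]

dense = dilated_conv1d(signal, kernel, dilation=1)
atrous = dilated_conv1d(signal, kernel, dilation=2)

print(len(dense), len(atrous))                       # both equal len(signal)
print(receptive_field(3, 1), receptive_field(3, 2))  # span grows with dilation
```

With the same three weights, the dilated kernel spans five input samples instead of three, and the output length is unchanged because no downsampling is involved. This is the general property that lets a backbone trade strided downsampling for dilation in its later stages.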

Key words: deformable convolution, atrous convolution, text detection, semantic feature, bi-directional long short-term memory
