Journal of Graphics, 2024, Vol. 45, Issue 1: 56-64. DOI: 10.11996/JG.j.2095-302X.2024010056

• Image Processing and Computer Vision •

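Multi-directional text detection based on an enhanced feature extraction network and semantic feature fusion

LV Ling1, LI Hua1, WANG Wu2

  1. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, Jilin 130000, China
  2. North Navigation Control Technology Co., Ltd., Beijing 100000, China
  • Received: 2023-07-18 Accepted: 2023-10-25 Published: 2024-02-29 Online: 2024-02-29
  • Corresponding author: LI Hua (1977-), female, professor, Ph.D. Her main research interests are computer vision and virtual reality technology. E-mail: lihua@cust.edu.cn
  • First author: LV Ling (1999-), female, master's student. Her main research interest is computer vision. E-mail: 15382351657@163.com
  • Supported by:
    Natural Science Foundation of Jilin Province (20210101412JC)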

Multi-directional text detection based on the fusion of enhanced feature extraction network and semantic feature

LV Ling1, LI Hua1, WANG Wu2

  1. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, Jilin 130000, China
  2. North Navigation Control Technology Co., Ltd., Beijing 100000, China
  • Received: 2023-07-18 Accepted: 2023-10-25 Published: 2024-02-29 Online: 2024-02-29
  • First author: LV Ling (1999-), master's student. Her main research interest is computer vision. E-mail: 15382351657@163.com
  • Supported by:
    Natural Science Foundation of Jilin Province (20210101412JC)

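Abstract (translated from the Chinese original):

To address challenges such as the variable length and tilted orientation of text in natural scenes, a text detection method based on an enhanced feature extraction network and semantic feature fusion is proposed. First, an Enhanced Dilated Residual Module (EDRM) is designed by combining deformable convolution with atrous (dilated) convolution and is applied to the conv4_x and conv5_x layers of ResNet18. The resulting network serves as the backbone, which strengthens feature extraction while raising feature map resolution and reducing the loss of spatial information. Second, because existing algorithms still extract the semantic features of text insufficiently, a Bi-directional Long Short-Term Memory (BiLSTM) network is introduced into the feature fusion stage. This strengthens the fused feature map's representation of natural scene text and the correlation within feature sequences, and improves the model's text localization ability. The model is evaluated on the multi-directional text dataset ICDAR2015 and the long-text dataset MSRA-TD500. Experimental results show that, compared with the efficient DBNet algorithm, the proposed algorithm improves the F value by 1.8% and 3.3%, respectively, demonstrating good competitiveness.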
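
The resolution gain mentioned in the abstract can be illustrated with a little arithmetic. The sketch below traces one spatial dimension through a ResNet18-style backbone, first with the usual stride-2 downsampling in conv3_x to conv5_x, and then with stride 1 plus dilation in conv4_x and conv5_x. The dilation rates (2 and 4), the 640-pixel input, and the helper names are assumptions made for illustration. The abstract does not state the EDRM's actual configuration, and the deformable-convolution part is not modeled here.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1, dilation=1):
    """Output length of one spatial dimension after a convolution or pooling."""
    # A dilated kernel covers dilation * (kernel - 1) + 1 input positions.
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def backbone_output_size(size, stage_cfg):
    """Trace one spatial dimension through a ResNet18-style backbone."""
    # Stem: 7x7 convolution (stride 2, padding 3), then 3x3 max pooling
    # (stride 2, padding 1).
    size = conv_out_size(size, kernel=7, stride=2, padding=3)
    size = conv_out_size(size, kernel=3, stride=2, padding=1)
    # Only the first 3x3 convolution of each stage changes the resolution;
    # the remaining convolutions use stride 1 with size-preserving padding.
    for stride, dilation in stage_cfg:
        size = conv_out_size(size, kernel=3, stride=stride,
                             padding=dilation, dilation=dilation)
    return size


# (stride, dilation) of the first convolution in conv2_x .. conv5_x.
PLAIN = [(1, 1), (2, 1), (2, 1), (2, 1)]
# Hypothetical dilated variant: conv4_x and conv5_x keep stride 1 and use
# dilation instead (rates 2 and 4 are assumed, not taken from the paper).
DILATED = [(1, 1), (2, 1), (1, 2), (1, 4)]

plain_size = backbone_output_size(640, PLAIN)
dilated_size = backbone_output_size(640, DILATED)
print(plain_size, dilated_size)
```

Under these assumed settings the overall stride drops from 32 to 8, so a 640-pixel side yields an 80-cell feature map instead of a 20-cell one, which is the sense in which dilation reduces the loss of spatial information.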

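Professor LI Hua of Changchun University of Science and Technology, her student LV Ling, and colleagues propose a text detection method based on an enhanced feature extraction network and semantic feature fusion. By combining deformable convolution with atrous (dilated) convolution, they design an enhanced dilated residual module and apply it to the conv4_x and conv5_x layers of ResNet18, which improves the network's feature extraction ability and reduces the loss of spatial information. In addition, a bi-directional long short-term memory network is introduced into the feature fusion stage, which strengthens the fused feature map's representation of natural scene text and the correlation within feature sequences, and improves text localization.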

Keywords: deformable convolution, atrous convolution, text detection, semantic feature, bi-directional long short-term memory network

Abstract:

A text detection method based on an enhanced feature extraction network and semantic feature fusion was proposed to address challenges such as the variable length and oblique angle of scene text. First, an enhanced dilated residual module (EDRM) was designed by combining deformable convolution with atrous convolution and was applied to the conv4_x and conv5_x layers of ResNet18. The resulting network served as the backbone, enhancing feature extraction while increasing feature map resolution and reducing the loss of spatial information. Second, to address the inadequate extraction of text semantic features by existing algorithms, bi-directional long short-term memory (BiLSTM) was introduced into the feature fusion stage, enhancing the fused feature map's representation of scene text, the correlation of feature sequences, and the text localization ability of the model. The model was evaluated on the multi-directional text dataset ICDAR2015 and the long text dataset MSRA-TD500. The results demonstrated that, compared with the efficient DBNet algorithm, the F value of the proposed algorithm increased by 1.8% and 3.3%, respectively, showing strong competitiveness.
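
The F value used for the comparison is conventionally the harmonic mean of precision and recall. A minimal sketch of that calculation follows. The precision and recall values are purely hypothetical, since the abstract reports only the F-value gains and not the underlying scores, and the sketch assumes the gains are expressed in percentage points, which the abstract does not state.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F value, or F1 score)."""
    if precision + recall == 0:
        return 0.0  # no correct detections at all
    return 2 * precision * recall / (precision + recall)


# Hypothetical scores for illustration only; they are not results from the paper.
baseline = f_measure(precision=0.80, recall=0.70)
improved = f_measure(precision=0.85, recall=0.75)

# Difference expressed in percentage points.
gain_points = (improved - baseline) * 100
print(f"baseline F = {baseline:.3f}, improved F = {improved:.3f}, "
      f"gain = {gain_points:.1f} points")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector needs balanced precision and recall to score well on this metric.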

Key words: deformable convolution, atrous convolution, text detection, semantic feature, bi-directional long short-term memory

CLC Number: