Journal of Graphics ›› 2023, Vol. 44 ›› Issue (3): 473-481. DOI: 10.11996/JG.j.2095-302X.2023030473

• Image Processing and Computer Vision •

Natural scene text detection based on attention mechanism and deep multi-scale feature fusion

LI Yu1, YAN Tian-tian1, ZHOU Dong-sheng1,2, WEI Xiao-peng2

  1. National and Local Joint Engineering Laboratory of Computer Aided Design, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China
  2. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Received: 2022-10-27 Accepted: 2023-01-12 Online: 2023-06-30 Published: 2023-06-30
  • Contact: ZHOU Dong-sheng (1978-), professor, Ph.D. His main research interests cover computer graphics and vision, human-robot interaction, etc. E-mail: zhouds@dlu.edu.cn
  • About the author:

    LI Yu (1997-), master's student. Her main research interest is computer vision. E-mail: y18337275282@163.com

  • Supported by:
    Key Program of the National Natural Science Foundation of China (U21A20491); Special Project of Central Government Guiding Local Science and Technology Development (2021JH6/10500140); Program for Innovative Research Team in University of Liaoning Province (LT2020015); Support Plan for Key Field Innovation Team of Dalian (2021RT06)

Abstract:

To address the problem that existing scene text detection methods cannot deeply mine and fully fuse the discriminative features of multi-scale text instances, a natural scene text detection method based on an attention mechanism and deep multi-scale feature fusion is proposed. First, an attention-enhanced ResNeSt50 network serves as the backbone to extract more discriminative feature representations of text instances at different scales. Next, a deep multi-scale feature fusion module lets feature information from different scales interact and adaptively learns a weight matrix for the feature map at each scale; these weights are used to fuse the discriminative features of text instances across the feature maps, yielding a more robust multi-scale fused feature map. Finally, an adaptive binarization post-processing module generates more accurate bounding boxes for text regions. To evaluate the method, extensive experiments were conducted on the ICDAR2015, ICDAR2013, and CTW1500 datasets. The results show that the method achieves competitive detection results compared with other advanced detection methods and exhibits good robustness and generalization ability.

Key words: natural scene text detection, attention mechanism, multi-scale feature fusion, binarization, adaptive
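
The fusion and binarization steps summarized in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation. The function names (`softmax`, `fuse_scales`, `soft_binarize`), the per-pixel softmax weighting, the sigmoid-style soft threshold, and the steepness parameter `k` are all assumptions made for illustration; the abstract specifies neither the form of the learned weight matrices nor the binarization formula. In a trained network the weight logits and the threshold map would be predicted by learned layers, whereas here they are passed in as plain inputs.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(feature_maps, weight_logits):
    """Fuse same-sized 2D feature maps using per-pixel softmax weights.

    feature_maps:  list of S maps, each H x W (nested lists), assumed to be
                   already resized to a common resolution.
    weight_logits: list of S maps of the same shape. Hypothetical stand-in
                   for the weight matrices a network would learn.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Normalize the S logits at this pixel so the weights sum to 1.
            weights = softmax([logits[y][x] for logits in weight_logits])
            fused[y][x] = sum(
                w * fmap[y][x] for w, fmap in zip(weights, feature_maps)
            )
    return fused


def soft_binarize(prob_map, thresh_map, k=50.0):
    """Soft per-pixel thresholding: 1 / (1 + exp(-k * (p - t))).

    Assumed form of an adaptive binarization step: each pixel is compared
    against its own threshold instead of one global constant.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


if __name__ == "__main__":
    coarse = [[0.2, 0.8], [0.4, 0.6]]
    fine = [[0.6, 0.4], [0.8, 0.2]]
    equal_logits = [[0.0, 0.0], [0.0, 0.0]]
    fused = fuse_scales([coarse, fine], [equal_logits, equal_logits])
    thresholds = [[0.5, 0.5], [0.5, 0.5]]
    print(fused)
    print(soft_binarize(fused, thresholds))
```

With equal logits the fusion reduces to a plain average of the scales; unequal logits shift each pixel toward the scale its weights favor. A pixel whose probability equals its threshold maps to 0.5, and a larger `k` pushes the output closer to a hard 0/1 decision.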
