Journal of Graphics, 2023, Vol. 44, Issue (3): 473-481. DOI: 10.11996/JG.j.2095-302X.2023030473
LI Yu1, YAN Tian-tian1, ZHOU Dong-sheng1,2, WEI Xiao-peng2
Received: 2022-10-27
Accepted: 2023-01-12
Online: 2023-06-30
Published: 2023-06-30
Contact: ZHOU Dong-sheng (1978-), professor, Ph.D. His main research interests cover computer graphics and vision, human-robot interaction, etc. E-mail: zhouds@dlu.edu.cn
About author: LI Yu (1997-), master student. Her main research interest covers computer vision. E-mail: y18337275282@163.com
LI Yu, YAN Tian-tian, ZHOU Dong-sheng, WEI Xiao-peng. Natural scene text detection based on attention mechanism and deep multi-scale feature fusion[J]. Journal of Graphics, 2023, 44(3): 473-481.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023030473
ResNet50 | ResNeSt50 | MFCAM | AFM | P (%) | R (%) | F (%) |
---|---|---|---|---|---|---|
√ | - | - | - | 85.0 | 82.7 | 83.8 |
- | √ | - | - | 87.2 | 83.2 | 85.2 |
- | √ | √ | - | 90.6 | 84.0 | 87.2 |
- | √ | - | √ | 88.3 | 84.5 | 86.4 |
- | √ | √ | √ | 91.3 | 85.7 | 88.4 |
Table 1 Results of ablation experiments
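The page does not define the F column. The sketch below assumes it is the usual F-measure, the harmonic mean of precision (P) and recall (R), and checks that assumption against the rows of Table 1. The function name `f_measure` and the 0.05 rounding tolerance are illustrative choices, not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent).

    Assumption: this is the formula behind the F column; the page itself
    does not state it.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) rows copied from Table 1.
TABLE_1 = [
    (85.0, 82.7, 83.8),
    (87.2, 83.2, 85.2),
    (90.6, 84.0, 87.2),
    (88.3, 84.5, 86.4),
    (91.3, 85.7, 88.4),
]

for p, r, f_reported in TABLE_1:
    f = f_measure(p, r)
    # The table reports one decimal place, so allow half a unit in that place.
    status = "consistent" if abs(f - f_reported) < 0.05 else "differs"
    print(f"P={p:.1f} R={r:.1f} F={f:.2f} reported={f_reported:.1f} {status}")
```

Under this assumption, every row of Table 1 reproduces the reported F value to one decimal place.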
Method | P | R | F |
---|---|---|---|
TextBoxes++[ | 87.2 | 76.7 | 81.7 |
OPMP[ | 89.1 | 85.5 | 87.3 |
ASBNet[ | 78.2 | 84.3 | 81.2 |
ERFFC[ | 85.4 | 78.9 | 82.0 |
PSENet[ | 86.9 | 84.5 | 85.7 |
SADA[ | 88.8 | 82.6 | 85.6 |
EFPN[ | 89.2 | 82.0 | 85.5 |
TDMF[ | 79.9 | 85.3 | 82.5 |
Quadbox[ | 88.7 | 81.8 | 85.1 |
FDTA[ | 89.0 | 81.2 | 84.9 |
TextSnake[ | 84.9 | 80.4 | 82.6 |
DBNet++[ | 90.9 | 83.9 | 87.3 |
Ref. [ | 80.3 | 69.1 | 74.5 |
Ours | 91.3 | 85.7 | 88.4 |
Table 2 Comparison results on ICDAR2015 dataset (%)
Method | P | R | F |
---|---|---|---|
OPMP[ | 85.1 | 80.8 | 82.9 |
TS[ | 78.2 | 77.8 | 78.0 |
Non-Local PAN[ | 78.9 | 83.8 | 81.3 |
ASBNet[ | 85.1 | 75.5 | 80.0 |
PSENet[ | 80.6 | 75.6 | 78.0 |
SADA[ | 86.2 | 80.4 | 83.2 |
TextSnake[ | 67.9 | 85.3 | 75.6 |
TextRay[ | 80.4 | 82.8 | 81.6 |
ATRR[ | 80.1 | 80.2 | 80.1 |
Ours | 81.6 | 87.2 | 84.3 |
Table 3 Comparison results on CTW1500 dataset (%)
Method | P | R | F |
---|---|---|---|
TextBoxes++[ | 86.0 | 74.0 | 80.0 |
Faster-RCNN[ | 71.0 | 75.0 | 73.0 |
TS[ | 88.0 | 81.7 | 84.7 |
Ref. [ | 80.8 | 69.1 | 74.5 |
Ref. [ | 88.0 | 84.5 | 86.2 |
TDMF[ | 83.2 | 68.4 | 73.0 |
Quadbox[ | 88.0 | 81.0 | 84.0 |
Ref. [ | 80.8 | 69.1 | 74.5 |
TransDETR[ | 80.6 | 70.2 | 75.0 |
SRMCA[ | 79.0 | 81.0 | 80.0 |
Ours | 89.5 | 85.8 | 87.6 |
Table 4 Comparison results on ICDAR2013 dataset (%)
[1] WANG J X, WANG Z Y, TIAN X. Review of natural scene text detection and recognition based on deep learning[J]. Journal of Software, 2020, 31(5): 1465-1496. (in Chinese)
[2] LIU C Y, CHEN X X, LUO C J, et al. Deep learning methods for scene text detection and recognition[J]. Journal of Image and Graphics, 2021, 26(6): 1330-1367. (in Chinese)
[3] KIM K I, JUNG K, KIM J H. Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(12): 1631-1639.
[4] MINETTO R, THOME N, CORD M, et al. T-HOG: an effective gradient-based descriptor for single line text regions[J]. Pattern Recognition, 2013, 46(3): 1078-1090.
[5] LIAO M H, SHI B G, BAI X. TextBoxes++: a single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing, 2018, 27(8): 3676-3690.
[6] ZHANG S, LIU Y L, JIN L W, et al. OPMP: an omnidirectional pyramid mask proposal network for arbitrary-shape scene text detection[J]. IEEE Transactions on Multimedia, 2021, 23: 454-467.
[7] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[8] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2016: 21-37.
[9] YI Y H, HE J J, LU L Q, et al. Association of text and other objects for text detection with natural scene images[J]. Journal of Image and Graphics, 2020, 25(1): 126-135. (in Chinese)
[10] WANG C, ZHAO S, ZHU L, et al. Semi-supervised pixel-level scene text segmentation by mutually guided network[J]. IEEE Transactions on Image Processing, 2021, 30(5): 8212-8221.
[11] SHI G C, WU Y R. Arbitrary shape scene-text detection based on pixel aggregation and feature enhancement[J]. Journal of Image and Graphics, 2021, 26(7): 1614-1624. (in Chinese)
[12] LIAO M H, LYU P Y, HE M H, et al. Mask TextSpotter: an end-to-end trainable neural network for spotting text with arbitrary shapes[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(2): 532-548.
[13] HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 386-397.
[14] LIAO M H, ZOU Z S, WAN Z Y, et al. Real-time scene text detection with differentiable binarization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 919-931.
[15] HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023.
[16] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional Block Attention Module[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2018: 3-19.
[17] ZHANG L, LIU Y F, XIAO H, et al. Efficient scene text detection with textual attention tower[C]// ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing. New York: IEEE Press, 2020: 4272-4276.
[18] LIU X H, CHEN X K, KUANG H L, et al. A multi-level feature fusion network for scene text detection with text attention mechanism[C]// 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference. New York: IEEE Press, 2021: 954-958.
[19] LIANG H R, YE L C, LIANG R H, et al. Text detection algorithm for natural scenes under attention supervision strategy[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(7): 1011-1019. (in Chinese)
[20] LI X Y, SONG Y H, YU T. Text detection in natural scene images based on enhanced receptive field and fully convolution network[J]. Acta Automatica Sinica, 2022, 48(3): 797-807. (in Chinese)
[21] YANG S Q, YI Y H, TANG Z W, et al. Text detection in natural scenes embedded attention mechanism[J]. Computer Engineering and Applications, 2021, 57(24): 185-191. (in Chinese)
[22] LI X, WANG W H, HOU W B, et al. Shape robust text detection with progressive scale expansion network[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 9328-9337.
[23] WANG Y X, XIE H T, ZHA Z J, et al. R-net: a relationship network for efficient and accurate scene text detection[J]. IEEE Transactions on Multimedia, 2021, 23: 1316-1329.
[24] SHAO H L, JI Y, LI Y, et al. BDFPN: Bi-direction feature pyramid network for scene text detection[C]// 2021 International Joint Conference on Neural Networks. New York: IEEE Press, 2021: 1-8.
[25] ZHANG H, WU C R, ZHANG Z Y, et al. ResNeSt: split-attention networks[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2022: 2735-2745.
[26] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[27] KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading[C]// The 13th International Conference on Document Analysis and Recognition. New York: IEEE Press, 2015: 1156-1160.
[28] DAI P W, LI Y, ZHANG H, et al. Accurate scene text detection via scale-aware data augmentation and shape similarity constraint[J]. IEEE Transactions on Multimedia, 2021, 24: 1883-1895.
[29] LIAO M H, ZOU Z S, WAN Z Y, et al. Real-time scene text detection with differentiable binarization and adaptive scale fusion[EB/OL]. (2022-02-21) [2022-10-22]. https://arxiv.org/abs/2202.10304.
[30] CHEN Z, WANG G Y, LIU Q. Natural scene text detection algorithm combining multi-granularity feature fusion[J]. Computer Science, 2021, 48(12): 243-248. (in Chinese)
[31] KESERWANI P, DHANKHAR A, SAINI R, et al. Quadbox: quadrilateral bounding box based scene text detection using vector regression[J]. IEEE Access, 2021, 9: 36802-36818.
[32] CAO Y C, MA S S, PAN H C. FDTA: fully convolutional scene text detection with text attention[J]. IEEE Access, 2020, 8: 155441-155449.
[33] LONG S B, RUAN J Q, ZHANG W J, et al. TextSnake: a flexible representation for detecting text of arbitrary shapes[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2018: 19-35.
[34] SHAO H L, JI Y, LIU C P, et al. Scene text detection algorithm based on enhanced feature pyramid network[J]. Computer Science, 2022, 49(2): 248-255. (in Chinese)
[35] YANG J F, WANG R M, HE X, et al. Multi-oriented natural scene text detection algorithm based on FCN[J]. Computer Engineering and Applications, 2020, 56(2): 164-170. (in Chinese)
[36] WANG F F, CHEN Y F, WU F, et al. TextRay: contour-based geometric modeling for arbitrary-shaped scene text detection[C]// The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 111-119.
[37] WANG X B, JIANG Y Y, LUO Z B, et al. Arbitrary shape scene text detection with adaptive text region representation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 6442-6451.
[38] WU W J, ZHANG D B, FU Y, et al. End-to-end video text spotting with transformer[EB/OL]. [2022-05-10]. https://www.researchgate.net/publication/359390680_End-to-End_Video_Text_Spotting_with_Transformer.
[39] LIU S P, XIAN Y T, LI H F, et al. Text detection in natural scene images using morphological component analysis and Laplacian dictionary[J]. IEEE/CAA Journal of Automatica Sinica, 2020, 7(1): 214-222.