Multi-directional text detection based on the fusion of enhanced feature extraction network and semantic feature

doi:10.11996/JG.j.2095-302X.2024010056

Abstract

A text detection method based on an enhanced feature extraction network and semantic feature fusion was proposed to address the variable length and oblique orientation of scene text. First, an enhanced dilated residual module (EDRM) was designed by combining deformable convolution with atrous convolution and applied to the conv4_x and conv5_x layers of ResNet18. The resulting network served as the backbone, strengthening feature extraction while increasing feature map resolution and reducing the loss of spatial information. Second, to address the weakness of existing algorithms in extracting semantic features of text, bi-directional long short-term memory (BiLSTM) was introduced into the feature fusion stage, improving the fused feature map's representation of scene text, the correlation among feature sequences, and the model's text localization ability. The model was evaluated on the multi-directional text dataset ICDAR2015 and the long-text dataset MSRA-TD500. Compared with the efficient DBNet algorithm, the F value of the proposed algorithm increased by 1.8 and 3.3 percentage points on the two datasets, respectively, demonstrating strong competitiveness.
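The resolution argument for atrous convolution can be illustrated with the standard convolution output-size formula. The following is a minimal sketch, not the EDRM itself: the abstract does not give the module's kernel sizes, strides, or dilation rates, so the 40-pixel feature map width and the 3x3 kernel with dilation 2 used here are hypothetical values chosen only for illustration.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span covered by a dilated kernel along one axis."""
    return dilation * (kernel - 1) + 1


def conv_output_size(n: int, kernel: int = 3, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Output length of a convolution along one spatial axis."""
    return (n + 2 * padding - effective_kernel(kernel, dilation)) // stride + 1


# Hypothetical input width; not taken from the paper.
width = 40

# Ordinary downsampling stage: 3x3 kernel, stride 2, padding 1.
strided = conv_output_size(width, kernel=3, stride=2, padding=1)

# Atrous alternative: stride 1, dilation 2, padding 2.
# The kernel spans 5 pixels, yet the resolution is preserved.
atrous = conv_output_size(width, kernel=3, stride=1, padding=2, dilation=2)

print(strided)  # 20
print(atrous)   # 40
```

Under these assumptions, the strided layer halves the feature map while the dilated layer keeps its size and still covers a wider span per kernel, which is the trade-off the abstract describes as higher resolution with less loss of spatial information.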

Key words: deformable convolution, atrous convolution, text detection, semantic feature, bi-directional long short-term memory

CLC Number: TP391

LV Ling, LI Hua, WANG Wu. Multi-directional text detection based on the fusion of enhanced feature extraction network and semantic feature[J]. Journal of Graphics, 2024, 45(1): 56-64.

References

[1]	侯杰波. 复杂场景文本检测方法研究[D]. 北京: 北京科技大学, 2021.
	HOU J B. Research on text detection in complex scenes[D]. Beijing: University of Science and Technology Beijing, 2021 (in Chinese).
[2]	GREENHALGH J, MIRMEHDI M. Recognizing text-based traffic signs[J]. IEEE Transactions on Intelligent Transportation Systems, 2014, 16(3): 1360-1369.
[3]	CANNY J. A computational approach to edge detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, 8(6): 679-698.
[4]	SHIVAKUMARA P, PHAN T Q, TAN C L. A Laplacian approach to multi-oriented text detection in video[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(2): 412-419.
[5]	CHEN X R, YUILLE A L. Detecting and reading text in natural scenes[C]// 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR. New York: IEEE Press, 2004:II.
[6]	LIU Y L, JIN L W. Deep matching prior network: toward tighter multi-oriented text detection[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 3454-3461.
[7]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// European Conference on Computer Vision. Cham: Springer, 2016: 21-37.
[8]	TIAN Z, HUANG W L, HE T, et al. Detecting text in natural image with connectionist text proposal network[C]// European Conference on Computer Vision. Cham: Springer, 2016: 56-72.
[9]	BAEK Y, LEE B, HAN D, et al. Character region awareness for text detection[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 9357-9366.
[10]	WANG W H, XIE E Z, LI X, et al. Shape robust text detection with progressive scale expansion network[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 9328-9337.
[11]	WANG W H, XIE E Z, SONG X G, et al. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 8439-8448.
[12]	LIAO M H, WAN Z Y, YAO C, et al. Real-time scene text detection with differentiable binarization[C]// The AAAI conference on artificial intelligence. New York: AAAI, 2020, 34(7): 11474-11481.
[13]	DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 764-773.
[14]	CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848.
[15]	KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading[C]// 2015 13th International Conference on Document Analysis and Recognition. New York: IEEE Press, 2015: 1156-1160.
[16]	YAO C, BAI X, LIU W Y, et al. Detecting texts of arbitrary orientations in natural images[C]// 2012 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2012: 1083-1090.
[17]	ZHOU X Y, YAO C, WEN H, et al. EAST: an efficient and accurate scene text detector[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2642-2651.
[18]	SHI B G, BAI X, BELONGIE S. Detecting oriented text in natural images by linking segments[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 3482-3490.
[19]	LIAO M H, SHI B G, BAI X. TextBoxes++: a single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing, 2018, 27(8): 3676-3690.
[20]	JIANG X F, XU S G, ZHANG S Q, et al. Arbitrary-shaped text detection with adaptive text region representation[J]. IEEE Access, 2020, 8: 102106-102118.
[21]	SHENG T, LIAN Z H. Bidirectional regression for arbitrary-shaped text detection[M]//Document Analysis and Recognition - ICDAR 2021. Cham: Springer International Publishing, 2021: 187-201.
[22]	WANG W H, XIE E Z, LI X, et al. PAN++: towards efficient and accurate end-to-end spotting of arbitrarily-shaped text[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(9): 5349-5367.
[23]	LIAO M H, ZOU Z S, WAN Z Y, et al. Real-time scene text detection with differentiable binarization and adaptive scale fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 919-931.
[24]	徐健, 郭湛澎, 刘秀平, 等. 基于注意力机制的多方向文本检测[J]. 光电子·激光, 2023: 166-173.
	XU J, GUO Z P, LIU X P, et al. Multi-directional text detection based on attention mechanism[J]. Journal of Optoelectronics·Laser, 2023: 166-173 (in Chinese).
[25]	HE W H, ZHANG X Y, YIN F, et al. Deep direct regression for multi-oriented scene text detection[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 745-753.
[26]	MA J Q, SHAO W Y, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals[J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122.
[27]	LIAO M H, ZHU Z, SHI B G, et al. Rotation-sensitive regression for oriented scene text detection[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 5909-5918.
[28]	DENG D, LIU H F, LI X L, et al. PixelLink: detecting scene text via instance segmentation[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 6773-6780.
[29]	LONG S B, RUAN J Q, ZHANG W J, et al. TextSnake: a flexible representation for detecting text of arbitrary shapes[C]// European Conference on Computer Vision. Cham: Springer, 2018: 19-35.

Ablation results for the proposed modules
Modules used				Evaluation results/%
ResNet18	EDRM-ResNet18	2-BiLSTM+FPN	3-BiLSTM+FPN	P	R	F
√	-	-	-	89.6	75.5	81.9
-	√	-	-	89.6	77.3	83.0
√	-	√	-	88.8	76.7	82.3
√	-	-	√	89.2	76.8	82.5
-	√	√	-	89.9	77.5	83.2
-	√	-	√	88.4	79.5	83.7
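The P, R, and F columns appear to be precision, recall, and F-measure. As a minimal sketch, assuming F is the usual harmonic mean of P and R (the page itself does not state the formula), the F values in the rows above can be recomputed from the other two columns. The function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs copied from the first, second, and last rows of the table.
rows = [(89.6, 75.5), (89.6, 77.3), (88.4, 79.5)]

for p, r in rows:
    print(round(f_measure(p, r), 1))
# 81.9
# 83.0
# 83.7
```

Because the harmonic mean is pulled toward the smaller of the two values, the recall gain from 75.5 to 79.5 raises F even though precision drops slightly in the last row.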

Comparison with other methods on ICDAR2015 (%)
Method	P	R	F
CTPN^[8]	74.2	51.6	60.9
EAST^[17]	83.6	78.5	78.2
SegLink^[18]	73.1	76.8	75.0
TextBoxes++^[19]	87.2	76.7	81.7
PANNet^[11]	84.0	81.9	82.9
ATTR^[20]	85.8	79.7	82.6
Ref. [21]	82.6	81.9	82.2
PAN++^[22]	85.9	80.4	83.1
DBNet++^[23]	90.1	77.2	83.1
Ref. [24]	84.8	81.3	83.0
DBNet^[12]	89.6	75.5	81.9
Ours	88.4	79.5	83.7

Comparison with other methods on MSRA-TD500 (%)
Method	P	R	F
DeepReg^[25]	77.0	70.0	74.0
RRPN^[26]	82.0	68.0	74.0
EAST^[17]	87.3	67.4	76.1
SegLink^[18]	86.0	70.0	77.0
RRD^[27]	87.0	73.7	79.0
PixelLink^[28]	83.0	73.2	77.8
TextSnake^[29]	83.2	73.9	78.3
PAN++^[22]	81.6	80.3	80.9
DBNet++^[23]	89.7	76.5	82.6
DBNet^[12]	86.6	75.3	80.6
Ours	87.1	80.9	83.9
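The gains over DBNet quoted in the abstract can be checked against the F columns of the two comparison tables. This is a small sketch under one assumption: the table with CTPN and TextBoxes++ is taken to report ICDAR2015 and the table with DeepReg and RRPN to report MSRA-TD500, since the page does not label them and the match with the abstract's 1.8 and 3.3 is the only evidence.

```python
# F values (percent) for DBNet and the proposed method, copied from the tables.
# The dataset assignment is an assumption, as noted above.
f_scores = {
    "ICDAR2015": {"DBNet": 81.9, "Ours": 83.7},
    "MSRA-TD500": {"DBNet": 80.6, "Ours": 83.9},
}


def gain_over_dbnet(dataset: str) -> float:
    """Absolute F improvement in percentage points, rounded to one decimal."""
    scores = f_scores[dataset]
    return round(scores["Ours"] - scores["DBNet"], 1)


for name in f_scores:
    print(name, gain_over_dbnet(name))
# ICDAR2015 1.8
# MSRA-TD500 3.3
```

The differences are absolute, so they are percentage points of F rather than relative improvements.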