Journal of Graphics ›› 2024, Vol. 45 ›› Issue (1): 56-64. DOI: 10.11996/JG.j.2095-302X.2024010056
• Image Processing and Computer Vision •
LV Ling1, LI Hua1, WANG Wu2
Received: 2023-07-18
Accepted: 2023-10-25
Online: 2024-02-29
Published: 2024-02-29
Contact:
LI Hua (1977-), professor, Ph.D. Her main research interests cover computer vision and virtual reality.
About author:
LV Ling (1999-), master's student. Her main research interest is computer vision. E-mail: 15382351657@163.com
LV Ling, LI Hua, WANG Wu. Multi-directional text detection based on the fusion of enhanced feature extraction network and semantic feature[J]. Journal of Graphics, 2024, 45(1): 56-64.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024010056
Modules used | | | | Evaluation results/% | | |
---|---|---|---|---|---|---|
ResNet18 | EDRM-ResNet18 | 2-BiLSTM+FPN | 3-BiLSTM+FPN | P | R | F |
√ | - | - | - | 89.6 | 75.5 | 81.9 |
- | √ | - | - | 89.6 | 77.3 | 83.0 |
√ | - | √ | - | 88.8 | 76.7 | 82.3 |
√ | - | - | √ | 89.2 | 76.8 | 82.5 |
- | √ | √ | - | 89.9 | 77.5 | 83.2 |
- | √ | - | √ | 88.4 | 79.5 | 83.7 |
Table 1 Results of ablation experiment
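
The P, R, and F columns can be cross-checked with a few lines of Python. This is a minimal sketch that assumes F is the usual harmonic mean of precision and recall (the page itself does not state the formula); the helper name `f_measure` is hypothetical, and the rows are copied from Table 1.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) in percent, copied row by row from Table 1.
table1_rows = [
    (89.6, 75.5, 81.9),  # ResNet18
    (89.6, 77.3, 83.0),  # EDRM-ResNet18
    (88.8, 76.7, 82.3),  # ResNet18 + 2-BiLSTM+FPN
    (89.2, 76.8, 82.5),  # ResNet18 + 3-BiLSTM+FPN
    (89.9, 77.5, 83.2),  # EDRM-ResNet18 + 2-BiLSTM+FPN
    (88.4, 79.5, 83.7),  # EDRM-ResNet18 + 3-BiLSTM+FPN
]

for p, r, f_reported in table1_rows:
    f = f_measure(p, r)
    print(f"P={p} R={r} computed F={f:.1f} reported F={f_reported}")
```

Under that assumption, every computed value agrees with the reported F to within rounding at one decimal place.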
Method | P | R | F |
---|---|---|---|
CTPN[ | 74.2 | 51.6 | 60.9 |
EAST[ | 83.6 | 78.5 | 78.2 |
SegLink[ | 73.1 | 76.8 | 75.0 |
TextBoxes++[ | 87.2 | 76.7 | 81.7 |
PANNet[ | 84.0 | 81.9 | 82.9 |
ATTR[ | 85.8 | 79.7 | 82.6 |
Ref. [ | 82.6 | 81.9 | 82.2 |
PAN++[ | 85.9 | 80.4 | 83.1 |
DBNet++[ | 90.1 | 77.2 | 83.1 |
Ref. [ | 84.8 | 81.3 | 83.0 |
DBNet[ | 89.6 | 75.5 | 81.9 |
Ours | 88.4 | 79.5 | 83.7 |
Table 2 Comparison results on ICDAR2015 dataset/%
Method | P | R | F |
---|---|---|---|
DeepReg[ | 77.0 | 70.0 | 74.0 |
RRPN[ | 82.0 | 68.0 | 74.0 |
EAST[ | 87.3 | 67.4 | 76.1 |
SegLink[ | 86.0 | 70.0 | 77.0 |
RRD[ | 87.0 | 73.7 | 79.0 |
PixelLink[ | 83.0 | 73.2 | 77.8 |
TextSnake[ | 83.2 | 73.9 | 78.3 |
PAN++[ | 81.6 | 80.3 | 80.9 |
DBNet++[ | 89.7 | 76.5 | 82.6 |
DBNet[ | 86.6 | 75.3 | 80.6 |
Ours | 87.1 | 80.9 | 83.9 |
Table 3 Comparison results on MSRA-TD500 dataset/%
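
To read Tables 2 and 3 side by side, the sketch below computes the difference between the "Ours" row and the DBNet row on each dataset. Treating DBNet as the baseline is an assumption made here for illustration (its ICDAR2015 row matches the plain ResNet18 row of Table 1); the names `results` and `gains` are hypothetical, and all numbers are copied from the tables.

```python
# (P, R, F) in percent, copied from Table 2 (ICDAR2015) and Table 3 (MSRA-TD500).
results = {
    "ICDAR2015": {"DBNet": (89.6, 75.5, 81.9), "Ours": (88.4, 79.5, 83.7)},
    "MSRA-TD500": {"DBNet": (86.6, 75.3, 80.6), "Ours": (87.1, 80.9, 83.9)},
}


def gains(dataset: str) -> tuple:
    """Percentage-point change in (P, R, F) of 'Ours' relative to DBNet."""
    base = results[dataset]["DBNet"]
    ours = results[dataset]["Ours"]
    # Round to one decimal to match the precision of the tables.
    return tuple(round(o - b, 1) for o, b in zip(ours, base))


for name in results:
    dp, dr, df = gains(name)
    print(f"{name}: P {dp:+.1f}, R {dr:+.1f}, F {df:+.1f}")
```

On both datasets the F gain comes mainly from recall; on ICDAR2015 precision is slightly lower than DBNet's.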
[1] | HOU J B. Research on text detection in complex scenes[D]. Beijing: University of Science and Technology Beijing, 2021 (in Chinese). |
[2] | GREENHALGH J, MIRMEHDI M. Recognizing text-based traffic signs[J]. IEEE Transactions on Intelligent Transportation Systems, 2014, 16(3): 1360-1369. |
[3] | CANNY J. A computational approach to edge detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, 8(6): 679-698. |
[4] | SHIVAKUMARA P, PHAN T Q, TAN C L. A Laplacian approach to multi-oriented text detection in video[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(2): 412-419. |
[5] | CHEN X R, YUILLE A L. Detecting and reading text in natural scenes[C]// 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR. New York: IEEE Press, 2004:II. |
[6] | LIU Y L, JIN L W. Deep matching prior network: toward tighter multi-oriented text detection[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 3454-3461. |
[7] | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// European Conference on Computer Vision. Cham: Springer, 2016: 21-37. |
[8] | TIAN Z, HUANG W L, HE T, et al. Detecting text in natural image with connectionist text proposal network[C]// European Conference on Computer Vision. Cham: Springer, 2016: 56-72. |
[9] | BAEK Y, LEE B, HAN D, et al. Character region awareness for text detection[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 9357-9366. |
[10] | WANG W H, XIE E Z, LI X, et al. Shape robust text detection with progressive scale expansion network[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 9328-9337. |
[11] | WANG W H, XIE E Z, SONG X G, et al. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 8439-8448. |
[12] | LIAO M H, WAN Z Y, YAO C, et al. Real-time scene text detection with differentiable binarization[C]// The AAAI conference on artificial intelligence. New York: AAAI, 2020, 34(7): 11474-11481. |
[13] | DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 764-773. |
[14] | CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. |
[15] | KARATZAS D, GOMEZ-BIGORDA L, NICOLAOU A, et al. ICDAR 2015 competition on robust reading[C]// 2015 13th International Conference on Document Analysis and Recognition. New York: IEEE Press, 2015: 1156-1160. |
[16] | YAO C, BAI X, LIU W Y, et al. Detecting texts of arbitrary orientations in natural images[C]// 2012 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2012: 1083-1090. |
[17] | ZHOU X Y, YAO C, WEN H, et al. EAST: an efficient and accurate scene text detector[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2642-2651. |
[18] | SHI B G, BAI X, BELONGIE S. Detecting oriented text in natural images by linking segments[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 3482-3490. |
[19] | LIAO M H, SHI B G, BAI X. TextBoxes++: a single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing, 2018, 27(8): 3676-3690. |
[20] | JIANG X F, XU S G, ZHANG S Q, et al. Arbitrary-shaped text detection with adaptive text region representation[J]. IEEE Access, 2020, 8: 102106-102118. |
[21] | SHENG T, LIAN Z H. Bidirectional regression for arbitrary-shaped text detection[M]//Document Analysis and Recognition - ICDAR 2021. Cham: Springer International Publishing, 2021: 187-201. |
[22] | WANG W H, XIE E Z, LI X, et al. PAN++: towards efficient and accurate end-to-end spotting of arbitrarily-shaped text[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(9): 5349-5367. |
[23] | LIAO M H, ZOU Z S, WAN Z Y, et al. Real-time scene text detection with differentiable binarization and adaptive scale fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 919-931. |
[24] | XU J, GUO Z P, LIU X P, et al. Multi-directional text detection based on attention mechanism[J]. Journal of Optoelectronics·Laser, 2023: 166-173 (in Chinese). |
[25] | HE W H, ZHANG X Y, YIN F, et al. Deep direct regression for multi-oriented scene text detection[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 745-753. |
[26] | MA J Q, SHAO W Y, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals[J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122. |
[27] | LIAO M H, ZHU Z, SHI B G, et al. Rotation-sensitive regression for oriented scene text detection[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 5909-5918. |
[28] | DENG D, LIU H F, LI X L, et al. PixelLink: detecting scene text via instance segmentation[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 6773-6780. |
[29] | LONG S B, RUAN J Q, ZHANG W J, et al. TextSnake: a flexible representation for detecting text of arbitrary shapes[C]// European Conference on Computer Vision. Cham: Springer, 2018: 19-35. |