Research on multi-scale road damage detection algorithm based on attention mechanism

doi:10.11996/JG.j.2095-302X.2024040770

Journal of Graphics, 2024, Vol. 45, Issue 4: 770-778. DOI: 10.11996/JG.j.2095-302X.2024040770

• Image Processing and Computer Vision •

WU Bing, TIAN Ying

School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan Liaoning 114051, China

Received: 2024-04-26 Accepted: 2024-06-28 Published: 2024-08-31 Online: 2024-09-03
Corresponding author: TIAN Ying (born 1971), female, PhD, professor. Her research interests include pattern recognition and digital image processing. E-mail: t_tianying@126.com
First author: WU Bing (born 1999), male, master's student. His research interests include computer vision and deep learning. E-mail: 2258860606@qq.com
Supported by: National Natural Science Foundation of China (62072086); Liaoning Provincial Department of Education (LJKM20220646)

Abstract:

Road damage detection is an important task in road maintenance and repair. Existing practice relies mainly on manual inspection, which consumes considerable manpower and material resources, is inefficient, and cannot keep pace with current road development. To address these problems, an improved multi-scale road damage detection algorithm, YOLOv8-RDD, is proposed. First, deformable convolution (DCN) is used within the C2f module to build a new C2f_DCN module, which enlarges the effective receptive field and localizes object boundaries and positions more accurately, improving both recognition and localization. Second, a new SPPF_GS module is designed at the end of the backbone network: a self-attention (SA) mechanism and the Ghost convolution module are introduced into the SPPF module, and the pooling kernel size is re-tuned, so that long-range dependencies and global information are captured better. Finally, coordinate attention (CA) is introduced into the Neck to strengthen feature extraction and reduce redundant information. Experiments on the RDD2022 dataset show that the improved algorithm achieves a precision of 61.1%, a recall of 55.5%, and a mean average precision (mAP) of 56.2%, improvements of 4.6%, 4.7%, and 5.2%, respectively, over the YOLOv8n algorithm, demonstrating its effectiveness for road damage detection.

Key words: road damage detection, YOLOv8, deformable convolutional networks, attention mechanism, Ghost module
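The C2f_DCN module described in the abstract builds on deformable convolution, in which each kernel tap reads the feature map at its regular grid position plus a fractional offset. The following is a minimal pure-Python sketch of that sampling step only, assuming bilinear interpolation with zeros outside the map, as in the original deformable convolution formulation. The function names `bilinear_sample` and `deformable_conv_at` are hypothetical, and the offsets are passed in by hand rather than predicted by a learned branch as they would be in the actual network.

```python
import math

def bilinear_sample(feature, y, x):
    """Bilinearly interpolate the 2D list `feature` at fractional (y, x).

    Positions outside the map contribute zero (assumed zero padding).
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feature[yy][xx]
    return value

def deformable_conv_at(feature, kernel, offsets, cy, cx):
    """Response of a 3x3 deformable convolution centred at (cy, cx).

    offsets[i][j] is the (dy, dx) shift applied to kernel tap (i, j);
    all-zero offsets reduce this to an ordinary 3x3 convolution.
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            out += kernel[i][j] * bilinear_sample(
                feature, cy + (i - 1) + dy, cx + (j - 1) + dx
            )
    return out

if __name__ == "__main__":
    feature = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    kernel = [[1 / 9] * 3 for _ in range(3)]          # simple averaging kernel
    no_shift = [[(0.0, 0.0)] * 3 for _ in range(3)]   # ordinary convolution
    shifted = [[(0.5, 0.5)] * 3 for _ in range(3)]    # hand-picked offsets
    print(deformable_conv_at(feature, kernel, no_shift, 1, 1))
    print(deformable_conv_at(feature, kernel, shifted, 1, 1))
```

Because the offsets move the sampling points off the regular grid, the same 3x3 kernel can cover an irregular, elongated region, which is the property the abstract credits for the larger effective receptive field.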

WU Bing, TIAN Ying. Research on multi-scale road damage detection algorithm based on attention mechanism[J]. Journal of Graphics, 2024, 45(4): 770-778.

Figures and Tables (16)

Fig. 1 YOLOv8-RDD model structure diagram

Fig. 2 Deformable convolution structure diagram

Fig. 3 Deformable ROI pooling structure diagram

Fig. 4 Bottleneck_DCN model structure diagram

Fig. 5 C2f_DCN model structure diagram

Fig. 6 Self-Attention model structure diagram

Fig. 7 SPPF_GS model structure diagram

Fig. 8 Coordinate Attention model structure diagram
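As a rough illustration of what the coordinate attention structure computes, the sketch below shows its characteristic step: pooling a feature map separately along the width and along the height, then using the two direction-aware vectors to gate each position. This is a simplified single-channel sketch in pure Python. It assumes the learned 1x1 convolutions, channel reduction, and normalization of the real module are replaced by the identity, and the names `directional_pool` and `coordinate_gate` are hypothetical.

```python
import math

def directional_pool(feature):
    """Average-pool a 2D map along each axis.

    Returns (row_means, col_means): one value per row (pooled over width)
    and one value per column (pooled over height).
    """
    h, w = len(feature), len(feature[0])
    row_means = [sum(row) / w for row in feature]
    col_means = [sum(feature[r][c] for r in range(h)) / h for c in range(w)]
    return row_means, col_means

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def coordinate_gate(feature):
    """Scale each element by a row gate and a column gate.

    Assumption: the learned transforms between pooling and gating are
    omitted, so the gates are just sigmoids of the pooled means.
    """
    row_means, col_means = directional_pool(feature)
    gate_h = [sigmoid(v) for v in row_means]
    gate_w = [sigmoid(v) for v in col_means]
    return [
        [feature[r][c] * gate_h[r] * gate_w[c] for c in range(len(gate_w))]
        for r in range(len(gate_h))
    ]

if __name__ == "__main__":
    fmap = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(directional_pool(fmap))
    print(coordinate_gate(fmap))
```

Keeping the two pooled vectors separate, instead of collapsing the map to a single global average, is what lets the gate retain positional information along each axis.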

Table 1 Comparison of attention mechanisms

Algorithm	P/%	R/%	mAP50/%	mAP50~95/%	Params/10⁶	GFLOPs
YOLOv8n	57.8	53.0	53.4	24.1	3.0	8.1
YOLOv8+CBAM	59.1	54.3	53.8	23.9	3.3	8.3
YOLOv8+SE	58.8	52.3	52.5	23.8	3.0	8.2
YOLOv8+CA (Ours)	63.0	54.5	54.3	24.1	3.0	8.2

Table 2 Experimental results of SPPF module improvements

Algorithm	mAP50/%	mAP50~95/%	Params/10⁶	GFLOPs
SPPF	53.4	24.1	3.0	8.1
SPPF+SA	53.9	24.0	3.1	8.3
SPPF+Ghost	53.4	23.9	2.9	8.0
SPPF_GS	54.6	24.1	3.0	8.2
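For context on the SPPF baseline in Table 2: SPPF chains several stride-1 max-pooling layers with the same kernel, which reproduces the effect of parallel pools with progressively larger kernels. The sketch below checks that equivalence in one dimension using pure Python. It assumes a kernel size of 5, which is the YOLOv8 default; the re-tuned kernel size used in SPPF_GS is not stated on this page, and the surrounding convolutions and channel concatenation are omitted. The function names are hypothetical.

```python
def max_pool_1d(values, k):
    """Stride-1 max pooling with an odd kernel k and implicit -inf padding."""
    radius = k // 2
    n = len(values)
    return [
        max(values[max(0, i - radius): min(n, i + radius + 1)])
        for i in range(n)
    ]

def sppf_branches(values, k=5):
    """Chain three k-sized pools and return the input plus each stage.

    In SPPF these four tensors would be concatenated along the channel axis.
    """
    p1 = max_pool_1d(values, k)
    p2 = max_pool_1d(p1, k)
    p3 = max_pool_1d(p2, k)
    return [values, p1, p2, p3]

if __name__ == "__main__":
    signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]
    stages = sppf_branches(signal, k=5)
    # Two chained 5-pools cover the same window as one 9-pool,
    # and three chained 5-pools the same window as one 13-pool.
    print(stages[2] == max_pool_1d(signal, 9))
    print(stages[3] == max_pool_1d(signal, 13))
```

Changing `k` changes all three effective window sizes at once, which is one reason the pooling kernel size is a natural parameter to tune in this module.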

Table 3 Results of ablation experiments

YOLOv8n	C2f_DCN	CA	SPPF_GS	mAP50/%	mAP50~95/%	Params/10⁶	GFLOPs
√				53.4	24.1	3.0	8.1
√	√			54.3	24.5	3.2	7.7
√		√		54.3	24.0	3.0	8.2
√			√	54.6	24.8	3.0	8.1
√	√	√		55.4	25.3	3.2	7.8
√	√		√	54.2	24.7	3.2	7.8
√	√	√	√	56.2	25.0	3.2	7.8
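The ablation rows are easier to compare as gains over the YOLOv8n baseline. The short sketch below computes, for each configuration in Table 3, the absolute mAP50 gain in percentage points and the relative gain in percent. The values are copied from the table; the dictionary layout and the function name `gains_over_baseline` are illustrative assumptions.

```python
BASELINE_MAP50 = 53.4  # YOLOv8n row of Table 3

# mAP50 (%) for each combination of added modules, copied from Table 3
ABLATION_MAP50 = {
    ("C2f_DCN",): 54.3,
    ("CA",): 54.3,
    ("SPPF_GS",): 54.6,
    ("C2f_DCN", "CA"): 55.4,
    ("C2f_DCN", "SPPF_GS"): 54.2,
    ("C2f_DCN", "CA", "SPPF_GS"): 56.2,
}

def gains_over_baseline(results, baseline):
    """Map each configuration to (absolute gain in points, relative gain in %)."""
    table = {}
    for modules, map50 in results.items():
        absolute = round(map50 - baseline, 1)
        relative = round(100 * (map50 - baseline) / baseline, 1)
        table[modules] = (absolute, relative)
    return table

if __name__ == "__main__":
    for modules, (absolute, relative) in gains_over_baseline(
        ABLATION_MAP50, BASELINE_MAP50
    ).items():
        print(" + ".join(modules), absolute, relative)
```

For the full model this gives a gain of 2.8 points, or about 5.2% relative to the baseline. The relative figure is consistent with the mAP improvement quoted in the abstract, which suggests the abstract reports relative rather than absolute gains.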

Fig. 9 Detection comparison in well-lit scenes ((a) Original image; (b) YOLOv8n; (c) YOLOv8-RDD)

Fig. 10 Detection comparison in poorly lit scenes ((a) Original image; (b) YOLOv8n; (c) YOLOv8-RDD)

Fig. 11 Comparison of mAP between YOLOv8n and YOLOv8-RDD

Table 4 Results of comparative experiments

Algorithm	mAP50/%	mAP50~95/%	Params/10⁶	GFLOPs	FPS
Faster-RCNN	43.5	17.2	137.5	370.3	28
YOLOv3tiny	42.4	17.2	12.1	19.1	176
YOLOv4tiny	40.0	16.5	6.1	16.5	156
YOLOv5s	51.6	23.7	7.0	16.0	85
YOLOv6	52.1	23.8	4.2	11.9	87
YOLOv7tiny	48.4	19.4	6.0	13.2	117
YOLOv8n	53.4	24.0	3.0	8.2	105
YOLOv9c	59.1	29.6	156.0	68.1	103
RT-DETR	56.7	25.6	20.1	58.3	124
Swin Transformer	55.8	25.1	96.8	17.1	99
YOLOv8-RDD	56.2	25.0	3.2	7.8	95
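One way to read Table 4 is to ask which detectors beat YOLOv8-RDD on mAP50 and how much larger they are. The sketch below does that with values copied from the table. The tuple layout and the function name `more_accurate_models` are illustrative assumptions, and the ratios are simple quotients of the reported parameter counts and GFLOPs.

```python
# (mAP50 %, params in millions, GFLOPs, FPS), copied from Table 4
TABLE4 = {
    "Faster-RCNN": (43.5, 137.5, 370.3, 28),
    "YOLOv3tiny": (42.4, 12.1, 19.1, 176),
    "YOLOv4tiny": (40.0, 6.1, 16.5, 156),
    "YOLOv5s": (51.6, 7.0, 16.0, 85),
    "YOLOv6": (52.1, 4.2, 11.9, 87),
    "YOLOv7tiny": (48.4, 6.0, 13.2, 117),
    "YOLOv8n": (53.4, 3.0, 8.2, 105),
    "YOLOv9c": (59.1, 156.0, 68.1, 103),
    "RT-DETR": (56.7, 20.1, 58.3, 124),
    "Swin Transformer": (55.8, 96.8, 17.1, 99),
    "YOLOv8-RDD": (56.2, 3.2, 7.8, 95),
}

def more_accurate_models(table, reference):
    """List models with higher mAP50 than `reference`.

    Each entry is (name, parameter ratio, GFLOPs ratio) relative to the
    reference model, sorted by name.
    """
    ref_map, ref_params, ref_gflops, _ = table[reference]
    found = []
    for name, (map50, params, gflops, _fps) in table.items():
        if name != reference and map50 > ref_map:
            found.append(
                (name, round(params / ref_params, 2), round(gflops / ref_gflops, 2))
            )
    return sorted(found)

if __name__ == "__main__":
    for name, param_ratio, gflops_ratio in more_accurate_models(TABLE4, "YOLOv8-RDD"):
        print(name, param_ratio, gflops_ratio)
```

Only YOLOv9c and RT-DETR report a higher mAP50 in this table, and both do so with several times the parameters and computation of YOLOv8-RDD.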

Table 5 Results of generalization ability experiment

Type	YOLOv8n mAP50/%	YOLOv8n mAP50~95/%	YOLOv8-RDD mAP50/%	YOLOv8-RDD mAP50~95/%
D00	58.2	26.5	60.1(+1.9)	27.4(+0.9)
D10	50.6	24.1	54.2(+3.6)	25.1(+1.0)
D20	59.4	25.9	61.0(+1.6)	27.7(+1.8)
D40	60.2	27.5	62.8(+2.6)	29.4(+1.9)
All	57.2	25.3	59.5(+2.3)	27.2(+1.9)
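The bracketed values in Table 5 are the differences between the YOLOv8-RDD and YOLOv8n columns. A small consistency check, with the rows copied from the table, recomputes each difference and reports any row where it disagrees with the printed gain. The data layout and the function name `find_mismatches` are illustrative assumptions.

```python
# type: (YOLOv8n mAP50, YOLOv8n mAP50~95,
#        YOLOv8-RDD mAP50, reported gain, YOLOv8-RDD mAP50~95, reported gain)
TABLE5 = {
    "D00": (58.2, 26.5, 60.1, 1.9, 27.4, 0.9),
    "D10": (50.6, 24.1, 54.2, 3.6, 25.1, 1.0),
    "D20": (59.4, 25.9, 61.0, 1.6, 27.7, 1.8),
    "D40": (60.2, 27.5, 62.8, 2.6, 29.4, 1.9),
    "All": (57.2, 25.3, 59.5, 2.3, 27.2, 1.9),
}

def find_mismatches(table):
    """Return (type, metric) pairs whose reported gain differs from the recomputed one."""
    mismatches = []
    for damage_type, (b50, b5095, r50, g50, r5095, g5095) in table.items():
        if round(r50 - b50, 1) != g50:
            mismatches.append((damage_type, "mAP50"))
        if round(r5095 - b5095, 1) != g5095:
            mismatches.append((damage_type, "mAP50~95"))
    return mismatches

if __name__ == "__main__":
    print(find_mismatches(TABLE5))
```

An empty result means every bracketed gain matches the difference of its two columns, so the gains in this table are absolute percentage points.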

