YOLOv8 with bi-level routing attention for road scene object detection

doi:10.11996/JG.j.2095-302X.2023061104

Journal of Graphics, 2023, Vol. 44, Issue (6): 1104-1111.

Image Processing and Computer Vision

WEI Chen-hao¹, YANG Rui¹, LIU Zhen-bing¹, LAN Ru-shi¹, SUN Xi-yan², LUO Xiao-nan²

1. Guangxi Key Laboratory of Image and Graphic Intelligent Processing (Guilin University of Electronic Technology), Guilin Guangxi 541004, China
2. National Local Joint Engineering Research Center of Satellite Navigation and Location Service (Guilin University of Electronic Technology), Guilin Guangxi 541004, China

Received: 2023-06-29 Accepted: 2023-09-16 Online: 2023-12-31 Published: 2023-12-17
Corresponding author: LAN Ru-shi (1986-), male, professor, Ph.D. His main research interests cover artificial intelligence, image processing and medical information processing. E-mail: rslan2016@163.com
About the author: WEI Chen-hao (1999-), male, master's student. His main research interests cover object detection and deep learning. E-mail: chwei529@163.com

Abstract

With the continuing growth in the number of motor vehicles, road traffic environments have become increasingly complex. Illumination changes and cluttered backgrounds in particular degrade the accuracy and precision of object detection algorithms, and the widely varying shapes of objects in road scenes further complicate detection. To address these problems, a method named YOLOv8n_T was proposed. Building on YOLOv8, it first constructed a D_C2f block based on deformable convolution for the backbone network, which strengthened feature learning for objects against complex backgrounds and better adapted the network to the diverse and changeable appearance of road objects. Second, a bi-level routing attention module was added, which filtered out irrelevant regions in a query-adaptive manner and retained only the most relevant ones. Finally, a small-object detection layer was added for small road objects such as pedestrians and traffic lights. Experimental results showed that YOLOv8n_T effectively improved object detection precision in road scenes: on the BDD100K dataset, its mean average precision (mAP50) was 6.8 percentage points higher than that of the original YOLOv8n and 11.2 percentage points higher than that of YOLOv5n.

Key words: deformable convolution, road scene, object detection, YOLO, attention mechanism
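The abstract states only that a small-object detection layer was added. A common way to do this in YOLOv8 variants is an extra stride-4 (P2) head alongside the stock stride-8/16/32 (P3 to P5) heads. The sketch below assumes that design and a 640×640 input, neither of which is confirmed by the text here (the actual layout is in Fig. 1). It shows how much finer the extra grid is and how many more cells a small object spans.

```python
def grid_sizes(img_size: int, strides: tuple[int, ...]) -> list[int]:
    """Side length of the square prediction grid at each stride."""
    return [img_size // s for s in strides]


def num_predictions(img_size: int, strides: tuple[int, ...]) -> int:
    """Total grid cells; an anchor-free head predicts one box per cell per level."""
    return sum(g * g for g in grid_sizes(img_size, strides))


def cells_spanned(object_px: int, stride: int) -> float:
    """Roughly how many grid cells an object of `object_px` pixels covers along one side."""
    return object_px / stride


STOCK_STRIDES = (8, 16, 32)   # YOLOv8 default P3-P5 heads
WITH_P2 = (4, 8, 16, 32)      # assumption: the added small-object layer is a stride-4 head

if __name__ == "__main__":
    print(grid_sizes(640, WITH_P2))             # [160, 80, 40, 20]
    print(num_predictions(640, STOCK_STRIDES))  # 8400
    print(num_predictions(640, WITH_P2))        # 34000
    # A hypothetical 12-pixel traffic light covers 1.5 cells at stride 8, 3.0 at stride 4.
    print(cells_spanned(12, 8), cells_spanned(12, 4))
```

The trade-off is visible in the counts: under these assumptions the stride-4 level alone contributes 25,600 of the 34,000 prediction cells, which is why such a layer usually raises small-object recall at a noticeable compute cost.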

CLC number: TP391

WEI Chen-hao, YANG Rui, LIU Zhen-bing, LAN Ru-shi, SUN Xi-yan, LUO Xiao-nan. YOLOv8 with bi-level routing attention for road scene object detection[J]. Journal of Graphics, 2023, 44(6): 1104-1111.

Figures and Tables (8)

Fig. 1 YOLOv8n_T network architecture

Fig. 2 D_Bottleneck based on deformable convolution
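D_Bottleneck is built on deformable convolution [15], whose core operation samples the input at learned fractional offsets: for kernel taps p_n around output position p_0, y(p_0) = Σ_n w(p_n)·x(p_0 + p_n + Δp_n), with x at fractional positions obtained by bilinear interpolation. The pure-Python sketch below implements only that operation for one channel, stride 1 and zero padding. It assumes the offsets are given, whereas the real layer predicts them with a parallel convolution, and it does not reproduce the D_Bottleneck layout, which appears only in Fig. 2. A practical PyTorch model would use a library kernel such as `torchvision.ops.deform_conv2d` instead.

```python
import math

Grid = list[list[float]]


def bilinear(img: Grid, y: float, x: float) -> float:
    """Bilinearly sample `img` at fractional (y, x); pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # weight falls off linearly with distance to each neighbouring pixel
                value += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy][xx]
    return value


def deform_conv2d(img: Grid, weight: Grid, offsets) -> Grid:
    """Single-channel deformable convolution, stride 1, 'same' zero padding.

    offsets[i][j] holds one (dy, dx) pair per kernel tap in row-major tap order.
    Offsets are supplied directly here; in DCN a parallel conv layer predicts them.
    """
    h, w = len(img), len(img[0])
    k = len(weight)
    r = k // 2
    taps = [(a, b) for a in range(k) for b in range(k)]
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for n, (a, b) in enumerate(taps):
                dy, dx = offsets[i][j][n]
                # regular grid position p0 + pn, shifted by the offset Δpn
                acc += weight[a][b] * bilinear(img, i + a - r + dy, j + b - r + dx)
            out[i][j] = acc
    return out


def uniform_offsets(h: int, w: int, k: int, dy: float, dx: float):
    """Demo helper: the same (dy, dx) offset at every tap and every output position."""
    return [[[(dy, dx)] * (k * k) for _ in range(w)] for _ in range(h)]


if __name__ == "__main__":
    img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    centre = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    print(deform_conv2d(img, centre, uniform_offsets(2, 3, 3, 0, 0)))    # same as img
    print(deform_conv2d(img, centre, uniform_offsets(2, 3, 3, 0, 1)))    # shifted, 0 at border
    print(deform_conv2d(img, centre, uniform_offsets(2, 3, 3, 0, 0.5)))  # halfway samples
```

With all offsets at zero the operation reduces to an ordinary convolution; non-integer offsets let each tap land between pixels, which is what allows the receptive field to follow irregular object shapes.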

Fig. 3 D_C2f module based on D_Bottleneck
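The captions suggest that D_C2f is a C2f block whose bottlenecks are D_Bottlenecks. Assuming it keeps the topology of the stock Ultralytics YOLOv8 C2f block (a 1x1 convolution to 2c channels, a split into two halves, n blocks chained on the newest half, concatenation of all (2 + n)·c channels, and a final 1x1 convolution), its dataflow can be sketched as follows. This is our reading, not the paper's stated design. Feature maps are modelled as plain lists with one number per channel and the convolutions are passed in as callables, so the sketch shows only the split, chain and concatenate pattern; `cv1`, `d_bottleneck` and `cv2` in the demo are hypothetical stand-ins.

```python
from typing import Callable, List

Channels = List[float]             # toy stand-in: one number per channel
Op = Callable[[Channels], Channels]


def c2f_forward(x: Channels, cv1: Op, blocks: List[Op], cv2: Op) -> Channels:
    """C2f-style forward pass: split, chain the blocks, concatenate every stage."""
    y = cv1(x)                                     # 1x1 conv producing 2c channels
    c = len(y) // 2
    parts = [y[:c], y[c:]]                         # split into two halves of c channels
    for block in blocks:
        parts.append(block(parts[-1]))             # each block refines the newest branch
    fused = [ch for part in parts for ch in part]  # concat: (2 + n) * c channels
    return cv2(fused)                              # 1x1 conv back to the output width


if __name__ == "__main__":
    n, c = 2, 4

    def cv1(x: Channels) -> Channels:          # hypothetical 1x1 conv emitting 2c channels
        return [float(i) for i in range(2 * c)]

    def d_bottleneck(p: Channels) -> Channels:  # hypothetical stand-in for D_Bottleneck
        return [v + 1.0 for v in p]

    def cv2(z: Channels) -> Channels:          # identity, so the concat layout is visible
        return z

    out = c2f_forward([0.0], cv1, [d_bottleneck] * n, cv2)
    print(len(out))  # 16, i.e. (2 + n) * c
    print(out)
```

Because every intermediate branch is concatenated, swapping Bottleneck for D_Bottleneck changes what each branch computes but not the channel bookkeeping of the block.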

Fig. 4 Bi-level routing attention mechanism
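Bi-level routing attention (BRA) comes from BiFormer [16] and works at two levels. First, a coarse region-to-region routing step averages the queries and keys within each region, scores every pair of regions, and keeps only the top-k key regions for each query region. Second, ordinary scaled dot-product attention runs between tokens, restricted to the keys and values gathered from those k regions. The pure-Python sketch below follows that scheme under simplifying assumptions of ours: a 1-D token sequence split into equal-length regions instead of an S×S grid of image regions, a single head, and no learned Q/K/V projections or the local context enhancement term used in BiFormer.

```python
import math
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: List[Vec]) -> Vec:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bi_level_routing_attention(
    q: List[Vec], k: List[Vec], v: List[Vec], region_size: int, top_k: int
) -> Tuple[List[Vec], List[List[int]]]:
    """Simplified BRA: 1-D regions, one head, no projections, no LCE term.

    Returns per-token outputs and, for each query region, the routed key-region indices.
    """
    n, d = len(q), len(q[0])
    regions = [list(range(s, min(s + region_size, n))) for s in range(0, n, region_size)]

    # Level 1: region-level routing on mean-pooled queries and keys.
    q_region = [mean([q[t] for t in r]) for r in regions]
    k_region = [mean([k[t] for t in r]) for r in regions]
    routes = []
    for qr in q_region:
        scores = [dot(qr, kr) for kr in k_region]
        ranked = sorted(range(len(regions)), key=lambda j: scores[j], reverse=True)
        routes.append(ranked[:top_k])

    # Level 2: token-level attention over keys/values gathered from routed regions.
    out: List[Vec] = [[] for _ in range(n)]
    for region, routed in zip(regions, routes):
        gathered = [t for j in routed for t in regions[j]]
        for i in region:
            weights = softmax([dot(q[i], k[t]) / math.sqrt(d) for t in gathered])
            out[i] = [sum(w * v[t][c] for w, t in zip(weights, gathered)) for c in range(len(v[0]))]
    return out, routes


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    out, routes = bi_level_routing_attention(tokens, tokens, tokens, region_size=2, top_k=1)
    print(routes)  # [[0], [1]]: each region routes only to itself
    print(out)
```

When `top_k` equals the number of regions the routing step discards nothing and the result matches full attention; smaller `top_k` is what saves computation and suppresses unrelated regions.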

Table 1 Comparison of experimental results between YOLOv8n_T and other object detection algorithms

Method	BDD100K mAP50	BDD100K mAP50~95	BDD100K APs	BDD100K APm	BDD100K APl	NEXET mAP50	NEXET mAP50~95	NEXET APs	NEXET APm	NEXET APl
YOLOv5n	0.510	0.284	0.095	0.279	0.384	0.628	0.435	0.106	0.365	0.498
SSD	0.528	0.297	0.098	0.283	0.379	0.632	0.462	0.110	0.382	0.505
YOLOv7-Tiny	0.547	0.316	0.112	0.298	0.406	0.643	0.453	0.125	0.397	0.516
YOLOv8n	0.554	0.347	0.115	0.294	0.415	0.651	0.474	0.133	0.421	0.528
FCOS	0.564	0.330	0.127	0.279	0.397	0.647	0.461	0.162	0.416	0.524
Faster R-CNN	0.587	0.362	0.106	0.316	0.420	0.660	0.482	0.118	0.408	0.531
YOLOv8n_T (Ours)	0.622	0.401	0.193	0.371	0.458	0.684	0.496	0.251	0.432	0.560
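The gains quoted in the abstract can be checked against Table 1: they are differences in BDD100K mAP50, expressed in percentage points. The short script below, with values copied from Table 1 (the helper names `gain_pp` and `gains` are ours), recomputes them together with the corresponding gains on the other BDD100K metrics.

```python
# BDD100K results copied from Table 1, in the order of METRICS
METRICS = ("mAP50", "mAP50~95", "APs", "APm", "APl")
BDD100K = {
    "YOLOv5n": (0.510, 0.284, 0.095, 0.279, 0.384),
    "YOLOv8n": (0.554, 0.347, 0.115, 0.294, 0.415),
    "YOLOv8n_T": (0.622, 0.401, 0.193, 0.371, 0.458),
}


def gain_pp(new: float, old: float) -> float:
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round((new - old) * 100, 1)


def gains(model: str, baseline: str) -> dict[str, float]:
    """Per-metric gain of `model` over `baseline` on BDD100K."""
    return {m: gain_pp(a, b) for m, a, b in zip(METRICS, BDD100K[model], BDD100K[baseline])}


if __name__ == "__main__":
    print(gains("YOLOv8n_T", "YOLOv8n"))
    # {'mAP50': 6.8, 'mAP50~95': 5.4, 'APs': 7.8, 'APm': 7.7, 'APl': 4.3}
    print(gains("YOLOv8n_T", "YOLOv5n")["mAP50"])  # 11.2
```

Against YOLOv8n, the largest absolute gain is on small objects (APs, +7.8 points from a base of 0.115), while the gain on large objects (APl) is the smallest.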

Table 2 Ablation results for each component

Method	D_C2f	BRA	nano	mAP50	mAP50~95
YOLOv8n	×	×	×	0.554	0.347
YOLOv8n_D	√	×	×	0.571	0.356
YOLOv8n_BRA	×	√	×	0.557	0.350
YOLOv8n_nano	×	×	√	0.608	0.393
YOLOv8n_T (Ours)	√	√	√	0.622	0.401
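Table 2 can be read the same way, as gains over the YOLOv8n baseline. The sketch below, with values copied from Table 2 (the helper name `deltas_pp` is ours), shows that the three single-component variants gain 1.7, 0.3 and 5.4 points of mAP50. These sum to 7.4 points, more than the 6.8 points of the full YOLOv8n_T, so the contributions overlap rather than add up.

```python
# (mAP50, mAP50~95) copied from Table 2
ABLATION = {
    "YOLOv8n": (0.554, 0.347),
    "YOLOv8n_D": (0.571, 0.356),
    "YOLOv8n_BRA": (0.557, 0.350),
    "YOLOv8n_nano": (0.608, 0.393),
    "YOLOv8n_T": (0.622, 0.401),
}


def deltas_pp(table: dict, baseline: str = "YOLOv8n") -> dict:
    """Gain of every variant over the baseline, in percentage points."""
    b50, b5095 = table[baseline]
    return {
        name: (round((m50 - b50) * 100, 1), round((m5095 - b5095) * 100, 1))
        for name, (m50, m5095) in table.items()
        if name != baseline
    }


if __name__ == "__main__":
    d = deltas_pp(ABLATION)
    for name, (g50, g5095) in d.items():
        print(f"{name:14s} mAP50 {g50:+.1f}  mAP50~95 {g5095:+.1f}")
    singles = sum(d[n][0] for n in ("YOLOv8n_D", "YOLOv8n_BRA", "YOLOv8n_nano"))
    print(f"sum of single-component mAP50 gains: {singles:.1f}, full model: {d['YOLOv8n_T'][0]:.1f}")
```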

Fig. 5 Comparison of experimental results in complex road scenes ((a) SSD; (b) YOLOv5n; (c) YOLOv7-Tiny; (d) FCOS; (e) YOLOv8n_T)

Fig. 6 Comparison of mean average precision (mAP) on different object classes
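For reference, AP for one class is the area under its precision-recall curve, and mAP averages AP over classes. mAP50 counts a detection as a true positive when its IoU with an unmatched ground-truth box is at least 0.5; mAP50~95 further averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. APs, APm and APl in Table 1 presumably restrict the same computation to small, medium and large objects (under 32², between 32² and 96², and over 96² pixels in the COCO convention). The sketch below assumes the true/false-positive labels have already been assigned at a fixed IoU threshold and uses continuous all-point interpolation; some evaluators, such as the COCO toolkit, sample the curve at 101 recall points instead, so values can differ slightly. It is not the paper's evaluation code, and the detections in the demo are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def pr_curve(scores: Sequence[float], is_tp: Sequence[bool], num_gt: int) -> Tuple[List[float], List[float]]:
    """Cumulative recall and precision after each detection, ranked by confidence."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    return recalls, precisions


def average_precision(recalls: List[float], precisions: List[float]) -> float:
    """All-point interpolated AP: area under the monotone precision envelope."""
    mrec = [0.0] + recalls + [1.0]
    mpre = [1.0] + precisions + [0.0]
    # Envelope: precision at recall r becomes the best precision at any recall >= r.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def mean_ap(per_class_ap: Dict[str, float]) -> float:
    """mAP: unweighted mean of per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


if __name__ == "__main__":
    # Hypothetical detections for one class with 2 ground-truth boxes.
    r, p = pr_curve([0.9, 0.8, 0.7], [True, False, True], num_gt=2)
    ap = average_precision(r, p)
    print(round(ap, 4))  # 0.8333
    print(mean_ap({"car": ap, "person": 0.5}))
```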

References

[1]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[2]	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2023-01-10]. https://arxiv.org/abs/1409.1556.pdf.
[3]	SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1-9.
[4]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[5]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// European Conference on Computer Vision. Cham: Springer, 2016: 21-37.
[6]	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 779-788.
[7]	HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2261-2269.
[8]	REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2023-03-10]. https://arxiv.org/abs/1804.02767.pdf.
[9]	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2023-03-10]. https://arxiv.org/abs/2004.10934.pdf.
[10]	JOCHER G. YOLOv5[EB/OL]. [2023-03-10]. https://github.com/ultralytics/yolov5.
[11]	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[12]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141.
[13]	JADERBERG M, SIMONYAN K, ZISSERMAN A, et al. Spatial transformer networks[EB/OL]. [2023-03-10]. https://arxiv.org/abs/1506.02025.pdf.
[14]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// European Conference on Computer Vision. Cham: Springer, 2018: 3-19.
[15]	DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 764-773.
[16]	ZHU L, WANG X J, KE Z H, et al. BiFormer: vision transformer with bi-level routing attention[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 10323-10333.
[17]	YU F, CHEN H F, WANG X, et al. BDD100K: a diverse driving dataset for heterogeneous multitask learning[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 2636-2645.
[18]	KLEIN I. NEXET: the largest and most diverse road dataset in the world[EB/OL]. [2023-03-10]. https://www.kaggle.com/datasets/solesensei/nexet-original.
[19]	TIAN Z, SHEN C H, CHEN H, et al. FCOS: fully convolutional one-stage object detection[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 9626-9635.

