
Journal of Graphics ›› 2023, Vol. 44 ›› Issue (5): 890-898. DOI: 10.11996/JG.j.2095-302X.2023050890

• Image Processing and Computer Vision •

A dense pedestrian detection algorithm with improved YOLOv8

GAO Ang1, LIANG Xing-zhu1,2, XIA Chen-xing1, ZHANG Chun-jiong3

  1. School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232001, China
    2. Institute of Environment-friendly Materials and Occupational Health (Wuhu), Anhui University of Science and Technology, Wuhu, Anhui 241003, China
    3. College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
  • Received: 2023-05-15   Accepted: 2023-07-24   Online: 2023-10-31   Published: 2023-10-31
  • Contact: LIANG Xing-zhu (1979-), male, associate professor with a master's degree. His main research interests include pattern recognition and computer vision. E-mail: xzliang@aust.edu.cn
  • About author: GAO Ang (1999-), male, master's student. His main research interests include object detection and image processing. E-mail: 2021201221@aust.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62102003); Research Foundation of the Institute of Environment-Friendly Materials and Occupational Health (Wuhu), Anhui University of Science and Technology (ALW2021YF04); Science and Technology Research Project of Wuhu City (2020yf48)

Abstract:

In response to the challenge that small-scale, occluded pedestrians in dense scenes are prone to being missed, we proposed an improved YOLOv8 detection algorithm. First, to address the difficulty of extracting features from small-scale pedestrians, a backbone network improved with deformable convolution was employed to enhance the network's feature extraction capability, and an occlusion-aware attention mechanism was designed to strengthen the features of the visible parts of occluded pedestrians. Second, to address the imprecise localization of the detection head in dense pedestrian scenes, a dynamic decoupled head was designed to enhance attention to multi-scale pedestrian features, thereby improving the expressive capability of the detection head. Finally, to address the problem of low training efficiency, a regression loss combining Wise-IoU with distribution focal loss (DFL) was adopted during training, thereby improving the convergence of the model. Experimental results show that the improved YOLOv8 algorithm performed strongly on two challenging dense pedestrian datasets, CrowdHuman and WiderPerson, achieving AP50 of 90.6% and 92.3% and AP50:95 of 62.5% and 68.2%, respectively. These results represent a substantial improvement over the original algorithm and are highly competitive with other state-of-the-art pedestrian detection models. The proposed algorithm therefore holds broad application prospects for dense pedestrian detection tasks.
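To make the regression loss mentioned above more concrete, the following is a minimal sketch of the Wise-IoU (v1) bounding-box term in PyTorch. It is an illustrative reconstruction based on the published Wise-IoU formulation, not the authors' released code: the function name wise_iou_v1, the (x1, y1, x2, y2) box format, and the eps value are assumptions, and the distribution focal loss component that the paper combines with it is omitted here.

import torch

def wise_iou_v1(pred, target, eps=1e-7):
    """Per-box Wise-IoU v1 loss; pred and target are (N, 4) boxes as (x1, y1, x2, y2)."""
    # Intersection area
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    # Union area and IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Width/height of the smallest enclosing box (gradient detached, as in Wise-IoU v1)
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])

    # Distance between predicted and ground-truth box centers
    dx = (pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) / 2
    dy = (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) / 2

    # R_WIoU = exp(d^2 / (Wg^2 + Hg^2)) scales the plain IoU loss (1 - IoU)
    r_wiou = torch.exp((dx ** 2 + dy ** 2) / (cw ** 2 + ch ** 2 + eps).detach())
    return r_wiou * (1.0 - iou)

# Usage sketch: the box-regression loss would average this over matched boxes,
# e.g. loss_box = wise_iou_v1(pred_boxes, gt_boxes).mean()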

Key words: YOLOv8, dense pedestrian detection, occlusion-aware attention, deformable convolution, dynamic decoupled head

CLC Number: