Journal of Graphics ›› 2023, Vol. 44 ›› Issue (5): 890-898.DOI: 10.11996/JG.j.2095-302X.2023050890
• Image Processing and Computer Vision •
GAO Ang1, LIANG Xing-zhu1,2, XIA Chen-xing1, ZHANG Chun-jiong3
Received: 2023-05-15
Accepted: 2023-07-24
Online: 2023-10-31
Published: 2023-10-31
Contact: LIANG Xing-zhu (1979-), associate professor, master. His main research interests cover pattern recognition, computer vision, etc.
About author: GAO Ang (1999-), master student. His main research interests cover object detection and image processing. E-mail: 2021201221@aust.edu.cn
GAO Ang, LIANG Xing-zhu, XIA Chen-xing, ZHANG Chun-jiong. A dense pedestrian detection algorithm with improved YOLOv8[J]. Journal of Graphics, 2023, 44(5): 890-898.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023050890
| Model | AP50 | AP50:95 |
|---|---|---|
| RetinaNet [20] | 81.9 | 49.8 |
| Faster R-CNN [4] | 85.8 | 55.2 |
| Deformable DETR [25] | 86.7 | - |
| CrowdDet [6] | 90.3 | - |
| YOLOv5-n | 75.2 | 42.5 |
| YOLOv8-n | 82.7 | 51.5 |
| YOLOv8-n (ours) | 84.4 | 53.5 |
| YOLOX-m | 82.5 | 52.1 |
| YOLOv5-m | 83.8 | 52.9 |
| YOLOv8-m | 86.9 | 58.6 |
| YOLOv8-m (ours) | 88.7 | 60.8 |
| YOLOX-l | 85.5 | 57.6 |
| YOLOv5-l | 86.4 | 58.2 |
| YOLOv8-l | 88.1 | 59.3 |
| YOLOv8-l (ours) | 90.6 | 62.5 |

Table 1 Detection accuracy of each model on the CrowdHuman dataset (%)
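
In Tables 1 and 2, AP50 is the average precision at an IoU threshold of 0.50, and AP50:95 is the COCO-style average of AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal sketch of that averaging is shown below; the per-threshold AP values are hypothetical placeholders, not results from the paper.

```python
import numpy as np

# COCO-style AP50:95: average AP over IoU thresholds 0.50, 0.55, ..., 0.95.
iou_thresholds = np.arange(0.50, 1.00, 0.05)  # 10 thresholds

def ap50_95(ap_per_threshold):
    """Mean of the per-threshold AP values (each in [0, 1])."""
    assert len(ap_per_threshold) == len(iou_thresholds)
    return float(np.mean(ap_per_threshold))

# Hypothetical per-threshold APs for one model (placeholders for illustration):
aps = [0.91, 0.89, 0.86, 0.82, 0.77, 0.70, 0.61, 0.49, 0.33, 0.14]
print(f"AP50    = {aps[0]:.3f}")
print(f"AP50:95 = {ap50_95(aps):.3f}")
```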
| Model | AP50 | AP50:95 |
|---|---|---|
| IterDet [26] | 89.5 | - |
| PS-RCNN [27] | 90.0 | - |
| YOLOv8-n | 88.0 | 61.9 |
| YOLOv8-n (ours) | 89.4 | 63.5 |
| YOLOv8-m | 90.2 | 65.2 |
| YOLOv8-m (ours) | 91.5 | 66.9 |
| YOLOv8-l | 90.8 | 66.3 |
| YOLOv8-l (ours) | 92.3 | 68.2 |

Table 2 Detection accuracy of each model on the WiderPerson dataset (%)
| YOLOv8-n | C2f_DCN | Occlusion-aware attention | Dynamic decoupled head | Wise-IoU | AP50 (%) | AP50:95 (%) |
|---|---|---|---|---|---|---|
| √ | - | - | - | - | 82.7 | 51.5 |
| √ | - | √ | - | - | 83.5 | 52.6 |
| √ | - | - | - | √ | 83.1 | 52.2 |
| √ | √ | √ | √ | - | 84.0 | 53.1 |
| √ | √ | - | √ | √ | 83.7 | 53.0 |
| √ | √ | √ | √ | √ | 84.4 | 53.5 |

Table 3 Module ablation experiments
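
Among the ablated components, Wise-IoU [22] is a published bounding-box regression loss. The sketch below shows its v1 form, in which the plain IoU loss is re-weighted by a distance factor computed on the smallest enclosing box (that term is detached so it does not propagate gradients); the paper's v3 variant additionally applies a dynamic, non-monotonic focusing coefficient, which is omitted here. This is a minimal illustration, not the authors' exact implementation.

```python
import torch

def wise_iou_v1(pred, target, eps=1e-7):
    """Wise-IoU v1 for boxes given as (x1, y1, x2, y2) with shape (N, 4).

    L_WIoUv1 = R_WIoU * (1 - IoU), with R_WIoU = exp(d^2 / (Wg^2 + Hg^2)),
    where d is the distance between box centers and (Wg, Hg) is the size of
    the smallest enclosing box (detached from the gradient graph).
    """
    # Intersection area
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)

    # Union and IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box, detached so it only scales the loss
    wg = (torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])).detach()
    hg = (torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])).detach()

    # Squared distance between box centers
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    dist2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2

    r_wiou = torch.exp(dist2 / (wg ** 2 + hg ** 2 + eps))
    return r_wiou * (1 - iou)
```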
| YOLOv8-n | Params (M) | FLOPs (B) | AP (%) |
|---|---|---|---|
| + SE | +0.008 | 8.2 | 82.9 |
| + CBAM | +0.015 | 8.7 | 83.3 |
| + GAM | +1.639 | 9.5 | 83.8 |
| + BiFormer | +0.265 | 18.6 | 82.2 |
| + Occlusion-aware attention | +0 | 8.4 | 83.5 |

Table 4 Comparison of embedding different attention mechanisms
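
As a point of reference for the parameter counts in Table 4, the SE block [16] squeezes each feature map to a channel descriptor by global average pooling and re-weights channels through a two-layer bottleneck, so a single insertion adds roughly 2·C²/r parameters. The sketch below is illustrative only; the channel width C=256 and reduction ratio r=16 are assumptions, and with those values the block adds about 8K parameters, on the same order as the +0.008 M figure above.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention (Hu et al. [16])."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Sequential(                 # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                             # channel-wise re-weighting

# Assumed configuration for illustration: C=256, r=16 -> 2*256*256/16 = 8192 params.
se = SEBlock(256, 16)
print(sum(p.numel() for p in se.parameters()))   # 8192
```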
[1] LI Q, WANG J, DENG Y H. Pedestrian detection and tracking algorithm based on occlusion-aware[J]. Transducer and Microsystem Technologies, 2023, 42(4): 126-130. (in Chinese)
[2] ZHANG T L, YE Q X, ZHANG B C, et al. Feature calibration network for occluded pedestrian detection[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 4151-4163.
[3] LIU Y, YU C Y, LI G Y, et al. UAST-RCNN: object detection algorithm for blocking pedestrians[J]. Journal of Electronic Measurement and Instrumentation, 2022, 36(12): 168-175. (in Chinese)
[4] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[5] ZHANG Y A, HE H Y, LI J G, et al. Variational pedestrian detection[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 11617-11626.
[6] CHU X G, ZHENG A L, ZHANG X Y, et al. Detection in crowded scenes: one proposal, multiple predictions[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 12211-12220.
[7] SHA M Z, SHEN T, ZENG K, et al. Pedestrian detection incorporating deep and shallow features and dynamic selection mechanisms[J]. Journal of Data Acquisition and Processing, 2023, 38(1): 162-173. (in Chinese)
[8] SUN P J, ZHANG Z R, LI Q M, et al. Pedestrian detection based on improved multi-scale Res2NeXt[J]. Computer Engineering and Design, 2023, 44(3): 762-769. (in Chinese)
[9] HONG M B, LI S W, YANG Y C, et al. SSPNet: scale selection pyramid network for tiny person detection from UAV images[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5.
[10] HUANG S H, LU Z C, CHENG R, et al. FaPN: feature-aligned pyramid network for dense image prediction[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 844-853.
[11] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08) [2023-04-20]. https://arxiv.org/abs/1804.02767.
[12] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[13] DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 764-773.
[14] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2018: 3-19.
[15] WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 11531-11539.
[16] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141.
[17] LIU Y C, SHAO Z R, HOFFMANN N. Global attention mechanism: retain information to enhance channel-spatial interactions[EB/OL]. (2021-12-10) [2023-04-20]. https://arxiv.org/abs/2112.05561.
[18] ZHU L, WANG X J, KE Z H, et al. BiFormer: vision transformer with bi-level routing attention[C]// 2023 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 10323-10333.
[19] DAI X Y, CHEN Y P, XIAO B, et al. Dynamic head: unifying object detection heads with attentions[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7369-7378.
[20] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2999-3007.
[21] LI X, WANG W H, WU L J, et al. Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection[C]// The 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 21002-21012.
[22] TONG Z J, CHEN Y H, XU Z W, et al. Wise-IoU: bounding box regression loss with dynamic focusing mechanism[EB/OL]. (2023-01-24) [2023-04-20]. https://arxiv.org/abs/2301.10051.
[23] SHAO S, ZHAO Z J, LI B X, et al. CrowdHuman: a benchmark for detecting human in a crowd[EB/OL]. (2018-04-30) [2023-04-20]. https://arxiv.org/abs/1805.00123.
[24] ZHANG S F, XIE Y L, WAN J, et al. WiderPerson: a diverse dataset for dense pedestrian detection in the wild[J]. IEEE Transactions on Multimedia, 2020, 22(2): 380-393.
[25] ZHU X Z, SU W J, LU L W, et al. Deformable DETR: deformable transformers for end-to-end object detection[EB/OL]. (2021-03-18) [2023-04-20]. https://arxiv.org/abs/2010.04159.
[26] RUKHOVICH D, SOFIIUK K, GALEEV D, et al. IterDet: iterative scheme for object detection in crowded environments[M]// Lecture Notes in Computer Science. Cham: Springer International Publishing, 2021: 344-354.
[27] GE Z, JIE Z Q, HUANG X, et al. PS-RCNN: detecting secondary human instances in a crowd via primary object suppression[C]// 2020 IEEE International Conference on Multimedia and Expo. New York: IEEE Press, 2020: 1-6.