Journal of Graphics ›› 2023, Vol. 44 ›› Issue (5): 890-898.DOI: 10.11996/JG.j.2095-302X.2023050890
• Image Processing and Computer Vision •
GAO Ang1, LIANG Xing-zhu1,2, XIA Chen-xing1, ZHANG Chun-jiong3
Received: 2023-05-15
Accepted: 2023-07-24
Online: 2023-10-31
Published: 2023-10-31
Contact: LIANG Xing-zhu (1979-), associate professor, master. His main research interests cover pattern recognition, computer vision, etc.
About author: GAO Ang (1999-), master student. His main research interests cover object detection and image processing. E-mail: 2021201221@aust.edu.cn
GAO Ang, LIANG Xing-zhu, XIA Chen-xing, ZHANG Chun-jiong. A dense pedestrian detection algorithm with improved YOLOv8[J]. Journal of Graphics, 2023, 44(5): 890-898.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023050890
| Model | AP50 | AP50:95 |
|---|---|---|
| RetinaNet [20] | 81.9 | 49.8 |
| Faster R-CNN [4] | 85.8 | 55.2 |
| Deformable DETR [25] | 86.7 | - |
| CrowdDet [6] | 90.3 | - |
| YOLOv5-n | 75.2 | 42.5 |
| YOLOv8-n | 82.7 | 51.5 |
| YOLOv8-n (ours) | 84.4 | 53.5 |
| YOLOX-m | 82.5 | 52.1 |
| YOLOv5-m | 83.8 | 52.9 |
| YOLOv8-m | 86.9 | 58.6 |
| YOLOv8-m (ours) | 88.7 | 60.8 |
| YOLOX-l | 85.5 | 57.6 |
| YOLOv5-l | 86.4 | 58.2 |
| YOLOv8-l | 88.1 | 59.3 |
| YOLOv8-l (ours) | 90.6 | 62.5 |

Table 1 Detection accuracy of each model on the CrowdHuman dataset (%)
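
In Tables 1 and 2, AP50 is the average precision at an IoU threshold of 0.50, and AP50:95 is the COCO-style average of AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal sketch of that averaging is shown below; the per-threshold AP values are hypothetical placeholders, not results from the paper.

```python
import numpy as np

# COCO-style AP50:95: average AP over IoU thresholds 0.50, 0.55, ..., 0.95.
iou_thresholds = np.arange(0.50, 1.00, 0.05)  # 10 thresholds

def ap50_95(ap_per_threshold):
    """Mean of the per-threshold AP values (each in [0, 1])."""
    assert len(ap_per_threshold) == len(iou_thresholds)
    return float(np.mean(ap_per_threshold))

# Hypothetical per-threshold APs for one model (placeholders for illustration):
aps = [0.91, 0.89, 0.86, 0.82, 0.77, 0.70, 0.61, 0.49, 0.33, 0.14]
print(f"AP50    = {aps[0]:.3f}")
print(f"AP50:95 = {ap50_95(aps):.3f}")
```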
| Model | AP50 | AP50:95 |
|---|---|---|
| IterDet [26] | 89.5 | - |
| PS-RCNN [27] | 90.0 | - |
| YOLOv8-n | 88.0 | 61.9 |
| YOLOv8-n (ours) | 89.4 | 63.5 |
| YOLOv8-m | 90.2 | 65.2 |
| YOLOv8-m (ours) | 91.5 | 66.9 |
| YOLOv8-l | 90.8 | 66.3 |
| YOLOv8-l (ours) | 92.3 | 68.2 |

Table 2 Detection accuracy of each model on the WiderPerson dataset (%)
| YOLOv8-n | C2f_DCN | Occlusion-aware attention | Dynamic decoupled head | Wise-IoU | AP50 (%) | AP50:95 (%) |
|---|---|---|---|---|---|---|
| √ | - | - | - | - | 82.7 | 51.5 |
| √ | - | √ | - | - | 83.5 | 52.6 |
| √ | - | - | - | √ | 83.1 | 52.2 |
| √ | √ | √ | √ | - | 84.0 | 53.1 |
| √ | √ | - | √ | √ | 83.7 | 53.0 |
| √ | √ | √ | √ | √ | 84.4 | 53.5 |

Table 3 Module ablation experiments
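
Among the ablated components, Wise-IoU [22] is a published bounding-box regression loss. The sketch below shows its v1 form, in which the plain IoU loss is re-weighted by a distance factor computed on the smallest enclosing box (that term is detached so it does not propagate gradients); the paper's v3 variant additionally applies a dynamic, non-monotonic focusing coefficient, which is omitted here. This is a minimal illustration, not the authors' exact implementation.

```python
import torch

def wise_iou_v1(pred, target, eps=1e-7):
    """Wise-IoU v1 for boxes given as (x1, y1, x2, y2) with shape (N, 4).

    L_WIoUv1 = R_WIoU * (1 - IoU), with R_WIoU = exp(d^2 / (Wg^2 + Hg^2)),
    where d is the distance between box centers and (Wg, Hg) is the size of
    the smallest enclosing box (detached from the gradient graph).
    """
    # Intersection area
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)

    # Union and IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box, detached so it only scales the loss
    wg = (torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])).detach()
    hg = (torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])).detach()

    # Squared distance between box centers
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    dist2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2

    r_wiou = torch.exp(dist2 / (wg ** 2 + hg ** 2 + eps))
    return r_wiou * (1 - iou)
```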
| YOLOv8-n | Params (M) | FLOPs (B) | AP (%) |
|---|---|---|---|
| + SE | +0.008 | 8.2 | 82.9 |
| + CBAM | +0.015 | 8.7 | 83.3 |
| + GAM | +1.639 | 9.5 | 83.8 |
| + BiFormer | +0.265 | 18.6 | 82.2 |
| + Occlusion-aware attention | +0 | 8.4 | 83.5 |

Table 4 Comparison of embedding different attention mechanisms
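
As a point of reference for the parameter counts in Table 4, the SE block [16] squeezes each feature map to a channel descriptor by global average pooling and re-weights channels through a two-layer bottleneck, so a single insertion adds roughly 2·C²/r parameters. The sketch below is illustrative only; the channel width C=256 and reduction ratio r=16 are assumptions, and with those values the block adds about 8K parameters, on the same order as the +0.008 M figure above.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention (Hu et al. [16])."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Sequential(                 # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                             # channel-wise re-weighting

# Assumed configuration for illustration: C=256, r=16 -> 2*256*256/16 = 8192 params.
se = SEBlock(256, 16)
print(sum(p.numel() for p in se.parameters()))   # 8192
```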
[1] LI Q, WANG J, DENG Y H. Pedestrian detection and tracking algorithm based on occlusion-aware[J]. Transducer and Microsystem Technologies, 2023, 42(4): 126-130. (in Chinese)
[2] ZHANG T L, YE Q X, ZHANG B C, et al. Feature calibration network for occluded pedestrian detection[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 4151-4163.
[3] LIU Y, YU C Y, LI G Y, et al. UAST-RCNN: object detection algorithm for blocking pedestrians[J]. Journal of Electronic Measurement and Instrumentation, 2022, 36(12): 168-175. (in Chinese)
[4] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[5] ZHANG Y A, HE H Y, LI J G, et al. Variational pedestrian detection[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 11617-11626.
[6] CHU X G, ZHENG A L, ZHANG X Y, et al. Detection in crowded scenes: one proposal, multiple predictions[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 12211-12220.
[7] SHA M Z, SHEN T, ZENG K, et al. Pedestrian detection incorporating deep and shallow features and dynamic selection mechanisms[J]. Journal of Data Acquisition and Processing, 2023, 38(1): 162-173. (in Chinese)
[8] SUN P J, ZHANG Z R, LI Q M, et al. Pedestrian detection based on improved multi-scale Res2NeXt[J]. Computer Engineering and Design, 2023, 44(3): 762-769. (in Chinese)
[9] HONG M B, LI S W, YANG Y C, et al. SSPNet: scale selection pyramid network for tiny person detection from UAV images[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5.
[10] HUANG S H, LU Z C, CHENG R, et al. FaPN: feature-aligned pyramid network for dense image prediction[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 844-853.
[11] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08) [2023-04-20]. https://arxiv.org/abs/1804.02767.
[12] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[13] DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 764-773.
[14] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2018: 3-19.
[15] WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 11531-11539.
[16] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141.
[17] LIU Y C, SHAO Z R, HOFFMANN N. Global attention mechanism: retain information to enhance channel-spatial interactions[EB/OL]. (2021-12-10) [2023-04-20]. https://arxiv.org/abs/2112.05561.
[18] ZHU L, WANG X J, KE Z H, et al. BiFormer: vision transformer with bi-level routing attention[C]// 2023 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 10323-10333.
[19] DAI X Y, CHEN Y P, XIAO B, et al. Dynamic head: unifying object detection heads with attentions[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7369-7378.
[20] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2999-3007.
[21] LI X, WANG W H, WU L J, et al. Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection[C]// The 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 21002-21012.
[22] TONG Z J, CHEN Y H, XU Z W, et al. Wise-IoU: bounding box regression loss with dynamic focusing mechanism[EB/OL]. (2023-01-24) [2023-04-20]. https://arxiv.org/abs/2301.10051.
[23] SHAO S, ZHAO Z J, LI B X, et al. CrowdHuman: a benchmark for detecting human in a crowd[EB/OL]. (2018-04-30) [2023-04-20]. https://arxiv.org/abs/1805.00123.
[24] ZHANG S F, XIE Y L, WAN J, et al. WiderPerson: a diverse dataset for dense pedestrian detection in the wild[J]. IEEE Transactions on Multimedia, 2020, 22(2): 380-393.
[25] ZHU X Z, SU W J, LU L W, et al. Deformable DETR: deformable transformers for end-to-end object detection[EB/OL]. (2021-03-18) [2023-04-20]. https://arxiv.org/abs/2010.04159.
[26] RUKHOVICH D, SOFIIUK K, GALEEV D, et al. IterDet: iterative scheme for object detection in crowded environments[M]// Lecture Notes in Computer Science. Cham: Springer International Publishing, 2021: 344-354.
[27] GE Z, JIE Z Q, HUANG X, et al. PS-RCNN: detecting secondary human instances in a crowd via primary object suppression[C]// 2020 IEEE International Conference on Multimedia and Expo. New York: IEEE Press, 2020: 1-6.