图学学报 ›› 2024, Vol. 45 ›› Issue (1): 35-46. DOI: 10.11996/JG.j.2095-302X.2024010035
苑朝1, 赵亚冬1, 张耀1, 王嘉璇1, 徐大伟1,2, 翟永杰1, 朱松松3
收稿日期: 2023-07-11
接受日期: 2023-10-18
出版日期: 2024-02-29
发布日期: 2024-02-29
通讯作者: 徐大伟(1990-),男,讲师,博士。主要研究方向为绳驱机械臂建模与控制、超冗余机械臂运动规划。E-mail: xudawei@ncepu.edu.cn
第一作者: 苑朝(1985-),男,讲师,博士。主要研究方向为机器人学、传感器系统设计。E-mail: chaoyuan@ncepu.edu.cn
YUAN Chao1, ZHAO Yadong1, ZHANG Yao1, WANG Jiaxuan1, XU Dawei1,2, ZHAI Yongjie1, ZHU Songsong3
Received: 2023-07-11
Accepted: 2023-10-18
Published: 2024-02-29
Online: 2024-02-29
First author: YUAN Chao (1985-), lecturer, Ph.D. His main research interests cover robotics and sensor system design. E-mail: chaoyuan@ncepu.edu.cn
摘要:
To address the low pedestrian detection accuracy and large model parameter counts in low-light environments, a lightweight multi-modal pedestrian detection algorithm, EF-DEM-YOLO, is proposed based on the YOLO framework. The lightweight ES-MobileNet is adopted as the backbone feature-extraction network, into which ECA and SE-ECA attention modules are introduced to strengthen important channel features and improve detection accuracy for small pedestrian targets. A DBL module based on depthwise separable convolution is designed in the neck network to further reduce the number of model parameters. In addition, to improve pedestrian detection accuracy under low-light conditions, an image-entropy-based weighted fusion method for the visible and infrared modalities is proposed, exploiting the complementary characteristics of the two modalities under different illumination, and a fusion module EWF is designed. Compared with the baseline method, for pedestrian targets under different illumination conditions the model improves mAP by 55.5% and reduces MR by 85.9%, while reaching an inference speed of 33.4 frames per second, outperforming other classical object detection algorithms and enabling real-time pedestrian detection in edge-computing and low-light scenarios.
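As a concrete illustration of the entropy-based weighted fusion (EWF) described in the abstract, the following minimal NumPy sketch weights the visible and infrared branches by the Shannon entropy of their source images. The exact formulation of the paper's EWF module is not given on this page, so the weighting rule, function names, and tensor layout below are illustrative assumptions only.

```python
import numpy as np

def image_entropy(img: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (in bits) of an 8-bit grayscale image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, bins), density=True)
    p = hist[hist > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_weighted_fusion(feat_vis: np.ndarray, feat_ir: np.ndarray,
                            img_vis: np.ndarray, img_ir: np.ndarray) -> np.ndarray:
    """Fuse same-shaped visible/infrared feature maps, weighted by source-image entropy.

    A nearly uniform dark visible frame concentrates its histogram in a few bins and
    therefore has low entropy and a small weight, so the infrared branch dominates at
    night. The specific weighting rule is an assumption made for illustration.
    """
    h_vis, h_ir = image_entropy(img_vis), image_entropy(img_ir)
    w_vis = h_vis / (h_vis + h_ir + 1e-8)
    return w_vis * feat_vis + (1.0 - w_vis) * feat_ir

# Synthetic usage: an almost black visible frame vs. a textured infrared frame.
vis = np.full((512, 512), 5, dtype=np.uint8)                # low entropy
ir = np.random.randint(0, 256, (512, 512), dtype=np.uint8)  # high entropy
fused = entropy_weighted_fusion(np.ones((64, 64)), np.zeros((64, 64)), vis, ir)
```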
苑朝, 赵亚冬, 张耀, 王嘉璇, 徐大伟, 翟永杰, 朱松松. 基于YOLO轻量化的多模态行人检测算法[J]. 图学学报, 2024, 45(1): 35-46.
YUAN Chao, ZHAO Yadong, ZHANG Yao, WANG Jiaxuan, XU Dawei, ZHAI Yongjie, ZHU Songsong. Lightweight multi-modal pedestrian detection algorithm based on YOLO[J]. Journal of Graphics, 2024, 45(1): 35-46.
表1 ES-MobileNet网络
Table 1 ES-MobileNet network
特征层 | 输入尺度 | 基本单元 | 注意力机制 | 激活函数 |
---|---|---|---|---|
1 | 512×512×3 | Conv2d | 不施加 | h-swish |
2 | 256×256×16 | Bneck,3×3 | 不施加 | ReLU |
3 | 256×256×16 | Bneck,3×3 | 不施加 | ReLU |
4 | 128×128×24 | Bneck,3×3 | 不施加 | ReLU |
5 | 128×128×24 | Bneck,5×5 | ES-ECA注意力 | ReLU |
6 | 64×64×40 | Bneck,5×5 | ES-ECA注意力 | ReLU |
7 | 64×64×40 | Bneck,5×5 | ES-ECA注意力 | ReLU |
8 | 64×64×40 | Bneck,3×3 | 不施加 | h-swish |
9 | 32×32×80 | Bneck,3×3 | 不施加 | h-swish |
10 | 32×32×80 | Bneck,3×3 | 不施加 | h-swish |
11 | 32×32×80 | Bneck,3×3 | 不施加 | h-swish |
12 | 32×32×80 | Bneck,3×3 | ECA注意力 | h-swish |
13 | 32×32×112 | Bneck,3×3 | ECA注意力 | h-swish |
14 | 32×32×112 | Bneck,5×5 | ECA注意力 | h-swish |
15 | 16×16×160 | Bneck,5×5 | ECA注意力 | h-swish |
16 | 16×16×160 | Bneck,5×5 | ECA注意力 | h-swish |
17 | 16×16×160 | Conv2d | 不施加 | h-swish |
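To make the layer list in Table 1 concrete, the sketch below shows a plausible PyTorch rendering of its two building blocks: the ECA channel-attention module of ref. [18] and a MobileNetV3-style Bneck inverted-residual unit. The expansion widths and the exact ES-ECA variant used in ES-MobileNet are not specified on this page, so the class signatures and defaults are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention (ref. [18]): a 1-D conv over pooled channel descriptors."""
    def __init__(self, k_size: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)

    def forward(self, x):
        y = self.pool(x)                                    # B x C x 1 x 1
        y = self.conv(y.squeeze(-1).transpose(1, 2))        # interact across neighbouring channels
        y = torch.sigmoid(y.transpose(1, 2).unsqueeze(-1))  # back to B x C x 1 x 1
        return x * y                                        # reweight channels

class Bneck(nn.Module):
    """MobileNetV3-style inverted residual: 1x1 expand -> k x k depthwise -> (ECA) -> 1x1 project."""
    def __init__(self, c_in, c_exp, c_out, k=3, stride=1, use_eca=False, act=nn.Hardswish):
        super().__init__()
        self.use_res = stride == 1 and c_in == c_out
        layers = [nn.Conv2d(c_in, c_exp, 1, bias=False), nn.BatchNorm2d(c_exp), act(),
                  nn.Conv2d(c_exp, c_exp, k, stride, k // 2, groups=c_exp, bias=False),
                  nn.BatchNorm2d(c_exp), act()]
        if use_eca:
            layers.append(ECA())
        layers += [nn.Conv2d(c_exp, c_out, 1, bias=False), nn.BatchNorm2d(c_out)]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_res else out
```

Row 12 of Table 1, for example, would correspond to a 3×3 Bneck with ECA attention and h-swish activation mapping 80 to 112 channels; the hidden expansion width is a design choice of the authors that the table does not list.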
图7 不同场景下的图像((a)白天可见光图像;(b)白天红外图像;(c)夜晚可见光图像;(d)夜晚红外图像)
Fig. 7 Images in different scenarios ((a) Daytime visible image; (b) Daytime infrared image; (c) Night visible image; (d) Night infrared image)
表2 模型轻量化对比实验
Table 2 Model lightweight comparison experiment
模型名称 | MR/% | mAP/% | 模型大小/MB | FPS/(帧/秒) |
---|---|---|---|---|
YOLOv4 | 17.6 | 87.0 | 244 | 15.6 |
M-YOLO | 22.3 | 82.2 | 53.8 | 25.6 |
EM-YOLO | 18.8 | 86.8 | 50.6 | 26.0 |
DEM-YOLO | 18.9 | 86.4 | 42.5 | 28.7 |
表3 模型多模态对比实验
Table 3 Model multimodal comparison experiment
模型名称 | MR/% | mAP/% | 模型大小/MB | FPS/(帧/秒) |
---|---|---|---|---|
ACF-T-T | 47.0 | 61.4 | - | 32.0 |
TC-D[32] | 21.7 | 82.8 | 235.0 | 15.6 |
MSDS-RCNN[33] | 11.6 | 90.3 | 356.0 | 10.8 |
F-DEM-YOLO | 10.3 | 90.6 | 65.0 | 29.5 |
EF-DEM-YOLO | 6.6 | 95.5 | 55.3 | 33.4 |
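The relative improvements quoted in the abstract appear to be computed against the ACF-T-T row of Table 3 (a reading assumed here rather than stated on this page): mAP rises from 61.4% to 95.5%, i.e. (95.5 - 61.4) / 61.4 ≈ 55.5%, and MR falls from 47.0% to 6.6%, i.e. (47.0 - 6.6) / 47.0 ≈ 86.0%, matching the quoted 85.9% up to rounding.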
表4 不同算法性能指标对比结果
Table 4 Comparison results of performance indicators of different algorithms
模型名称 | 主干网络 | mAP(全天测试集)/% | mAP(白天测试集)/% | mAP(夜晚测试集)/% | FPS/(帧/秒) |
---|---|---|---|---|---|
Faster R-CNN | VGG-16 | 92.1 | 94.5 | 88.7 | 2.0 |
DenseBox | VGG-19 | 70.8 | 76.1 | 58.7 | 12.1 |
YOLOv3 | Darknet-53 | 78.9 | 82.1 | 71.6 | 13.5 |
YOLOv4 | CSP-Darknet-53 | 87.0 | 90.8 | 81.6 | 15.6 |
YOLOv5 | CSP-Darknet-53 | 88.2 | 92.2 | 82.5 | 21.5 |
本文算法 | ES-MobileNetv3 | 95.5 | 96.2 | 93.9 | 33.4 |
图14 傍晚的图像检测结果((a)可见光图;(b)红外图;(c) Faster R-CNN;(d) DenseBox;(e) YOLOv3;(f) YOLOv4;(g) YOLOv5;(h)本文算法)
Fig. 14 Detection results for an evening scene ((a) Visible image; (b) Infrared image; (c) Faster R-CNN; (d) DenseBox; (e) YOLOv3; (f) YOLOv4; (g) YOLOv5; (h) Proposed algorithm)
图15 夜晚灯光多的图像检测结果((a)可见光图;(b)红外图;(c) Faster R-CNN;(d) DenseBox;(e) YOLOv3;(f) YOLOv4;(g) YOLOv5;(h)本文算法)
Fig. 15 Detection results for a night scene with many lights ((a) Visible image; (b) Infrared image; (c) Faster R-CNN; (d) DenseBox; (e) YOLOv3; (f) YOLOv4; (g) YOLOv5; (h) Proposed algorithm)
图16 夜晚灯光少的图像检测结果((a)可见光图;(b)红外图;(c) Faster R-CNN;(d) DenseBox;(e) YOLOv3;(f) YOLOv4;(g) YOLOv5;(h)本文算法)
Fig. 16 Detection results for a night scene with few lights ((a) Visible image; (b) Infrared image; (c) Faster R-CNN; (d) DenseBox; (e) YOLOv3; (f) YOLOv4; (g) YOLOv5; (h) Proposed algorithm)
[1] 曹家乐, 李亚利, 孙汉卿, 等. 基于深度学习的视觉目标检测技术综述[J]. 中国图象图形学报, 2022, 27(6): 1697-1722.
CAO J L, LI Y L, SUN H Q, et al. A survey on deep learning based visual object detection[J]. Journal of Image and Graphics, 2022, 27(6): 1697-1722 (in Chinese).
[2] HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: benchmark dataset and baseline[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1037-1045.
[3] 吴岸聪, 林城梽, 郑伟诗. 面向跨模态行人重识别的单模态自监督信息挖掘[J]. 中国图象图形学报, 2022, 27(10): 2843-2859.
WU A C, LIN C Z, ZHENG W S. Single-modality self-supervised information mining for cross-modality person re-identification[J]. Journal of Image and Graphics, 2022, 27(10): 2843-2859 (in Chinese).
[4] PANIGRAHI S, RAJU U S N. InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection[J]. International Journal of Multimedia Information Retrieval, 2022, 11(3): 409-430.
[5] ZHENG H T, LIU H, QI W, et al. Little-YOLOv4: a lightweight pedestrian detection network based on YOLOv4 and GhostNet[J]. Wireless Communications and Mobile Computing, 2022, 2022: 5155970.
[6] SONG X W, LI G Y, YANG L, et al. Real and pseudo pedestrian detection method with CA-YOLOv5s based on stereo image fusion[J]. Entropy, 2022, 24(8): 1091.
[7] 刘小飞, 李明杰. 基于红外成像的夜间车辆行驶轨迹识别方法[J]. 激光杂志, 2022, 43(12): 51-55.
LIU X F, LI M J. Night vehicle trajectory recognition method based on infrared imaging[J]. Laser Journal, 2022, 43(12): 51-55 (in Chinese).
[8] HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: benchmark dataset and baseline[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1037-1045.
[9] LEE Y, BUI T D, SHIN J. Pedestrian detection based on deep fusion network using feature correlation[C]// 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. New York: IEEE Press, 2019: 694-699.
[10] ZHUANG Y F, PU Z Y, HU J, et al. Illumination and temperature-aware multispectral networks for edge-computing-enabled pedestrian detection[J]. IEEE Transactions on Network Science and Engineering, 2022, 9(3): 1282-1295.
[11] KIM J U, PARK S, RO Y M. Uncertainty-guided cross-modal learning for robust multispectral pedestrian detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3): 1510-1523.
[12] LIU L S, KE C Y, LIN H, et al. Research on pedestrian detection algorithm based on MobileNet-YoLo[J]. Computational Intelligence and Neuroscience, 2022, 2022: 8924027.
[13] SHA M Z, ZENG K, TAO Z M, et al. Lightweight pedestrian detection based on feature multiplexed residual network[J]. Electronics, 2023, 12(4): 918.
[14] LI C, WANG Y D, LIU X M. A multi-pedestrian tracking algorithm for dense scenes based on an attention mechanism and dual data association[J]. Applied Sciences, 2022, 12(19): 9597.
[15] ZOU F M, LI X, XU Q M, et al. Correlation-and-correction fusion attention network for occluded pedestrian detection[J]. IEEE Sensors Journal, 2023, 23(6): 6061-6073.
[16] LI M L, SUN G B, YU J X. A pedestrian detection network model based on improved YOLOv5[J]. Entropy, 2023, 25(2): 381.
[17] HAO S, GAO S, MA X, et al. Anchor-free infrared pedestrian detection based on cross-scale feature fusion and hierarchical attention mechanism[J]. Infrared Physics & Technology, 2023, 131: 104660.
[18] WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[EB/OL]. (2020-03-24) [2023-03-01]. https://arxiv.org/abs/1910.03151.pdf.
[19] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2023-03-01]. https://arxiv.org/abs/2004.10934.pdf.
[20] GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. (2021-08-06) [2023-03-15]. https://www.researchgate.net/publication/353343997_YOLOX_Exceeding_YOLO_Series_in_2021.
[21] 郝鹏飞, 刘立群, 顾任远. YOLO-RD-Apple果园异源图像遮挡果实检测模型[J]. 图学学报, 2023, 44(3): 456-464.
HAO P F, LIU L Q, GU R Y. YOLO-RD-Apple orchard heterogenous image obscured fruit detection model[J]. Journal of Graphics, 2023, 44(3): 456-464 (in Chinese).
[22] 杨泳波, 赵远洋, 李振波, 等. 基于胶囊SE-Inception的茄科病害识别方法研究[J]. 图学学报, 2022, 43(1): 28-35.
YANG Y B, ZHAO Y Y, LI Z B, et al. Solanaceae disease recognition method based on capsule SE-Inception[J]. Journal of Graphics, 2022, 43(1): 28-35 (in Chinese).
[23] 罗文宇, 傅明月. 基于YoloX-ECA模型的非法野泳野钓现场监测技术[J]. 图学学报, 2023, 44(3): 465-472.
LUO W Y, FU M Y. On-site monitoring technology of illegal swimming and fishing based on YoloX-ECA[J]. Journal of Graphics, 2023, 44(3): 465-472 (in Chinese).
[24] YING B Y, XU Y C, ZHANG S A, et al. Weed detection in images of carrot fields based on improved YOLO v4[J]. Traitement Du Signal, 2021, 38(2): 341-348.
[25] ZHANG Y T, YIN Z S, NIE L Z, et al. Attention based multi-layer fusion of multispectral images for pedestrian detection[J]. IEEE Access, 2020, 8: 165071-165084.
[26] ZHENG C H, PEI W J, YAN Q, et al. Pedestrian detection based on gradient and texture feature integration[J]. Neurocomputing, 2017, 228: 71-78.
[27] WEI X, ZHANG H T, LIU S F, et al. Pedestrian detection in underground mines via parallel feature transfer network[J]. Pattern Recognition, 2020, 103: 107195.
[28] SHANNON C E. A mathematical theory of communication[J]. Bell System Technical Journal, 1948, 27(3): 379-423.
[29] HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[C]// European Conference on Computer Vision. Cham: Springer, 2014: 346-361.
[30] HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: benchmark dataset and baseline[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1037-1045.
[31] 刘学平, 李玙乾, 刘励, 等. 嵌入SENet结构的改进YOLOV3目标识别算法[J]. 计算机工程, 2019, 45(11): 243-248.
LIU X P, LI Y Q, LIU L, et al. Improved YOLOV3 target recognition algorithm with embedded SENet structure[J]. Computer Engineering, 2019, 45(11): 243-248 (in Chinese).
[32] KIEU M, BAGDANOV A D, BERTINI M, et al. Task-conditioned domain adaptation for pedestrian detection in thermal imagery[C]// European Conference on Computer Vision. Cham: Springer, 2020: 546-562.
[33] LI C Y, SONG D, TONG R F, et al. Multispectral pedestrian detection via simultaneous detection and segmentation[EB/OL]. [2023-03-15]. https://arxiv.org/abs/1808.04818.pdf.
[34] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[35] 田卓钰, 马苗, 杨楷芳. 基于级联注意力与点监督机制的考场目标检测模型[J]. 软件学报, 2022, 33(7): 2633-2645.
TIAN Z Y, MA M, YANG K F. Object detection model for examination classroom based on cascade attention and point supervision mechanism[J]. Journal of Software, 2022, 33(7): 2633-2645 (in Chinese).
[36] 胡欣, 周运强, 肖剑, 等. 基于改进YOLOv5的螺纹钢表面缺陷检测[J]. 图学学报, 2023, 44(3): 427-437.
HU X, ZHOU Y Q, XIAO J, et al. Surface defect detection of threaded steel based on improved YOLOv5[J]. Journal of Graphics, 2023, 44(3): 427-437 (in Chinese).