Lightweight multi-modal pedestrian detection algorithm based on YOLO

doi:10.11996/JG.j.2095-302X.2024010035

Abstract

To address the low accuracy of pedestrian detection in low-light environments and the large number of model parameters, a lightweight multi-modal pedestrian detection algorithm named EF-DEM-YOLO was proposed based on the YOLO framework. The algorithm employed the lightweight ES-MobileNet as the backbone feature extraction network and integrated ECA and SE-ECA attention modules into this network to enhance important channel features, thereby improving the detection accuracy for small-target pedestrians. A DBL module based on depthwise separable convolution was also designed in the neck network to further reduce the number of model parameters. In addition, to improve pedestrian detection accuracy under low-light conditions, a weighted fusion method for the visible and infrared modalities based on image entropy was proposed. This method exploited the complementary features of the visible and infrared modalities under different lighting conditions, and a fusion module, EWF, was designed accordingly. Compared with the baseline methods, the proposed algorithm yielded significant improvements for pedestrian targets under different lighting conditions: the model’s mean average precision (mAP) increased by 55.5%, the miss rate (MR) decreased by 85.9%, and the inference speed reached 33.4 frames per second, outperforming other classical object detection algorithms. The algorithm makes real-time pedestrian detection feasible in edge-computing and low-light scenarios.
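The entropy-based weighting idea in the abstract can be illustrated with a minimal sketch. This page does not give the internals of the EWF module, so the version below rests on assumptions: each modality's weight is taken as its Shannon entropy divided by the sum of both entropies, and fusion is applied pixel by pixel to flat lists of 8-bit gray levels. The function names `image_entropy`, `entropy_weights`, and `fuse` are hypothetical and are not taken from the paper, whose module may instead operate on feature maps.

```python
import math
from collections import Counter


def image_entropy(pixels):
    """Shannon entropy (in bits) of a flat sequence of gray levels."""
    counts = Counter(pixels)
    total = len(pixels)
    # sum of p * log2(1/p) over the observed gray levels
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_weights(visible, infrared):
    """Assumed rule: weights proportional to each image's entropy."""
    h_vis = image_entropy(visible)
    h_ir = image_entropy(infrared)
    total = h_vis + h_ir
    if total == 0:
        # Both images are constant, so fall back to equal weights.
        return 0.5, 0.5
    return h_vis / total, h_ir / total


def fuse(visible, infrared):
    """Pixel-wise weighted sum of two aligned, equally sized images."""
    w_vis, w_ir = entropy_weights(visible, infrared)
    return [w_vis * v + w_ir * r for v, r in zip(visible, infrared)]


# Toy example: a completely dark visible image and an informative infrared image.
night_visible = [0] * 8
night_infrared = [0, 0, 64, 64, 128, 128, 255, 255]
weights = entropy_weights(night_visible, night_infrared)
fused = fuse(night_visible, night_infrared)
```

In this toy case the dark visible image carries no information, so all of the weight moves to the infrared image. This matches the intuition that the two modalities complement each other as lighting changes.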

Key words: pedestrian detection, YOLO, lightweight model, multi-modality, depthwise separable convolution, image entropy

CLC Number: TP391

YUAN Chao, ZHAO Yadong, ZHANG Yao, WANG Jiaxuan, XU Dawei, ZHAI Yongjie, ZHU Songsong. Lightweight multi-modal pedestrian detection algorithm based on YOLO[J]. Journal of Graphics, 2024, 45(1): 35-46.

Figures/Tables 22

Fig. 1 Overall algorithm architecture

Fig. 2 Structure of Bneck module

Fig. 3 Structure of SE module

Fig. 4 Structure of ECA module

Fig. 5 Structure of ECA-SE module

Table 1 ES-MobileNet network

Feature layer	Input size	Basic unit	Attention mechanism	Activation function
1	512×512×3	Conv2d	None	h-swish
2	256×256×16	Bneck,3×3	None	ReLU
3	256×256×16	Bneck,3×3	None	ReLU
4	128×128×24	Bneck,3×3	None	ReLU
5	128×128×24	Bneck,5×5	ES-ECA attention	ReLU
6	64×64×40	Bneck,5×5	ES-ECA attention	ReLU
7	64×64×40	Bneck,5×5	ES-ECA attention	ReLU
8	64×64×40	Bneck,3×3	None	h-swish
9	32×32×80	Bneck,3×3	None	h-swish
10	32×32×80	Bneck,3×3	None	h-swish
11	32×32×80	Bneck,3×3	None	h-swish
12	32×32×80	Bneck,3×3	ECA attention	h-swish
13	32×32×112	Bneck,3×3	ECA attention	h-swish
14	32×32×112	Bneck,5×5	ECA attention	h-swish
15	16×16×160	Bneck,5×5	ECA attention	h-swish
16	16×16×160	Bneck,5×5	ECA attention	h-swish
17	16×16×160	Conv2d	None	h-swish

Fig. 6 Structure of the separable convolution

Fig. 7 Images in different scenarios ((a) Daytime visible image; (b) Daytime infrared image; (c) Night visible image; (d) Night infrared image)

Fig. 8 Structure of the modal fusion network

Fig. 9 Added noise ((a) Salt-and-pepper noise; (b) Gaussian noise)

Fig. 10 Geometric transformations ((a) Horizontal flip; (b) Vertical flip)

Fig. 11 Random color jittering ((a) Increased contrast; (b) Increased brightness)

Fig. 12 Loss function curve

Fig. 13 mAP and MR curves

Table 2 Model lightweight comparison experiment

Model	MR/%	mAP/%	Model size/MB	FPS/(frames/s)
YOLOv4	17.6	87.0	244	15.6
M-YOLO	22.3	82.2	53.8	25.6
EM-YOLO	18.8	86.8	50.6	26.0
DEM-YOLO	18.9	86.4	42.5	28.7

Table 3 Model multimodal comparison experiment

Model	MR/%	mAP/%	Model size/MB	FPS/(frames/s)
ACF-T-T	47.0	61.4	-	32.0
TC-D^[32]	21.7	82.8	235.0	15.6
MSDS-RCNN^[33]	11.6	90.3	356.0	10.8
F-DEM-YOLO	10.3	90.6	65.0	29.5
EF-DEM-YOLO	6.6	95.5	55.3	33.4
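
The relative changes quoted in the abstract (mAP up 55.5%, MR down 85.9%) are consistent with comparing EF-DEM-YOLO against the first row of Table 3, ACF-T-T. The page does not say which baseline was used, so that reading is an inference from the numbers. The short check below reproduces the arithmetic under that assumption; `relative_change` is a hypothetical helper name.

```python
def relative_change(baseline, value):
    """Percentage change of value with respect to baseline."""
    return (value - baseline) / baseline * 100.0


# Values copied from Table 3 (assumed baseline: ACF-T-T).
baseline_map, baseline_mr = 61.4, 47.0
proposed_map, proposed_mr = 95.5, 6.6

map_gain = relative_change(baseline_map, proposed_map)  # relative mAP increase, %
mr_drop = -relative_change(baseline_mr, proposed_mr)    # relative MR reduction, %
```

Under this assumption the mAP gain comes out at roughly 55.5% and the MR reduction at roughly 85.96%. The abstract's 85.9% therefore matches if the figure was truncated rather than rounded.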

Table 4 Comparison results of performance indicators of different algorithms

Model	Backbone	mAP/% (all-day test set)	mAP/% (daytime test set)	mAP/% (night test set)	FPS/(frames/s)
Faster R-CNN	VGG-16	92.1	94.5	88.7	2.0
DenseBox	VGG-19	70.8	76.1	58.7	12.1
YOLOv3	Darknet-53	78.9	82.1	71.6	13.5
YOLOv4	CSP-Darknet-53	87.0	90.8	81.6	15.6
YOLOv5	CSP-Darknet-53	88.2	92.2	82.5	21.5
Proposed algorithm	ES-MobileNetv3	95.5	96.2	93.9	33.4

Fig. 14 Evening image detection results ((a) Visible image; (b) Infrared image; (c) Faster R-CNN; (d) DenseBox; (e) YOLOv3; (f) YOLOv4; (g) YOLOv5; (h) Proposed algorithm)

Fig. 15 Image detection results with many lights at night ((a) Visible image; (b) Infrared image; (c) Faster R-CNN; (d) DenseBox; (e) YOLOv3; (f) YOLOv4; (g) YOLOv5; (h) Proposed algorithm)

Fig. 16 Image detection results with low light at night ((a) Visible image; (b) Infrared image; (c) Faster R-CNN; (d) DenseBox; (e) YOLOv3; (f) YOLOv4; (g) YOLOv5; (h) Proposed algorithm)

Fig. 17 Test platform

Fig. 18 Graph of changes in pedestrian numbers

References 36

[1]	曹家乐, 李亚利, 孙汉卿, 等. 基于深度学习的视觉目标检测技术综述[J]. 中国图象图形学报, 2022, 27(6): 1697-1722.
	CAO J L, LI Y L, SUN H Q, et al. A survey on deep learning based visual object detection[J]. Journal of Image and Graphics, 2022, 27(6): 1697-1722 (in Chinese).
[2]	HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: benchmark dataset and baseline[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1037-1045.
[3]	吴岸聪, 林城梽, 郑伟诗. 面向跨模态行人重识别的单模态自监督信息挖掘[J]. 中国图象图形学报, 2022, 27(10): 2843-2859.
	WU A C, LIN C Z, ZHENG W S. Single-modality self-supervised information mining for cross-modality person re-identification[J]. Journal of Image and Graphics, 2022, 27(10): 2843-2859 (in Chinese).
[4]	PANIGRAHI S, RAJU U S N. InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection[J]. International Journal of Multimedia Information Retrieval, 2022, 11(3): 409-430.
[5]	ZHENG H T, LIU H, QI W, et al. Little-YOLOv4: a lightweight pedestrian detection network based on YOLOv4 and GhostNet[J]. Wireless Communications and Mobile Computing, 2022, 2022: 5155970.
[6]	SONG X W, LI G Y, YANG L, et al. Real and pseudo pedestrian detection method with CA-YOLOv5s based on stereo image fusion[J]. Entropy, 2022, 24(8): 1091.
[7]	刘小飞, 李明杰. 基于红外成像的夜间车辆行驶轨迹识别方法[J]. 激光杂志, 2022, 43(12): 51-55.
	LIU X F, LI M J. Night vehicle trajectory recognition method based on infrared imaging[J]. Laser Journal, 2022, 43(12): 51-55 (in Chinese).
[8]	HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: Benchmark dataset and baseline[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1037-1045.
[9]	LEE Y, BUI T D, SHIN J. Pedestrian detection based on deep fusion network using feature correlation[C]// 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. New York: IEEE Press, 2019: 694-699.
[10]	ZHUANG Y F, PU Z Y, HU J, et al. Illumination and temperature-aware multispectral networks for edge-computing-enabled pedestrian detection[J]. IEEE Transactions on Network Science and Engineering, 2022, 9(3): 1282-1295.
[11]	KIM J U, PARK S, RO Y M. Uncertainty-guided cross-modal learning for robust multispectral pedestrian detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3): 1510-1523.
[12]	LIU L S, KE C Y, LIN H, et al. Research on pedestrian detection algorithm based on MobileNet-YoLo[J]. Computational Intelligence and Neuroscience, 2022, 2022: 8924027.
[13]	SHA M Z, ZENG K, TAO Z M, et al. Lightweight pedestrian detection based on feature multiplexed residual network[J]. Electronics, 2023, 12(4): 918.
[14]	LI C, WANG Y D, LIU X M. A multi-pedestrian tracking algorithm for dense scenes based on an attention mechanism and dual data association[J]. Applied Sciences, 2022, 12(19): 9597.
[15]	ZOU F M, LI X, XU Q M, et al. Correlation-and-correction fusion attention network for occluded pedestrian detection[J]. IEEE Sensors Journal, 2023, 23(6): 6061-6073.
[16]	LI M L, SUN G B, YU J X. A pedestrian detection network model based on improved YOLOv5[J]. Entropy, 2023, 25(2): 381.
[17]	HAO S, GAO S, MA X, et al. Anchor-free infrared pedestrian detection based on cross-scale feature fusion and hierarchical attention mechanism[J]. Infrared Physics & Technology, 2023, 131: 104660.
[18]	WANG Q L, WU B G, ZHU P F, et al. ECA-net: efficient channel attention for deep convolutional neural networks[EB/OL]. (2020-03-24) [2023-03-01]. https://arxiv.org/abs/1910.03151.pdf.
[19]	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2023-03-01]. https://arxiv.org/abs/2004.10934.pdf.
[20]	GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO Series in 2021[EB/OL]. (2021-08-6) [2023-03-15]. https://www.researchgate.net/publication/353343997_YOLOX_Exceeding_YOLO_Series_in_2021.
[21]	郝鹏飞, 刘立群, 顾任远. YOLO-RD-Apple果园异源图像遮挡果实检测模型[J]. 图学学报, 2023, 44(3): 456-464.
	HAO P F, LIU L Q, GU R Y. YOLO-RD-Apple orchard heterogenous image obscured fruit detection model[J]. Journal of Graphics, 2023, 44(3): 456-464 (in Chinese).
[22]	杨泳波, 赵远洋, 李振波, 等. 基于胶囊SE-Inception的茄科病害识别方法研究[J]. 图学学报, 2022, 43(1): 28-35.
	YANG Y B, ZHAO Y Y, LI Z B, et al. Solanaceae disease recognition method based on capsule SE-Inception[J]. Journal of Graphics, 2022, 43(1): 28-35 (in Chinese).
[23]	罗文宇, 傅明月. 基于YoloX-ECA模型的非法野泳野钓现场监测技术[J]. 图学学报, 2023, 44(3): 465-472.
	LUO W Y, FU M Y. On-site monitoring technology of illegal swimming and fishing based on YoloX-ECA[J]. Journal of Graphics, 2023, 44(3): 465-472 (in Chinese).
[24]	YING B Y, XU Y C, ZHANG S A, et al. Weed detection in images of carrot fields based on improved YOLO v4[J]. Traitement Du Signal, 2021, 38(2): 341-348.
[25]	ZHANG Y T, YIN Z S, NIE L Z, et al. Attention based multi-layer fusion of multispectral images for pedestrian detection[J]. IEEE Access, 2020, 8: 165071-165084.
[26]	ZHENG C H, PEI W J, YAN Q, et al. Pedestrian detection based on gradient and texture feature integration[J]. Neurocomputing, 2017, 228: 71-78.
[27]	WEI X, ZHANG H T, LIU S F, et al. Pedestrian detection in underground mines via parallel feature transfer network[J]. Pattern Recognition, 2020, 103: 107195.
[28]	SHANNON C E. A mathematical theory of communication[J]. Bell System Technical Journal, 1948, 27(3): 379-423.
[29]	HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[C]// European Conference on Computer Vision. Cham: Springer, 2014: 346-361.
[30]	HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: Benchmark dataset and baseline[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1037-1045.
[31]	刘学平, 李玙乾, 刘励, 等. 嵌入SENet结构的改进YOLOV3目标识别算法[J]. 计算机工程, 2019, 45(11): 243-248.
	LIU X P, LI Y Q, LIU L, et al. Improved YOLOV3 target recognition algorithm with embedded SENet structure[J]. Computer Engineering, 2019, 45(11): 243-248 (in Chinese).
[32]	KIEU M, BAGDANOV A D, BERTINI M, et al. Task-conditioned domain adaptation for pedestrian detection in thermal imagery[C]// European Conference on Computer Vision. Cham: Springer, 2020: 546-562.
[33]	LI C Y, SONG D, TONG R F, et al. Multispectral pedestrian detection via simultaneous detection and segmentation[EB/OL]. [2023-03-15]. https://arxiv.org/abs/1808.04818.pdf.
[34]	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[35]	田卓钰, 马苗, 杨楷芳. 基于级联注意力与点监督机制的考场目标检测模型[J]. 软件学报, 2022, 33(7): 2633-2645.
	TIAN Z Y, MA M, YANG K F. Object detection model for examination classroom based on cascade attention and point supervision mechanism[J]. Journal of Software, 2022, 33(7): 2633-2645 (in Chinese).
[36]	胡欣, 周运强, 肖剑, 等. 基于改进YOLOv5的螺纹钢表面缺陷检测[J]. 图学学报, 2023, 44(3): 427-437.
	HU X, ZHOU Y Q, XIAO J, et al. Surface defect detection of threaded steel based on improved YOLOv5[J]. Journal of Graphics, 2023, 44(3): 427-437 (in Chinese).