Journal of Graphics ›› 2024, Vol. 45 ›› Issue (1): 35-46.DOI: 10.11996/JG.j.2095-302X.2024010035
• Image Processing and Computer Vision •
					
YUAN Chao1, ZHAO Yadong1, ZHANG Yao1, WANG Jiaxuan1, XU Dawei1,2, ZHAI Yongjie1, ZHU Songsong3
Received: 2023-07-11
Accepted: 2023-10-18
Online: 2024-02-29
Published: 2024-02-29
Contact: XU Dawei (1990-), lecturer, Ph.D. His main research interests cover modeling and control of rope-driven manipulators and motion planning for ultra-redundant robotic arms. E-mail:
About author: YUAN Chao (1985-), lecturer, Ph.D. His main research interests cover robotics and sensor system design. E-mail: chaoyuan@ncepu.edu.cn
YUAN Chao, ZHAO Yadong, ZHANG Yao, WANG Jiaxuan, XU Dawei, ZHAI Yongjie, ZHU Songsong. Lightweight multi-modal pedestrian detection algorithm based on YOLO[J]. Journal of Graphics, 2024, 45(1): 35-46.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024010035
| Feature layer | Input size | Basic unit | Attention mechanism | Activation function |
|---|---|---|---|---|
| 1 | 512×512×3 | Conv2d | None | h-swish |
| 2 | 256×256×16 | Bneck, 3×3 | None | ReLU |
| 3 | 256×256×16 | Bneck, 3×3 | None | ReLU |
| 4 | 128×128×24 | Bneck, 3×3 | None | ReLU |
| 5 | 128×128×24 | Bneck, 5×5 | ES-ECA attention | ReLU |
| 6 | 64×64×40 | Bneck, 5×5 | ES-ECA attention | ReLU |
| 7 | 64×64×40 | Bneck, 5×5 | ES-ECA attention | ReLU |
| 8 | 64×64×40 | Bneck, 3×3 | None | h-swish |
| 9 | 32×32×80 | Bneck, 3×3 | None | h-swish |
| 10 | 32×32×80 | Bneck, 3×3 | None | h-swish |
| 11 | 32×32×80 | Bneck, 3×3 | None | h-swish |
| 12 | 32×32×80 | Bneck, 3×3 | ECA attention | h-swish |
| 13 | 32×32×112 | Bneck, 3×3 | ECA attention | h-swish |
| 14 | 32×32×112 | Bneck, 5×5 | ECA attention | h-swish |
| 15 | 16×16×160 | Bneck, 5×5 | ECA attention | h-swish |
| 16 | 16×16×160 | Bneck, 5×5 | ECA attention | h-swish |
| 17 | 16×16×160 | Conv2d | None | h-swish |
Table 1 ES-MobileNet network
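Table 1 names two building blocks that are easy to state in code: the h-swish activation and ECA channel attention. The sketch below is a minimal pure-Python illustration, assuming the standard definitions from MobileNetV3 (h-swish) and ECA-Net (Wang et al.). It does not reproduce the article's ES-ECA variant, whose details are not given on this page. The function names and the kernel weights are hypothetical.

```python
import math


def h_swish(x: float) -> float:
    """Hard-swish as defined for MobileNetV3: x * ReLU6(x + 3) / 6."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def eca_gates(channel_means, kernel):
    """Per-channel gates of standard ECA (not the article's ES-ECA).

    channel_means: one globally average-pooled value per channel.
    kernel: odd-length 1-D convolution weights (hypothetical values).
    """
    pad = len(kernel) // 2
    # Zero-pad so every channel sees a full window of neighbours.
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    gates = []
    for i in range(len(channel_means)):
        window = padded[i:i + len(kernel)]
        mixed = sum(w * v for w, v in zip(kernel, window))
        gates.append(1.0 / (1.0 + math.exp(-mixed)))  # sigmoid
    return gates


# Example: 40 channels, matching the channel count of layers 6 to 8 in Table 1.
gates = eca_gates([0.1] * 40, kernel=[0.2, 0.5, 0.2])
```

Each gate lies in (0, 1) and would multiply every spatial position of its channel, so the attention step rescales channels without changing the feature map size listed in the table.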
| Model | MR/% | mAP/% | Model size/MB | FPS/(frames/s) |
|---|---|---|---|---|
| YOLOv4 | 17.6 | 87.0 | 244 | 15.6 |
| M-YOLO | 22.3 | 82.2 | 53.8 | 25.6 |
| EM-YOLO | 18.8 | 86.8 | 50.6 | 26.0 |
| DEM-YOLO | 18.9 | 86.4 | 42.5 | 28.7 |
Table 2 Model lightweight comparison experiment
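The trade-off in Table 2 is easier to read as ratios against the YOLOv4 baseline. The sketch below is only an illustration: the numbers are transcribed from the table, while the dictionary and function names are hypothetical.

```python
# Values transcribed from Table 2: (MR %, mAP %, model size MB, FPS).
TABLE2 = {
    "YOLOv4": (17.6, 87.0, 244.0, 15.6),
    "M-YOLO": (22.3, 82.2, 53.8, 25.6),
    "EM-YOLO": (18.8, 86.8, 50.6, 26.0),
    "DEM-YOLO": (18.9, 86.4, 42.5, 28.7),
}


def relative_to_baseline(name, baseline="YOLOv4"):
    """Return (size reduction in %, speed-up factor, mAP change in points)."""
    _, map_base, size_base, fps_base = TABLE2[baseline]
    _, map_model, size_model, fps_model = TABLE2[name]
    size_reduction = 100.0 * (1.0 - size_model / size_base)
    speedup = fps_model / fps_base
    return size_reduction, speedup, map_model - map_base


for model in ("M-YOLO", "EM-YOLO", "DEM-YOLO"):
    reduction, speedup, d_map = relative_to_baseline(model)
    print(f"{model}: size -{reduction:.1f}%, speed x{speedup:.2f}, mAP {d_map:+.1f} points")
```

For DEM-YOLO this works out to a model roughly 83% smaller than YOLOv4, running at about 1.8 times the frame rate, at a cost of 0.6 mAP points.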
| Model | MR/% | mAP/% | Model size/MB | FPS/(frames/s) |
|---|---|---|---|---|
| ACF-T-T | 47.0 | 61.4 | - | 32.0 |
| TC-D | 21.7 | 82.8 | 235.0 | 15.6 |
| MSDS-RCNN | 11.6 | 90.3 | 356.0 | 10.8 |
| F-DEM-YOLO | 10.3 | 90.6 | 65.0 | 29.5 |
| EF-DEM-YOLO | 6.6 | 95.5 | 55.3 | 33.4 |
Table 3 Model multimodal comparison experiment
| Model | Backbone | mAP/% (all-day test set) | mAP/% (daytime test set) | mAP/% (night test set) | FPS/(frames/s) |
|---|---|---|---|---|---|
| Faster R-CNN | VGG-16 | 92.1 | 94.5 | 88.7 | 2.0 |
| DenseBox | VGG-19 | 70.8 | 76.1 | 58.7 | 12.1 |
| YOLOv3 | Darknet-53 | 78.9 | 82.1 | 71.6 | 13.5 |
| YOLOv4 | CSP-Darknet-53 | 87.0 | 90.8 | 81.6 | 15.6 |
| YOLOv5 | CSP-Darknet-53 | 88.2 | 92.2 | 82.5 | 21.5 |
| Proposed algorithm | ES-MobileNetv3 | 95.5 | 96.2 | 93.9 | 33.4 |
Table 4 Comparison results of performance indicators of different algorithms
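One way to read Table 4 is to look at how much each detector loses between the daytime and night test sets. The sketch below computes that gap from the mAP values transcribed from the table. The dictionary, the key "Proposed" (standing for the article's own algorithm) and the function name are hypothetical, and the gap is an illustrative summary rather than a metric reported by the authors.

```python
# mAP (%) transcribed from Table 4: (all-day, daytime, night).
TABLE4_MAP = {
    "Faster R-CNN": (92.1, 94.5, 88.7),
    "DenseBox": (70.8, 76.1, 58.7),
    "YOLOv3": (78.9, 82.1, 71.6),
    "YOLOv4": (87.0, 90.8, 81.6),
    "YOLOv5": (88.2, 92.2, 82.5),
    "Proposed": (95.5, 96.2, 93.9),  # row for the article's own algorithm
}


def day_night_gap(name):
    """Daytime mAP minus night mAP, in percentage points."""
    _, day, night = TABLE4_MAP[name]
    return round(day - night, 1)


gaps = {name: day_night_gap(name) for name in TABLE4_MAP}
smallest_gap = min(gaps, key=gaps.get)
```

With these figures the proposed algorithm shows the smallest day-to-night drop (2.3 points), while the visible-light YOLO baselines lose between 9 and 11 points.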
Fig. 14 Evening image detection results ((a) Visible image; (b) Infrared image; (c) Faster R-CNN; (d) DenseBox; (e) YOLOv3; (f) YOLOv4; (g) YOLOv5; (h) Proposed algorithm)
Fig. 15 Image detection results with many lights at night ((a) Visible image; (b) Infrared image; (c) Faster R-CNN; (d) DenseBox; (e) YOLOv3; (f) YOLOv4; (g) YOLOv5; (h) Proposed algorithm)
Fig. 16 Image detection results with low light at night ((a) Visible image; (b) Infrared image; (c) Faster R-CNN; (d) DenseBox; (e) YOLOv3; (f) YOLOv4; (g) YOLOv5; (h) Proposed algorithm)
[1] CAO J L, LI Y L, SUN H Q, et al. A survey on deep learning based visual object detection[J]. Journal of Image and Graphics, 2022, 27(6): 1697-1722 (in Chinese).
[2] HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: benchmark dataset and baseline[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1037-1045.
[3] WU A C, LIN C Z, ZHENG W S. Single-modality self-supervised information mining for cross-modality person re-identification[J]. Journal of Image and Graphics, 2022, 27(10): 2843-2859 (in Chinese).
[4] PANIGRAHI S, RAJU U S N. InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection[J]. International Journal of Multimedia Information Retrieval, 2022, 11(3): 409-430.
[5] ZHENG H T, LIU H, QI W, et al. Little-YOLOv4: a lightweight pedestrian detection network based on YOLOv4 and GhostNet[J]. Wireless Communications and Mobile Computing, 2022, 2022: 5155970.
[6] SONG X W, LI G Y, YANG L, et al. Real and pseudo pedestrian detection method with CA-YOLOv5s based on stereo image fusion[J]. Entropy, 2022, 24(8): 1091.
[7] LIU X F, LI M J. Night vehicle trajectory recognition method based on infrared imaging[J]. Laser Journal, 2022, 43(12): 51-55 (in Chinese).
[8] HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: benchmark dataset and baseline[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1037-1045.
[9] LEE Y, BUI T D, SHIN J. Pedestrian detection based on deep fusion network using feature correlation[C]// 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. New York: IEEE Press, 2019: 694-699.
[10] ZHUANG Y F, PU Z Y, HU J, et al. Illumination and temperature-aware multispectral networks for edge-computing-enabled pedestrian detection[J]. IEEE Transactions on Network Science and Engineering, 2022, 9(3): 1282-1295.
[11] KIM J U, PARK S, RO Y M. Uncertainty-guided cross-modal learning for robust multispectral pedestrian detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3): 1510-1523.
[12] LIU L S, KE C Y, LIN H, et al. Research on pedestrian detection algorithm based on MobileNet-YoLo[J]. Computational Intelligence and Neuroscience, 2022, 2022: 8924027.
[13] SHA M Z, ZENG K, TAO Z M, et al. Lightweight pedestrian detection based on feature multiplexed residual network[J]. Electronics, 2023, 12(4): 918.
[14] LI C, WANG Y D, LIU X M. A multi-pedestrian tracking algorithm for dense scenes based on an attention mechanism and dual data association[J]. Applied Sciences, 2022, 12(19): 9597.
[15] ZOU F M, LI X, XU Q M, et al. Correlation-and-correction fusion attention network for occluded pedestrian detection[J]. IEEE Sensors Journal, 2023, 23(6): 6061-6073.
[16] LI M L, SUN G B, YU J X. A pedestrian detection network model based on improved YOLOv5[J]. Entropy, 2023, 25(2): 381.
[17] HAO S, GAO S, MA X, et al. Anchor-free infrared pedestrian detection based on cross-scale feature fusion and hierarchical attention mechanism[J]. Infrared Physics & Technology, 2023, 131: 104660.
[18] WANG Q L, WU B G, ZHU P F, et al. ECA-net: efficient channel attention for deep convolutional neural networks[EB/OL]. (2020-03-24) [2023-03-01]. https://arxiv.org/abs/1910.03151.pdf.
[19] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2023-03-01]. https://arxiv.org/abs/2004.10934.pdf.
[20] GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO Series in 2021[EB/OL]. (2021-08-06) [2023-03-15]. https://www.researchgate.net/publication/353343997_YOLOX_Exceeding_YOLO_Series_in_2021.
[21] HAO P F, LIU L Q, GU R Y. YOLO-RD-Apple orchard heterogenous image obscured fruit detection model[J]. Journal of Graphics, 2023, 44(3): 456-464 (in Chinese).
[22] YANG Y B, ZHAO Y Y, LI Z B, et al. Solanaceae disease recognition method based on capsule SE-Inception[J]. Journal of Graphics, 2022, 43(1): 28-35 (in Chinese).
[23] LUO W Y, FU M Y. On-site monitoring technology of illegal swimming and fishing based on YoloX-ECA[J]. Journal of Graphics, 2023, 44(3): 465-472 (in Chinese).
[24] YING B Y, XU Y C, ZHANG S A, et al. Weed detection in images of carrot fields based on improved YOLO v4[J]. Traitement Du Signal, 2021, 38(2): 341-348.
[25] ZHANG Y T, YIN Z S, NIE L Z, et al. Attention based multi-layer fusion of multispectral images for pedestrian detection[J]. IEEE Access, 2020, 8: 165071-165084.
[26] ZHENG C H, PEI W J, YAN Q, et al. Pedestrian detection based on gradient and texture feature integration[J]. Neurocomputing, 2017, 228: 71-78.
[27] WEI X, ZHANG H T, LIU S F, et al. Pedestrian detection in underground mines via parallel feature transfer network[J]. Pattern Recognition, 2020, 103: 107195.
[28] SHANNON C E. A mathematical theory of communication[J]. Bell System Technical Journal, 1948, 27(3): 379-423.
[29] HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[C]// European Conference on Computer Vision. Cham: Springer, 2014: 346-361.
[30] HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: benchmark dataset and baseline[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1037-1045.
[31] LIU X P, LI Y Q, LIU L, et al. Improved YOLOV3 target recognition algorithm with embedded SENet structure[J]. Computer Engineering, 2019, 45(11): 243-248 (in Chinese).
[32] KIEU M, BAGDANOV A D, BERTINI M, et al. Task-conditioned domain adaptation for pedestrian detection in thermal imagery[C]// European Conference on Computer Vision. Cham: Springer, 2020: 546-562.
[33] LI C Y, SONG D, TONG R F, et al. Multispectral pedestrian detection via simultaneous detection and segmentation[EB/OL]. [2023-03-15]. https://arxiv.org/abs/1808.04818.pdf.
[34] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[35] TIAN Z Y, MA M, YANG K F. Object detection model for examination classroom based on cascade attention and point supervision mechanism[J]. Journal of Software, 2022, 33(7): 2633-2645 (in Chinese).
[36] HU X, ZHOU Y Q, XIAO J, et al. Surface defect detection of threaded steel based on improved YOLOv5[J]. Journal of Graphics, 2023, 44(3): 427-437 (in Chinese).