图学学报 ›› 2024, Vol. 45 ›› Issue (1): 35-46. DOI: 10.11996/JG.j.2095-302X.2024010035
苑朝1, 赵亚冬1, 张耀1, 王嘉璇1, 徐大伟1,2, 翟永杰1, 朱松松3
收稿日期: 2023-07-11
接受日期: 2023-10-18
出版日期: 2024-02-29
发布日期: 2024-02-29
通讯作者: 徐大伟(1990-),男,讲师,博士。主要研究方向为绳驱机械臂建模与控制、超冗余机械臂运动规划。E-mail: xudawei@ncepu.edu.cn
第一作者: 苑朝(1985-),男,讲师,博士。主要研究方向为机器人学、传感器系统设计。E-mail: chaoyuan@ncepu.edu.cn
YUAN Chao1, ZHAO Yadong1, ZHANG Yao1, WANG Jiaxuan1, XU Dawei1,2, ZHAI Yongjie1, ZHU Songsong3
Received: 2023-07-11
Accepted: 2023-10-18
Published: 2024-02-29
Online: 2024-02-29
First author: YUAN Chao (1985-), lecturer, Ph.D. His main research interests cover robotics and sensor system design. E-mail: chaoyuan@ncepu.edu.cn
摘要:
To address the low pedestrian detection accuracy and large model parameter counts in low-light environments, a lightweight multi-modal pedestrian detection algorithm, EF-DEM-YOLO, is proposed based on the YOLO framework. The lightweight ES-MobileNet is adopted as the backbone feature-extraction network, into which ECA and SE-ECA attention modules are introduced to strengthen important channel features and improve detection accuracy for small pedestrian targets. A DBL module based on depthwise separable convolution is designed in the neck network to further reduce the number of model parameters. In addition, to improve pedestrian detection accuracy under low-light conditions, an image-entropy-based weighted fusion method for the visible and infrared modalities is proposed, exploiting the complementary characteristics of the two modalities under different illumination, and a fusion module EWF is designed. Compared with the baseline method, for pedestrian targets under different illumination conditions the model improves mAP by 55.5% and reduces MR by 85.9%, while reaching an inference speed of 33.4 frames per second, outperforming other classical object detection algorithms and enabling real-time pedestrian detection in edge-computing and low-light scenarios.
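As a concrete illustration of the entropy-based weighted fusion (EWF) described in the abstract, the following minimal NumPy sketch weights the visible and infrared branches by the Shannon entropy of their source images. The exact formulation of the paper's EWF module is not given on this page, so the weighting rule, function names, and tensor layout below are illustrative assumptions only.

```python
import numpy as np

def image_entropy(img: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (in bits) of an 8-bit grayscale image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, bins), density=True)
    p = hist[hist > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_weighted_fusion(feat_vis: np.ndarray, feat_ir: np.ndarray,
                            img_vis: np.ndarray, img_ir: np.ndarray) -> np.ndarray:
    """Fuse same-shaped visible/infrared feature maps, weighted by source-image entropy.

    A nearly uniform dark visible frame concentrates its histogram in a few bins and
    therefore has low entropy and a small weight, so the infrared branch dominates at
    night. The specific weighting rule is an assumption made for illustration.
    """
    h_vis, h_ir = image_entropy(img_vis), image_entropy(img_ir)
    w_vis = h_vis / (h_vis + h_ir + 1e-8)
    return w_vis * feat_vis + (1.0 - w_vis) * feat_ir

# Synthetic usage: an almost black visible frame vs. a textured infrared frame.
vis = np.full((512, 512), 5, dtype=np.uint8)                # low entropy
ir = np.random.randint(0, 256, (512, 512), dtype=np.uint8)  # high entropy
fused = entropy_weighted_fusion(np.ones((64, 64)), np.zeros((64, 64)), vis, ir)
```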
苑朝, 赵亚冬, 张耀, 王嘉璇, 徐大伟, 翟永杰, 朱松松. 基于YOLO轻量化的多模态行人检测算法[J]. 图学学报, 2024, 45(1): 35-46.
YUAN Chao, ZHAO Yadong, ZHANG Yao, WANG Jiaxuan, XU Dawei, ZHAI Yongjie, ZHU Songsong. Lightweight multi-modal pedestrian detection algorithm based on YOLO[J]. Journal of Graphics, 2024, 45(1): 35-46.
表1 ES-MobileNet网络
Table 1 ES-MobileNet network
特征层 | 输入尺度 | 基本单元 | 注意力机制 | 激活函数 |
---|---|---|---|---|
1 | 512×512×3 | Conv2d | 不施加 | h-swish |
2 | 256×256×16 | Bneck,3×3 | 不施加 | ReLU |
3 | 256×256×16 | Bneck,3×3 | 不施加 | ReLU |
4 | 128×128×24 | Bneck,3×3 | 不施加 | ReLU |
5 | 128×128×24 | Bneck,5×5 | ES-ECA注意力 | ReLU |
6 | 64×64×40 | Bneck,5×5 | ES-ECA注意力 | ReLU |
7 | 64×64×40 | Bneck,5×5 | ES-ECA注意力 | ReLU |
8 | 64×64×40 | Bneck,3×3 | 不施加 | h-swish |
9 | 32×32×80 | Bneck,3×3 | 不施加 | h-swish |
10 | 32×32×80 | Bneck,3×3 | 不施加 | h-swish |
11 | 32×32×80 | Bneck,3×3 | 不施加 | h-swish |
12 | 32×32×80 | Bneck,3×3 | ECA注意力 | h-swish |
13 | 32×32×112 | Bneck,3×3 | ECA注意力 | h-swish |
14 | 32×32×112 | Bneck,5×5 | ECA注意力 | h-swish |
15 | 16×16×160 | Bneck,5×5 | ECA注意力 | h-swish |
16 | 16×16×160 | Bneck,5×5 | ECA注意力 | h-swish |
17 | 16×16×160 | Conv2d | 不施加 | h-swish |
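To make the layer list in Table 1 concrete, the sketch below shows a plausible PyTorch rendering of its two building blocks: the ECA channel-attention module of ref. [18] and a MobileNetV3-style Bneck inverted-residual unit. The expansion widths and the exact ES-ECA variant used in ES-MobileNet are not specified on this page, so the class signatures and defaults are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention (ref. [18]): a 1-D conv over pooled channel descriptors."""
    def __init__(self, k_size: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)

    def forward(self, x):
        y = self.pool(x)                                    # B x C x 1 x 1
        y = self.conv(y.squeeze(-1).transpose(1, 2))        # interact across neighbouring channels
        y = torch.sigmoid(y.transpose(1, 2).unsqueeze(-1))  # back to B x C x 1 x 1
        return x * y                                        # reweight channels

class Bneck(nn.Module):
    """MobileNetV3-style inverted residual: 1x1 expand -> k x k depthwise -> (ECA) -> 1x1 project."""
    def __init__(self, c_in, c_exp, c_out, k=3, stride=1, use_eca=False, act=nn.Hardswish):
        super().__init__()
        self.use_res = stride == 1 and c_in == c_out
        layers = [nn.Conv2d(c_in, c_exp, 1, bias=False), nn.BatchNorm2d(c_exp), act(),
                  nn.Conv2d(c_exp, c_exp, k, stride, k // 2, groups=c_exp, bias=False),
                  nn.BatchNorm2d(c_exp), act()]
        if use_eca:
            layers.append(ECA())
        layers += [nn.Conv2d(c_exp, c_out, 1, bias=False), nn.BatchNorm2d(c_out)]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_res else out
```

Row 12 of Table 1, for example, would correspond to a 3×3 Bneck with ECA attention and h-swish activation mapping 80 to 112 channels; the hidden expansion width is a design choice of the authors that the table does not list.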
图7 不同场景下的图像((a)白天可见光图像;(b)白天红外图像;(c)夜晚可见光图像;(d)夜晚红外图像)
Fig. 7 Images in different scenarios ((a) Daytime visible image; (b) Daytime infrared image; (c) Night visible image; (d) Night infrared image)
表2 模型轻量化对比实验
Table 2 Model lightweight comparison experiment
模型名称 | MR/% | mAP/% | 模型大小/MB | FPS/(帧/秒) |
---|---|---|---|---|
YOLOv4 | 17.6 | 87.0 | 244 | 15.6 |
M-YOLO | 22.3 | 82.2 | 53.8 | 25.6 |
EM-YOLO | 18.8 | 86.8 | 50.6 | 26.0 |
DEM-YOLO | 18.9 | 86.4 | 42.5 | 28.7 |
表3 模型多模态对比实验
Table 3 Model multimodal comparison experiment
模型名称 | MR/% | mAP/% | 模型大小/MB | FPS/(帧/秒) |
---|---|---|---|---|
ACF-T-T | 47.0 | 61.4 | - | 32.0 |
TC-D[32] | 21.7 | 82.8 | 235.0 | 15.6 |
MSDS-RCNN[33] | 11.6 | 90.3 | 356.0 | 10.8 |
F-DEM-YOLO | 10.3 | 90.6 | 65.0 | 29.5 |
EF-DEM-YOLO | 6.6 | 95.5 | 55.3 | 33.4 |
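The relative improvements quoted in the abstract appear to be computed against the ACF-T-T row of Table 3 (a reading assumed here rather than stated on this page): mAP rises from 61.4% to 95.5%, i.e. (95.5 - 61.4) / 61.4 ≈ 55.5%, and MR falls from 47.0% to 6.6%, i.e. (47.0 - 6.6) / 47.0 ≈ 86.0%, matching the quoted 85.9% up to rounding.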
表4 不同算法性能指标对比结果
Table 4 Comparison results of performance indicators of different algorithms
模型名称 | 主干网络 | mAP(全天测试集)/% | mAP(白天测试集)/% | mAP(夜晚测试集)/% | FPS/(帧/秒) |
---|---|---|---|---|---|
Faster R-CNN | VGG-16 | 92.1 | 94.5 | 88.7 | 2.0 |
DenseBox | VGG-19 | 70.8 | 76.1 | 58.7 | 12.1 |
YOLOv3 | Darknet-53 | 78.9 | 82.1 | 71.6 | 13.5 |
YOLOv4 | CSP-Darknet-53 | 87.0 | 90.8 | 81.6 | 15.6 |
YOLOv5 | CSP-Darknet-53 | 88.2 | 92.2 | 82.5 | 21.5 |
本文算法 | ES-MobileNetv3 | 95.5 | 96.2 | 93.9 | 33.4 |
图14 傍晚的图像检测结果((a)可见光图;(b)红外图;(c) Faster R-CNN;(d) DenseBox;(e) YOLOv3;(f) YOLOv4;(g) YOLOv5;(h)本文算法)
Fig. 14 Detection results for an evening scene ((a) Visible image; (b) Infrared image; (c) Faster R-CNN; (d) DenseBox; (e) YOLOv3; (f) YOLOv4; (g) YOLOv5; (h) Proposed algorithm)
图15 夜晚灯光多的图像检测结果((a)可见光图;(b)红外图;(c) Faster R-CNN;(d) DenseBox;(e) YOLOv3;(f) YOLOv4;(g) YOLOv5;(h)本文算法)
Fig. 15 Detection results for a night scene with many lights ((a) Visible image; (b) Infrared image; (c) Faster R-CNN; (d) DenseBox; (e) YOLOv3; (f) YOLOv4; (g) YOLOv5; (h) Proposed algorithm)
图16 夜晚灯光少的图像检测结果((a)可见光图;(b)红外图;(c) Faster R-CNN;(d) DenseBox;(e) YOLOv3;(f) YOLOv4;(g) YOLOv5;(h)本文算法)
Fig. 16 Detection results for a night scene with few lights ((a) Visible image; (b) Infrared image; (c) Faster R-CNN; (d) DenseBox; (e) YOLOv3; (f) YOLOv4; (g) YOLOv5; (h) Proposed algorithm)
[1] 曹家乐, 李亚利, 孙汉卿, 等. 基于深度学习的视觉目标检测技术综述[J]. 中国图象图形学报, 2022, 27(6): 1697-1722.
CAO J L, LI Y L, SUN H Q, et al. A survey on deep learning based visual object detection[J]. Journal of Image and Graphics, 2022, 27(6): 1697-1722 (in Chinese).
[2] HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: benchmark dataset and baseline[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1037-1045.
[3] 吴岸聪, 林城梽, 郑伟诗. 面向跨模态行人重识别的单模态自监督信息挖掘[J]. 中国图象图形学报, 2022, 27(10): 2843-2859.
WU A C, LIN C Z, ZHENG W S. Single-modality self-supervised information mining for cross-modality person re-identification[J]. Journal of Image and Graphics, 2022, 27(10): 2843-2859 (in Chinese).
[4] PANIGRAHI S, RAJU U S N. InceptionDepth-wiseYOLOv2: improved implementation of YOLO framework for pedestrian detection[J]. International Journal of Multimedia Information Retrieval, 2022, 11(3): 409-430.
[5] ZHENG H T, LIU H, QI W, et al. Little-YOLOv4: a lightweight pedestrian detection network based on YOLOv4 and GhostNet[J]. Wireless Communications and Mobile Computing, 2022, 2022: 5155970.
[6] SONG X W, LI G Y, YANG L, et al. Real and pseudo pedestrian detection method with CA-YOLOv5s based on stereo image fusion[J]. Entropy, 2022, 24(8): 1091.
[7] 刘小飞, 李明杰. 基于红外成像的夜间车辆行驶轨迹识别方法[J]. 激光杂志, 2022, 43(12): 51-55.
LIU X F, LI M J. Night vehicle trajectory recognition method based on infrared imaging[J]. Laser Journal, 2022, 43(12): 51-55 (in Chinese).
[8] HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: benchmark dataset and baseline[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1037-1045.
[9] LEE Y, BUI T D, SHIN J. Pedestrian detection based on deep fusion network using feature correlation[C]// 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. New York: IEEE Press, 2019: 694-699.
[10] ZHUANG Y F, PU Z Y, HU J, et al. Illumination and temperature-aware multispectral networks for edge-computing-enabled pedestrian detection[J]. IEEE Transactions on Network Science and Engineering, 2022, 9(3): 1282-1295.
[11] KIM J U, PARK S, RO Y M. Uncertainty-guided cross-modal learning for robust multispectral pedestrian detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3): 1510-1523.
[12] LIU L S, KE C Y, LIN H, et al. Research on pedestrian detection algorithm based on MobileNet-YoLo[J]. Computational Intelligence and Neuroscience, 2022, 2022: 8924027.
[13] SHA M Z, ZENG K, TAO Z M, et al. Lightweight pedestrian detection based on feature multiplexed residual network[J]. Electronics, 2023, 12(4): 918.
[14] LI C, WANG Y D, LIU X M. A multi-pedestrian tracking algorithm for dense scenes based on an attention mechanism and dual data association[J]. Applied Sciences, 2022, 12(19): 9597.
[15] ZOU F M, LI X, XU Q M, et al. Correlation-and-correction fusion attention network for occluded pedestrian detection[J]. IEEE Sensors Journal, 2023, 23(6): 6061-6073.
[16] LI M L, SUN G B, YU J X. A pedestrian detection network model based on improved YOLOv5[J]. Entropy, 2023, 25(2): 381.
[17] HAO S, GAO S, MA X, et al. Anchor-free infrared pedestrian detection based on cross-scale feature fusion and hierarchical attention mechanism[J]. Infrared Physics & Technology, 2023, 131: 104660.
[18] WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[EB/OL]. (2020-03-24) [2023-03-01]. https://arxiv.org/abs/1910.03151.pdf.
[19] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2023-03-01]. https://arxiv.org/abs/2004.10934.pdf.
[20] GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. (2021-08-06) [2023-03-15]. https://www.researchgate.net/publication/353343997_YOLOX_Exceeding_YOLO_Series_in_2021.
[21] 郝鹏飞, 刘立群, 顾任远. YOLO-RD-Apple果园异源图像遮挡果实检测模型[J]. 图学学报, 2023, 44(3): 456-464.
HAO P F, LIU L Q, GU R Y. YOLO-RD-Apple orchard heterogenous image obscured fruit detection model[J]. Journal of Graphics, 2023, 44(3): 456-464 (in Chinese).
[22] 杨泳波, 赵远洋, 李振波, 等. 基于胶囊SE-Inception的茄科病害识别方法研究[J]. 图学学报, 2022, 43(1): 28-35.
YANG Y B, ZHAO Y Y, LI Z B, et al. Solanaceae disease recognition method based on capsule SE-Inception[J]. Journal of Graphics, 2022, 43(1): 28-35 (in Chinese).
[23] 罗文宇, 傅明月. 基于YoloX-ECA模型的非法野泳野钓现场监测技术[J]. 图学学报, 2023, 44(3): 465-472.
LUO W Y, FU M Y. On-site monitoring technology of illegal swimming and fishing based on YoloX-ECA[J]. Journal of Graphics, 2023, 44(3): 465-472 (in Chinese).
[24] YING B Y, XU Y C, ZHANG S A, et al. Weed detection in images of carrot fields based on improved YOLO v4[J]. Traitement Du Signal, 2021, 38(2): 341-348.
[25] ZHANG Y T, YIN Z S, NIE L Z, et al. Attention based multi-layer fusion of multispectral images for pedestrian detection[J]. IEEE Access, 2020, 8: 165071-165084.
[26] ZHENG C H, PEI W J, YAN Q, et al. Pedestrian detection based on gradient and texture feature integration[J]. Neurocomputing, 2017, 228: 71-78.
[27] WEI X, ZHANG H T, LIU S F, et al. Pedestrian detection in underground mines via parallel feature transfer network[J]. Pattern Recognition, 2020, 103: 107195.
[28] SHANNON C E. A mathematical theory of communication[J]. Bell System Technical Journal, 1948, 27(3): 379-423.
[29] HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[C]// European Conference on Computer Vision. Cham: Springer, 2014: 346-361.
[30] HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: benchmark dataset and baseline[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1037-1045.
[31] 刘学平, 李玙乾, 刘励, 等. 嵌入SENet结构的改进YOLOV3目标识别算法[J]. 计算机工程, 2019, 45(11): 243-248.
LIU X P, LI Y Q, LIU L, et al. Improved YOLOV3 target recognition algorithm with embedded SENet structure[J]. Computer Engineering, 2019, 45(11): 243-248 (in Chinese).
[32] KIEU M, BAGDANOV A D, BERTINI M, et al. Task-conditioned domain adaptation for pedestrian detection in thermal imagery[C]// European Conference on Computer Vision. Cham: Springer, 2020: 546-562.
[33] LI C Y, SONG D, TONG R F, et al. Multispectral pedestrian detection via simultaneous detection and segmentation[EB/OL]. [2023-03-15]. https://arxiv.org/abs/1808.04818.pdf.
[34] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[35] 田卓钰, 马苗, 杨楷芳. 基于级联注意力与点监督机制的考场目标检测模型[J]. 软件学报, 2022, 33(7): 2633-2645.
TIAN Z Y, MA M, YANG K F. Object detection model for examination classroom based on cascade attention and point supervision mechanism[J]. Journal of Software, 2022, 33(7): 2633-2645 (in Chinese).
[36] 胡欣, 周运强, 肖剑, 等. 基于改进YOLOv5的螺纹钢表面缺陷检测[J]. 图学学报, 2023, 44(3): 427-437.
HU X, ZHOU Y Q, XIAO J, et al. Surface defect detection of threaded steel based on improved YOLOv5[J]. Journal of Graphics, 2023, 44(3): 427-437 (in Chinese).