Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 714-725. DOI: 10.11996/JG.j.2095-302X.2024040714
HU Xin1, CHANG Yashu1, QIN Hao2, XIAO Jian3, CHENG Hongliang3
Received: 2024-02-10
Accepted: 2024-04-15
Published: 2024-08-31
Online: 2024-09-03
Contact: XIAO Jian (1975-), associate professor, Ph.D. His main research interests cover intelligent perception and computing, machine vision, and image processing. E-mail: xiaojian@chd.edu.cn
First author: HU Xin (1975-), professor, postdoc. Her main research interests cover power grid big data processing, machine learning, and deep learning. E-mail: huxin@chd.edu.cn
Abstract:
To meet the needs of research on unmanned tower crane systems, a binocular ranging method based on improved YOLOv8 and GMM image point set matching is proposed to detect, identify, and range the tower crane hook in the environment outside the operator's cab. Images are captured with a binocular camera, and the YOLOv8 object detection algorithm is improved by introducing the FasterNet backbone and the Slim-neck neck layer, so that the crane hook in the frame is detected effectively and the two-dimensional coordinates of its detection box are obtained. A locality-sensitive hashing method combined with a staged matching strategy is adopted to improve the matching efficiency of the GMM image point set matching model, and feature points of the crane hook inside the detection box are matched. Finally, the depth of the crane hook is computed from the triangulation principle of the binocular camera. Experimental results show that, compared with the original algorithm, the improved YOLOv8 increases the precision P by 2.9%, raises the average precision AP50 by 2.2%, reduces the model complexity by 10.01 GFLOPs, and cuts the number of parameters by 3.37 M, achieving a lighter model while improving detection accuracy. Compared with the original algorithm, the improved image point set matching algorithm shows better robustness on every metric. Finally, the crane hook was identified and ranged on an engineering site, and the detection, identification, and ranging tasks were completed effectively within an acceptable error range, verifying the feasibility of the proposed method.
HU Xin, CHANG Yashu, QIN Hao, XIAO Jian, CHENG Hongliang. Binocular ranging method based on improved YOLOv8 and GMM image point set matching[J]. Journal of Graphics, 2024, 45(4): 714-725.
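The abstract's final step obtains the hook depth from binocular triangulation once feature points inside the left and right detection boxes have been matched. Below is a minimal sketch of that step, assuming an ideal rectified stereo pair; the function name, focal length, baseline, and pixel coordinates are illustrative assumptions, not values from the paper.

```python
def stereo_depth(x_left, x_right, focal_px, baseline_m):
    """Depth of a matched feature point on a rectified stereo rig.

    x_left, x_right: horizontal pixel coordinates of the same hook feature
    focal_px:        focal length in pixels (from stereo calibration)
    baseline_m:      distance between the two camera centers in meters
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: point not triangulable")
    return focal_px * baseline_m / disparity


# Illustrative numbers only: averaging the depths of several matched
# feature points inside the hook detection box gives a hook distance.
matches = [(652.4, 601.1), (660.8, 609.7), (647.2, 595.9)]
depths = [stereo_depth(xl, xr, focal_px=1400.0, baseline_m=0.12)
          for xl, xr in matches]
hook_distance = sum(depths) / len(depths)
print(f"estimated hook distance: {hook_distance:.3f} m")
```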
Table 1 Software and hardware configuration

| Item | Configuration |
|---|---|
| Operating system | Windows 11 |
| Programming language | Python 3.8 |
| Deep learning framework | PyTorch 1.13.1 |
| CPU | Intel(R) Core(TM) i7-13700H |
| GPU | NVIDIA GeForce RTX 4060 (8 GB) |
| CUDA | 11.6 |
| Platform | PyCharm 2022, MATLAB 2021 |
Fig. 9 Hook dataset under complex factors ((a) Dim lighting; (b) Nighttime lighting; (c) Occlusion; (d) Small target at long distance; (e) Complex background; (f) Natural lighting; (g) Negative sample 1; (h) Negative sample 2)
Table 2 Performance comparison of different networks for object detection

| Model | Backbone | P | AP50 | FLOPs/G | Parameters/M |
|---|---|---|---|---|---|
| YOLOv3-spp | Darknet-53 | 0.944 | 0.945 | 283.1 | 104.710 |
| YOLOv5s | CSP-Darknet-53 | 0.953 | 0.921 | 23.8 | 9.112 |
| YOLOv6s | RepVGG | 0.964 | 0.945 | 44.0 | 16.297 |
| YOLOv8s | C2f-sppf-Darknet-53 | 0.930 | 0.932 | 28.4 | 11.126 |
| FS-YOLO (Ours) | FasterNet | 0.960 | 0.951 | 18.3 | 7.756 |
Table 3 Ablation experiments of the object detection algorithm

| FasterNet | Slim-Neck | P | AP50 | FLOPs/G | Parameters/M |
|---|---|---|---|---|---|
| − | − | 0.930 | 0.928 | 28.4 | 11.126 |
| √ | − | 0.934 | 0.939 | 21.7 | 8.616 |
| − | √ | 0.955 | 0.941 | 25.1 | 10.265 |
| √ | √ | 0.959 | 0.950 | 18.3 | 7.756 |
Table 4 Performance comparison of different models for point set matching (P / Recall / F1 at each number of noise points)

| Model | 60 noise points | 120 noise points | 180 noise points | 240 noise points | 300 noise points |
|---|---|---|---|---|---|
| CPD (GMM) | 0.419 / 0.233 / 0.299 | 0.184 / 0.028 / 0.048 | 0.206 / 0.012 / 0.023 | 0.042 / 0.002 / 0.004 | 0.111 / 0.003 / 0.006 |
| NGMM | 0.978 / 0.959 / 0.968 | 0.828 / 0.929 / 0.875 | 0.768 / 0.926 / 0.839 | 0.687 / 0.920 / 0.787 | 0.641 / 0.920 / 0.755 |
| PGMM (Ours) | 0.992 / 0.966 / 0.979 | 0.970 / 0.968 / 0.969 | 0.954 / 0.966 / 0.960 | 0.927 / 0.969 / 0.948 | 0.899 / 0.965 / 0.931 |
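Table 4 scores each matching model by precision, recall, and F1 under an increasing number of injected noise points. The scoring code is not part of this section; below is a minimal sketch of one common way to compute these metrics, assuming ground-truth correspondences are available and a predicted correspondence counts as correct only if it appears in the ground truth (all names are illustrative).

```python
def match_metrics(predicted, ground_truth):
    """Precision / recall / F1 for point set matching.

    predicted, ground_truth: iterables of (source_index, target_index)
    correspondence pairs.
    """
    pred, gt = set(predicted), set(ground_truth)
    true_pos = len(pred & gt)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gt) if gt else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example: 4 of 5 predicted correspondences are correct,
# out of 6 ground-truth correspondences.
p, r, f1 = match_metrics(
    predicted=[(0, 0), (1, 1), (2, 3), (3, 3), (4, 4)],
    ground_truth=[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)],
)
print(f"P={p:.3f}  Recall={r:.3f}  F1={f1:.3f}")
```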
Fig. 10 Comparison of experimental curves on the Leuven image set ((a) P vs. number of noise points; (b) Recall vs. number of noise points; (c) F1 vs. number of noise points)
Fig. 12 Comparison of object detection results ((a) Blur and rotation; (b) Trees; (c) Complex background; (d) Darkness; (e) Dim lighting; (f) Overexposure)
Table 5 Experimental results of physical distance measurement for tower cranes

| Group | Actual distance/m | Computed distance/m | Error/m | Relative error/% |
|---|---|---|---|---|
| 1 | 2.32 | 2.3363 | -0.0163 | -0.7017 |
| 2 | 3.98 | 4.0157 | 0.0357 | 0.8950 |
| 3 | 6.13 | 6.0436 | 0.0864 | 1.4085 |
| 4 | 7.91 | 8.1087 | -0.1987 | -2.5090 |
| 5 | 10.14 | 10.5532 | -0.4132 | -4.0737 |
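The error columns in Table 5 appear to follow the usual definitions: the error is the actual distance minus the computed distance, and the relative error is that error divided by the actual distance. A minimal check against the first row, with the caveat that the published distances are rounded so the last digit may differ slightly, could look like this:

```python
def ranging_error(actual_m, computed_m):
    """Absolute and relative ranging error, as reported in Table 5."""
    error = actual_m - computed_m
    relative_pct = error / actual_m * 100.0
    return error, relative_pct


# First row of Table 5; tiny last-digit differences from the published
# values are expected because the table lists rounded distances.
err, rel = ranging_error(actual_m=2.32, computed_m=2.3363)
print(f"error = {err:.4f} m, relative error = {rel:.2f} %")  # ≈ -0.0163 m, -0.70 %
```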
|||||