Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 714-725. DOI: 10.11996/JG.j.2095-302X.2024040714
HU Xin1, CHANG Yashu1, QIN Hao2, XIAO Jian3, CHENG Hongliang3
Received: 2024-02-10
Accepted: 2024-04-15
Published: 2024-08-31
Online: 2024-09-03
Contact: XIAO Jian (1975-), associate professor, Ph.D. His main research interests cover intelligent perception and computing, machine vision, and image processing. E-mail: xiaojian@chd.edu.cn
First author: HU Xin (1975-), professor, postdoc. Her main research interests cover power grid big data processing, machine learning, and deep learning. E-mail: huxin@chd.edu.cn
Abstract:
To meet the needs of research on unmanned tower crane systems, a binocular ranging method based on improved YOLOv8 and GMM image point set matching is proposed to detect, identify, and range the tower crane hook in the environment outside the operator's cab. Images are captured with a binocular camera, and the YOLOv8 object detection algorithm is improved by introducing the FasterNet backbone and the Slim-neck neck layer, so that the crane hook in the frame is detected effectively and the two-dimensional coordinates of its detection box are obtained. A locality-sensitive hashing method combined with a staged matching strategy is adopted to improve the matching efficiency of the GMM image point set matching model, and feature points of the crane hook inside the detection box are matched. Finally, the depth of the crane hook is computed from the triangulation principle of the binocular camera. Experimental results show that, compared with the original algorithm, the improved YOLOv8 increases the precision P by 2.9%, raises the average precision AP50 by 2.2%, reduces the model complexity by 10.01 GFLOPs, and cuts the number of parameters by 3.37 M, achieving a lighter model while improving detection accuracy. Compared with the original algorithm, the improved image point set matching algorithm shows better robustness on every metric. Finally, the crane hook was identified and ranged on an engineering site, and the detection, identification, and ranging tasks were completed effectively within an acceptable error range, verifying the feasibility of the proposed method.
HU Xin, CHANG Yashu, QIN Hao, XIAO Jian, CHENG Hongliang. Binocular ranging method based on improved YOLOv8 and GMM image point set matching[J]. Journal of Graphics, 2024, 45(4): 714-725.
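The abstract's final step obtains the hook depth from binocular triangulation once feature points inside the left and right detection boxes have been matched. Below is a minimal sketch of that step, assuming an ideal rectified stereo pair; the function name, focal length, baseline, and pixel coordinates are illustrative assumptions, not values from the paper.

```python
def stereo_depth(x_left, x_right, focal_px, baseline_m):
    """Depth of a matched feature point on a rectified stereo rig.

    x_left, x_right: horizontal pixel coordinates of the same hook feature
    focal_px:        focal length in pixels (from stereo calibration)
    baseline_m:      distance between the two camera centers in meters
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: point not triangulable")
    return focal_px * baseline_m / disparity


# Illustrative numbers only: averaging the depths of several matched
# feature points inside the hook detection box gives a hook distance.
matches = [(652.4, 601.1), (660.8, 609.7), (647.2, 595.9)]
depths = [stereo_depth(xl, xr, focal_px=1400.0, baseline_m=0.12)
          for xl, xr in matches]
hook_distance = sum(depths) / len(depths)
print(f"estimated hook distance: {hook_distance:.3f} m")
```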
Table 1 Software and hardware configuration

| Item | Configuration |
|---|---|
| Operating system | Windows 11 |
| Programming language | Python 3.8 |
| Deep learning framework | PyTorch 1.13.1 |
| CPU | Intel(R) Core(TM) i7-13700H |
| GPU | NVIDIA GeForce RTX 4060 (8 GB) |
| CUDA | 11.6 |
| Platform | PyCharm 2022, MATLAB 2021 |
Fig. 9 Hook dataset under complex factors ((a) Dim lighting; (b) Nighttime lighting; (c) Occlusion; (d) Small target at long distance; (e) Complex background; (f) Natural lighting; (g) Negative sample 1; (h) Negative sample 2)
Table 2 Performance comparison of different networks for object detection

| Model | Backbone | P | AP50 | FLOPs/G | Parameters/M |
|---|---|---|---|---|---|
| YOLOv3-spp | Darknet-53 | 0.944 | 0.945 | 283.1 | 104.710 |
| YOLOv5s | CSP-Darknet-53 | 0.953 | 0.921 | 23.8 | 9.112 |
| YOLOv6s | RepVGG | 0.964 | 0.945 | 44.0 | 16.297 |
| YOLOv8s | C2f-sppf-Darknet-53 | 0.930 | 0.932 | 28.4 | 11.126 |
| FS-YOLO (Ours) | FasterNet | 0.960 | 0.951 | 18.3 | 7.756 |
Table 3 Ablation experiments of the object detection algorithm

| FasterNet | Slim-Neck | P | AP50 | FLOPs/G | Parameters/M |
|---|---|---|---|---|---|
| − | − | 0.930 | 0.928 | 28.4 | 11.126 |
| √ | − | 0.934 | 0.939 | 21.7 | 8.616 |
| − | √ | 0.955 | 0.941 | 25.1 | 10.265 |
| √ | √ | 0.959 | 0.950 | 18.3 | 7.756 |
Table 4 Performance comparison of different models for point set matching (P / Recall / F1 at each number of noise points)

| Model | 60 noise points | 120 noise points | 180 noise points | 240 noise points | 300 noise points |
|---|---|---|---|---|---|
| CPD (GMM) | 0.419 / 0.233 / 0.299 | 0.184 / 0.028 / 0.048 | 0.206 / 0.012 / 0.023 | 0.042 / 0.002 / 0.004 | 0.111 / 0.003 / 0.006 |
| NGMM | 0.978 / 0.959 / 0.968 | 0.828 / 0.929 / 0.875 | 0.768 / 0.926 / 0.839 | 0.687 / 0.920 / 0.787 | 0.641 / 0.920 / 0.755 |
| PGMM (Ours) | 0.992 / 0.966 / 0.979 | 0.970 / 0.968 / 0.969 | 0.954 / 0.966 / 0.960 | 0.927 / 0.969 / 0.948 | 0.899 / 0.965 / 0.931 |
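Table 4 scores each matching model by precision, recall, and F1 under an increasing number of injected noise points. The scoring code is not part of this section; below is a minimal sketch of one common way to compute these metrics, assuming ground-truth correspondences are available and a predicted correspondence counts as correct only if it appears in the ground truth (all names are illustrative).

```python
def match_metrics(predicted, ground_truth):
    """Precision / recall / F1 for point set matching.

    predicted, ground_truth: iterables of (source_index, target_index)
    correspondence pairs.
    """
    pred, gt = set(predicted), set(ground_truth)
    true_pos = len(pred & gt)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gt) if gt else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example: 4 of 5 predicted correspondences are correct,
# out of 6 ground-truth correspondences.
p, r, f1 = match_metrics(
    predicted=[(0, 0), (1, 1), (2, 3), (3, 3), (4, 4)],
    ground_truth=[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)],
)
print(f"P={p:.3f}  Recall={r:.3f}  F1={f1:.3f}")
```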
Fig. 10 Comparison of experimental curves on the Leuven image set ((a) P vs. number of noise points; (b) Recall vs. number of noise points; (c) F1 vs. number of noise points)
Fig. 12 Comparison of object detection results ((a) Blur and rotation; (b) Trees; (c) Complex background; (d) Darkness; (e) Dim lighting; (f) Overexposure)
Table 5 Experimental results of physical distance measurement for tower cranes

| Group | Actual distance/m | Computed distance/m | Error/m | Relative error/% |
|---|---|---|---|---|
| 1 | 2.32 | 2.3363 | -0.0163 | -0.7017 |
| 2 | 3.98 | 4.0157 | 0.0357 | 0.8950 |
| 3 | 6.13 | 6.0436 | 0.0864 | 1.4085 |
| 4 | 7.91 | 8.1087 | -0.1987 | -2.5090 |
| 5 | 10.14 | 10.5532 | -0.4132 | -4.0737 |
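The error columns in Table 5 appear to follow the usual definitions: the error is the actual distance minus the computed distance, and the relative error is that error divided by the actual distance. A minimal check against the first row, with the caveat that the published distances are rounded so the last digit may differ slightly, could look like this:

```python
def ranging_error(actual_m, computed_m):
    """Absolute and relative ranging error, as reported in Table 5."""
    error = actual_m - computed_m
    relative_pct = error / actual_m * 100.0
    return error, relative_pct


# First row of Table 5; tiny last-digit differences from the published
# values are expected because the table lists rounded distances.
err, rel = ranging_error(actual_m=2.32, computed_m=2.3363)
print(f"error = {err:.4f} m, relative error = {rel:.2f} %")  # ≈ -0.0163 m, -0.70 %
```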
|||||