Binocular ranging method based on improved YOLOv8 and GMM image point set matching

doi:10.11996/JG.j.2095-302X.2024040714

Abstract


To address the research needs of unmanned tower crane systems, a binocular ranging method is proposed, based on an improved YOLOv8 and GMM image point set matching, to detect and recognize tower crane hooks in the outdoor environment of the driver’s cab and measure their distance. Images are acquired with binocular cameras, and the FasterNet backbone network and the Slim-neck connection layer are introduced to improve the YOLOv8 object detection algorithm, so that tower crane hooks are detected effectively in the image and the two-dimensional coordinates of the detection box are obtained. Locality-sensitive hashing is combined with a phased (segmented) matching strategy to improve the matching efficiency of the GMM image point set matching model, which matches feature points of the hook inside the detection box. The depth of the tower crane hook is then calculated by binocular triangulation. The experimental results show that, compared with the original YOLOv8 algorithm, the improved algorithm increases precision P by 2.9% and average precision AP₅₀ by 2.2%, while reducing model complexity by 10.01 GFLOPs and the parameter count by 3.37 M. The model is therefore lighter and also more accurate. Compared with the original point set matching algorithm, the improved image point set matching algorithm is more robust across the evaluated indicators. Finally, the recognition and ranging of tower crane hooks were completed at an engineering site within an acceptable margin of error, verifying the feasibility of the method.
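The final triangulation step can be illustrated with a minimal sketch. It assumes a rectified stereo pair, for which depth follows Z = f·B/d (focal length f in pixels, baseline B in metres, disparity d in pixels). The function names, the use of the median disparity over matched points, and all numeric values are illustrative assumptions, not the authors' implementation or calibration.

```python
from statistics import median


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


def hook_depth(matches, focal_px: float, baseline_m: float) -> float:
    """Estimate one depth for the hook from matched (left, right) pixel pairs.

    Each match is ((x_left, y_left), (x_right, y_right)). Taking the median
    disparity is an assumption made here to damp residual mismatches.
    """
    disparities = [xl - xr for (xl, _), (xr, _) in matches if xl - xr > 0]
    if not disparities:
        raise ValueError("no match with a positive disparity")
    return depth_from_disparity(median(disparities), focal_px, baseline_m)


# Hypothetical matched points and camera parameters, for illustration only.
matches = [
    ((660.0, 300.0), (620.0, 300.0)),
    ((670.0, 310.0), (629.0, 310.0)),
    ((655.0, 295.0), (616.0, 295.0)),
]
z = hook_depth(matches, focal_px=1200.0, baseline_m=0.12)
print(f"estimated depth: {z:.2f} m")
```

Because depth is inversely proportional to disparity, a fixed pixel error in matching produces a larger depth error at longer range.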

Keywords: YOLOv8 object detection, Gaussian mixture model, point set matching, deep learning, binocular vision, smart construction site visualization

CLC Number:

HU Xin, CHANG Yashu, QIN Hao, XIAO Jian, CHENG Hongliang. Binocular ranging method based on improved YOLOv8 and GMM image point set matching[J]. Journal of Graphics, 2024, 45(4): 714-725.

Figures/Tables 18

Fig. 1 YOLOv8s network architecture diagram

Fig. 2 PConv structure

Fig. 3 Overall architecture of FasterNet backbone network

Fig. 4 Slim-neck connection layer ((a) GSConv; (b) GSBottleneck; (c) VoV-GSCSP)

Fig. 5 FS-YOLO network structure diagram

Fig. 6 Locality-sensitive hashing algorithm

Fig. 7 Segmented matching strategy

Fig. 8 PGMM ranging algorithm
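As a rough illustration of the locality-sensitive hashing idea named in the abstract and in Fig. 6, the sketch below hashes feature descriptors with random hyperplanes and keeps only same-bucket descriptors as match candidates. This is a generic sign-random-projection scheme chosen for brevity. The page does not specify the hashing variant used in the paper, so the function names, the descriptor dimension, and the bit count are all assumptions.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0):
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, planes):
    """One bit per hyperplane: 1 if the descriptor lies on its non-negative side."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def candidate_pairs(left_desc, right_desc, planes):
    """For each left descriptor, list the right descriptors sharing its hash bucket."""
    buckets = defaultdict(list)
    for j, desc in enumerate(right_desc):
        buckets[hash_code(desc, planes)].append(j)
    return {i: buckets.get(hash_code(desc, planes), [])
            for i, desc in enumerate(left_desc)}


# Toy 4-dimensional descriptors (hypothetical values).
left = [[0.9, 0.1, 0.3, 0.7]]
right = [[0.9, 0.1, 0.3, 0.7], [-0.9, -0.1, -0.3, -0.7]]
planes = make_hyperplanes(dim=4, n_bits=8, seed=0)
cands = candidate_pairs(left, right, planes)
print(cands)
```

Identical descriptors always share a bucket, while a descriptor pointing in the opposite direction falls on the other side of every hyperplane. A matcher would then evaluate only the surviving candidates instead of all pairs.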

Table 1 Software and hardware configuration

Item	Configuration
Operating system	Windows 11
Programming language	Python 3.8
Deep learning framework	PyTorch 1.13.1
CPU	Intel(R) Core(TM) i7-13700H
GPU	NVIDIA GeForce RTX 4060 (8 GB)
CUDA	11.6
Platform	PyCharm 2022, MATLAB 2021

Fig. 9 Hook dataset under complex factors ((a) In dim light conditions; (b) In nighttime lighting environments; (c) With occlusion; (d) Small targets at long distances; (e) Complex backgrounds; (f) Under normal light conditions; (g) Negative sample 1; (h) Negative sample 2)

Table 2 Performance comparison of different object detection networks

Model	Backbone	P	AP₅₀	FLOPs/G	Parameters/M
YOLOv3-spp	Darknet-53	0.944	0.945	283.1	104.710
YOLOv5s	CSP-Darknet-53	0.953	0.921	23.8	9.112
YOLOv6s	RepVGG	0.964	0.945	44.0	16.297
YOLOv8s	C2f-sppf-Darknet-53	0.930	0.932	28.4	11.126
FS-YOLO(Ours)	FasterNet	0.960	0.951	18.3	7.756

Table 3 Ablation experiment for the object detection algorithm

FasterNet	Slim-Neck	P	AP₅₀	FLOPs/G	Parameters/M
_	_	0.930	0.928	28.4	11.126
✓	_	0.934	0.939	21.7	8.616
_	✓	0.955	0.941	25.1	10.265
✓	✓	0.959	0.950	18.3	7.756

Table 4 Performance comparison of different models for point set matching

Model	Noise points: 60			Noise points: 120			Noise points: 180			Noise points: 240			Noise points: 300
Model	P	Recall	F1	P	Recall	F1	P	Recall	F1	P	Recall	F1	P	Recall	F1
CPD(GMM)	0.419	0.233	0.299	0.184	0.028	0.048	0.206	0.012	0.023	0.042	0.002	0.004	0.111	0.003	0.006
NGMM	0.978	0.959	0.968	0.828	0.929	0.875	0.768	0.926	0.839	0.687	0.920	0.787	0.641	0.920	0.755
PGMM(Ours)	0.992	0.966	0.979	0.970	0.968	0.969	0.954	0.966	0.960	0.927	0.969	0.948	0.899	0.965	0.931
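The F1 column in Table 4 is the harmonic mean of precision P and Recall. The short sketch below recomputes it from the P and Recall values listed for 60 noise points; the helper name `f1_score` and the dictionary layout are illustrative choices.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, Recall) pairs taken from the 60-noise-point column of Table 4.
rows_60 = {
    "CPD(GMM)": (0.419, 0.233),
    "NGMM": (0.978, 0.959),
    "PGMM(Ours)": (0.992, 0.966),
}

f1_60 = {name: round(f1_score(p, r), 3) for name, (p, r) in rows_60.items()}
for name, value in f1_60.items():
    print(f"{name}: F1 = {value:.3f}")
```

Rounded to three decimals, the recomputed values agree with the F1 entries listed for that column.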

Fig. 10 Comparison of experimental curves on the Leuven image set ((a) P versus number of noise points; (b) Recall versus number of noise points; (c) F1 versus number of noise points)

Fig. 11 Camera installation location

Fig. 12 Comparison of object detection results ((a) Blurred rotation; (b) Trees; (c) Complex background; (d) Darkness; (e) Dimness; (f) Overexposure)

Fig. 13 Image point set matching results for on-site distance measurement

Table 5 Experimental results of physical distance measurement for tower cranes

Group	Actual distance/m	Calculated distance/m	Error/m	Relative error/%
1	2.32	2.3363	-0.0163	-0.7017
2	3.98	4.0157	0.0357	0.8950
3	6.13	6.0436	0.0864	1.4085
4	7.91	8.1087	-0.1987	-2.5090
5	10.14	10.5532	-0.4132	-4.0737
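The error columns of Table 5 can be recomputed from the two distance columns. The sketch below assumes error = actual distance minus calculated distance and relative error = error / actual distance × 100; this convention is inferred from the signs in most rows and is not stated on this page. Because the inputs are the rounded distances shown in the table, the recomputed relative errors can differ from the listed ones in the later decimals.

```python
# (actual distance, calculated distance) in metres, from Table 5.
measurements = [
    (2.32, 2.3363),
    (3.98, 4.0157),
    (6.13, 6.0436),
    (7.91, 8.1087),
    (10.14, 10.5532),
]


def ranging_error(actual_m: float, calculated_m: float):
    """Return (error in m, relative error in %) under the assumed sign convention."""
    error = actual_m - calculated_m
    return error, error / actual_m * 100.0


results = [ranging_error(a, c) for a, c in measurements]
for group, (err, rel) in enumerate(results, start=1):
    print(f"group {group}: error = {err:+.4f} m, relative error = {rel:+.2f} %")
```

The magnitude of the relative error grows with distance, from under 1% at about 2 m to about 4% at about 10 m, consistent with the loss of disparity resolution at long range.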
