Semantic segmentation of small-scale point clouds based on integration of mean shift and deep learning

doi:10.11996/JG.j.2095-302X.2025050998

Abstract

Accurate segmentation of small semantic objects remains an important and challenging problem in point cloud semantic segmentation. Point cloud data are typically sparse and irregular, and for small or distant objects, existing fully supervised point cloud segmentation algorithms often fail to capture the features of these objects, which lowers segmentation accuracy. The problem is particularly prominent in applications such as autonomous driving, robot navigation, and urban modeling, which rely on accurate identification and localization of small objects. To address it, a small semantic point cloud segmentation algorithm integrating mean shift with deep learning was proposed. The shortcomings of existing point cloud segmentation algorithms on small semantic objects were analyzed, with emphasis on the observation that, because small objects are sparse and have weak local features, current methods often fail to extract their semantic information. Mean shift was therefore introduced into the deep neural network as a feature extraction module to increase the attention paid to small semantic objects. In the network architecture, a feature processing module and a small semantic object neighborhood capture module were also designed. The feature processing module enhanced the local features of small objects, helping the network distinguish small objects from large ones in complex backgrounds, while the neighborhood capture module focused on the contextual information surrounding small objects, enabling the model to capture more precise semantic features in local regions. Experimental evaluation on multiple point cloud datasets showed that the improved method raised segmentation accuracy for small semantic objects, especially in sparse scenes dense with small objects. In conclusion, the proposed algorithm provides an effective solution for the accurate segmentation of small semantic objects, with broad application prospects and practical significance.

Key words: point cloud processing, semantic segmentation, mean shift, deep learning, small semantic object features
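The abstract describes mean shift as the component that draws attention to small objects. The following is a minimal NumPy sketch of plain mean shift clustering on a toy point cloud, included only to illustrate the underlying technique. It is not the authors' feature extraction module: the Gaussian kernel, the fixed `bandwidth`, the helper names `mean_shift` and `group_modes`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=100, tol=1e-5):
    """Move a copy of every point uphill on a Gaussian kernel density estimate."""
    modes = points.astype(float).copy()
    for _ in range(n_iter):
        # Squared distances between the current mode estimates and the fixed data.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        # Each new estimate is the kernel-weighted mean of the data.
        new_modes = (w @ points) / w.sum(axis=1, keepdims=True)
        shift = np.linalg.norm(new_modes - modes, axis=1).max()
        modes = new_modes
        if shift < tol:
            break
    return modes

def group_modes(modes, merge_radius):
    """Give points the same label when their converged modes lie within merge_radius."""
    labels = -np.ones(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_radius:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# Hypothetical toy scene: one large object (200 points) and one small object (15 points).
rng = np.random.default_rng(0)
large = rng.normal(loc=[0.0, 0.0, 0.0], scale=0.3, size=(200, 3))
small = rng.normal(loc=[5.0, 5.0, 0.0], scale=0.1, size=(15, 3))
cloud = np.vstack([large, small])

modes = mean_shift(cloud, bandwidth=0.5)
labels, centers = group_modes(modes, merge_radius=0.5)
print(len(centers), np.bincount(labels))
```

Because mean shift needs no preset cluster count and finds density modes regardless of how many points support them, the 15-point object keeps its own cluster instead of being absorbed by the larger one, which is the property that makes the technique attractive for small objects.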

ZHU Hongmiao, ZHONG Guojie, ZHANG Yanci. Semantic segmentation of small-scale point clouds based on integration of mean shift and deep learning[J]. Journal of Graphics, 2025, 46(5): 998-1009.

Figures/Tables: 21

References: 45

[1]	HUANG W, LIANG H, LIN L, et al. A fast point cloud ground segmentation approach based on coarse-to-fine Markov random field[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(7): 7841-7854.
[2]	ZHANG Y K, YU W J, ZHAO X Z, et al. Interactive tree segmentation and modeling from ALS point clouds[J]. Journal of Graphics, 2021, 42(4): 599-607 (in Chinese).
[3]	NIU C G, LIU Y J, LI Z M, et al. 3D object recognition and model segmentation based on point cloud data[J]. Journal of Graphics, 2019, 40(2): 274-281 (in Chinese).
[4]	ZHU X G, ZHOU H, WANG T, et al. Cylindrical and asymmetrical 3D convolution networks for LiDAR segmentation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 9939-9948.
[5]	YAN X, GAO J T, LI J, et al. Sparse single sweep LiDAR point cloud segmentation via learning contextual shape priors from scene completion[C]// The 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 3101-3109.
[6]	XU C F, WU B C, WANG Z N, et al. SqueezeSegV3: spatially-adaptive convolution for efficient point-cloud segmentation[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 1-19.
[7]	QI C R, SU H, MO K C, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 77-85.
[8]	QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[EB/OL]. [2024-04-20]. http://arxiv.org/pdf/1706.02413.
[9]	THOMAS H, QI C R, DESCHAUD J E, et al. KPConv: flexible and deformable convolution for point clouds[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 6410-6419.
[10]	TANG H T, LIU Z J, ZHAO S Y, et al. Searching efficient 3D architectures with sparse point-voxel convolution[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 685-702.
[11]	TAO A, DUAN Y Q, WEI Y, et al. SegGroup: seg-level supervision for 3D instance and semantic segmentation[J]. IEEE Transactions on Image Processing, 2022, 31: 4952-4965.
[12]	GRAHAM B, ENGELCKE M, VAN DER MAATEN L. 3D semantic segmentation with submanifold sparse convolutional networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018.
[13]	FENG M T, LIANG Z, LIN X F, et al. Point attention network for semantic segmentation of 3D point clouds[J]. Pattern Recognition, 2020, 107: 107446.
[14]	KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. [2024-04-20]. https://dblp.uni-trier.de/db/conf/iclr/iclr2017.html#KipfW17.
[15]	ZOU Y, YU Z D, VIJAYA KUMAR B V K, et al. Domain adaptation for semantic segmentation via class-balanced self-training[EB/OL]. [2024-04-20]. http://arxiv.org/abs/1810.07911?context=cs.LG.
[16]	HOU H Y, SHEN M Y, HSU C C, et al. Ensemble fusion for small object detection[C]// The 18th International Conference on Machine Vision and Applications. New York: IEEE Press, 2023: 1-6.
[17]	XIE Y X, TIAN J J, ZHU X X. Linking points with labels in 3D: a review of point cloud semantic segmentation[J]. IEEE Geoscience and Remote Sensing Magazine, 2020, 8(4): 38-59.
[18]	HU Q Y, YANG B, XIE L H, et al. RandLA-Net: efficient semantic segmentation of large-scale point clouds[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 11108-11117.
[19]	WANG L, HUANG Y C, HOU Y L, et al. Graph attention convolution for point cloud semantic segmentation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 10296-10305.
[20]	WANG Y, SUN Y B, LIU Z W, et al. Dynamic graph CNN for learning on point clouds[J]. ACM Transactions on Graphics (TOG), 2019, 38(5): 146.
[21]	LI Y Y, BU R, SUN M C, et al. PointCNN: convolution on X-transformed points[C]// The 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018: 828-838.
[22]	FAN S Q, DONG Q L, ZHU F H, et al. SCF-Net: learning spatial contextual features for large-scale point cloud segmentation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 14499-14508.
[23]	LEI H, AKHTAR N, MIAN A. Spherical kernel for efficient graph convolution on 3D point clouds[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3664-3680.
[24]	ZHOU B L, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 2921-2929.
[25]	ZHAO H S, JIANG L, JIA J Y, et al. Point transformer[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 16239-16248.
[26]	WU X Y, LAO Y X, JIANG L, et al. Point transformer V2: grouped vector attention and partition-based pooling[EB/OL]. [2024-04-17]. https://doi.org/10.48550/arXiv.2210.05666.
[27]	WU X Y, JIANG L, WANG P S, et al. Point transformer V3: simpler, faster, stronger[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 4840-4851.
[28]	BI Y X, LIU P, SHI J L, et al. A multi-modal fusion 3D semantic segmentation method[C]// 2023 3rd International Conference on Electronic Information Engineering and Computer Science. New York: IEEE Press, 2023: 542-545.
[29]	CARDACE A, CONTI A, RAMIREZ P Z, et al. Boosting multi-modal unsupervised domain adaptation for LiDAR semantic segmentation by self-supervised depth completion[J]. IEEE Access, 2023, 11: 85155-85164.
[30]	DU S Q, WANG W X, GUO R Z, et al. AsymFormer: asymmetrical cross-modal representation learning for mobile platform real-time RGB-D semantic segmentation[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2024: 7608-7615.
[31]	XU R, WUNSCH D. Survey of clustering algorithms[J]. IEEE Transactions on Neural Networks, 2005, 16(3): 645-678.
[32]	STAUFFER C, GRIMSON W E L. Adaptive background mixture models for real-time tracking[C]// 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 1999: 246-252.
[33]	LLOYD S. Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28(2): 129-137.
[34]	ESTER M, KRIEGEL H P, SANDER J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise[C]// The 2nd International Conference on Knowledge Discovery and Data Mining. Palo Alto: AAAI Press, 1996: 226-231.
[35]	COMANICIU D, RAMESH V, MEER P. Real-time tracking of non-rigid objects using mean shift[C]// IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2000: 142-149.
[36]	MELZER T. Non-parametric segmentation of ALS point clouds using mean shift[J]. Journal of Applied Geodesy, 2007, 1(3): 159-170.
[37]	ZHANG Z X, ZHANG L Q, TONG X H, et al. A multilevel point-cluster-based discriminative feature for ALS point cloud classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(6): 3309-3321.
[38]	YUE W L, LU J G, ZHOU W H, et al. A new plane segmentation method of point cloud based on mean shift and RANSAC[C]// 2018 Chinese Control and Decision Conference. New York: IEEE Press, 2018: 1658-1663.
[39]	ZHANG Z X, ZHANG L Q, TONG X H, et al. Discriminative-dictionary-learning-based multilevel point-cluster features for ALS point-cloud classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(12): 7309-7322.
[40]	CHEN C, LI G B, XU R J, et al. ClusterNet: deep hierarchical cluster network with rigorously rotation-invariant representation for point cloud analysis[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4994-5002.
[41]	SALGADO-UGARTE I H, PÉREZ-HERNÁNDEZ M A. Exploring the use of variable bandwidth kernel density estimators[J]. The Stata Journal, 2003, 3(2): 133-147.
[42]	SILVERMAN B W. Density estimation for statistics and data analysis[M]. New York: Routledge, 1998: 95-119.
[43]	WU D Y, DING Y, ZHANG M F, et al. Multi-features refinement and aggregation for medical brain segmentation[J]. IEEE Access, 2020, 8: 57483-57496.
[44]	ARMENI I, SENER O, ZAMIR A R, et al. 3D semantic parsing of large-scale indoor spaces[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 27-30.
[45]	COMANICIU D, MEER P. Mean shift analysis and applications[C]// The 7th IEEE International Conference on Computer Vision. New York: IEEE Press, 1999: 1197-1203.

Hardware/software	Experimental environment
CPU	12th Gen Intel(R) Core(TM) i5-12600KF 3.70 GHz
Memory	16 GB
GPU	NVIDIA GeForce RTX 3070
Operating system	Windows 11
Editor	Visual Studio Code
Languages and libraries	Python, C++, CUDA, PyTorch, TensorFlow, NumPy

Algorithm	Dataset	Epochs	Batch size	Number of sample points
PointNet++^[8]	S3DIS	32	10	4096
PointNet^[7]	SCU	20	4	1024
K-means+PointNet^[7]	SCU	20	4	1024
DBSCAN^[34]+PointNet^[7]	SCU	20	4	1024
EM^[29]+PointNet^[7]	SCU	20	4	1024
Ours+PointNet++^[8]	S3DIS	32	10	4096
Ours+PointNet^[7]	SCU	20	4	1024

Algorithm	mIoU	Table (5.5)	Chair (2.9)	Sofa (1.7)	Ceiling (18.9)	Floor (15.4)	Wall (28.8)	Column (2.8)	Window (7.3)	Door (69.5)	Bookcase (16.3)	Board (2.5)	Clutter (9.0)
KPConv rigid^[9]	65.4	80.2	90.1	66.4	92.6	97.3	81.4	16.5	54.5	69.5	74.6	63.7	58.1
RandLA-Net^[18]	63.0	77.2	85.2	71.5	92.4	96.7	80.6	18.3	61.3	43.3	71.0	69.2	52.3
PointCNN^[21]	57.3	74.4	80.6	31.7	92.3	62.1	79.4	17.6	22.8	62.1	66.7	62.1	56.7
SCF-Net^[22]	63.3	72.2	81.1	62.1	93.2	95.4	78.1	43.8	51.2	60.4	70.7	65.8	56.4
SPH3D^[23]	59.5	79.9	86.9	33.2	93.3	97.1	81.1	33.2	45.8	43.8	71.5	54.1	53.7
Ours	63.7	84.9	90.4	74.3	93.7	96.7	78.3	21.8	54.7	64.8	61.1	54.7	52.8
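
The table above reports per-class IoU and mIoU. As a reading aid, here is a minimal sketch of how these metrics are conventionally computed from point-wise labels, assuming the standard definition IoU = TP / (TP + FP + FN) averaged over classes. The helper names and the toy labels are hypothetical and are not taken from the paper or its evaluation code.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def per_class_iou(cm):
    """IoU per class; classes absent from both labels and predictions become NaN."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)

# Hypothetical labels for 10 points and 3 classes.
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 0, 2])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
iou = per_class_iou(cm)
miou = np.nanmean(iou)
print(iou, miou)
```

Because mIoU weights every class equally regardless of its point count, gains on small classes affect the score as much as gains on large ones, which is why per-class columns are worth reading alongside the mean.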