Semantic segmentation of small-scale point clouds based on integration of mean shift and deep learning

doi:10.11996/JG.j.2095-302X.2025050998

Abstract

In point cloud semantic segmentation, accurately segmenting small semantic objects has long been an important and challenging task. Point cloud data are typically sparse and irregular, and when small or distant objects are processed, existing fully supervised segmentation algorithms often fail to capture their features effectively, which lowers segmentation accuracy. The problem is especially pronounced in autonomous driving, robot navigation, and urban modeling, all of which rely on accurately identifying and localizing small objects. To address it, a point cloud segmentation algorithm for small semantic objects that integrates mean shift clustering with deep learning was proposed. An analysis of existing algorithms showed that, because small objects are sparsely sampled and have weak local features, current methods often cannot extract their semantic information effectively. To overcome this, mean shift was integrated into the deep neural network as a feature extraction module to strengthen the model's attention to small semantic objects. Two further components were designed into the network architecture: a feature processing module and a small semantic object neighborhood capture module. The feature processing module enhanced the local features of small objects, helping the network distinguish small objects from large ones against complex backgrounds. The neighborhood capture module focused on the contextual information surrounding small objects, enabling the model to capture more precise semantic features in local regions. Experiments on multiple point cloud datasets showed that the proposed method significantly improved segmentation accuracy, especially in sparse scenes and in scenes dense with small objects.
In conclusion, integrating mean shift with deep learning provides an effective solution for accurately segmenting small semantic objects, with broad application prospects and practical significance.
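The abstract describes mean shift only as a feature extraction module and gives no kernel, bandwidth, or integration details. As a point of reference, the following is a minimal NumPy sketch of plain Gaussian-kernel mean shift on xyz coordinates, not the authors' module: each point is repeatedly moved to the kernel-weighted mean of the cloud until it settles on a density mode, and points that reach the same mode form one cluster. The function name `mean_shift`, the Gaussian kernel, the fixed `bandwidth`, and the merging radius of half a bandwidth are all assumptions made for illustration.

```python
import numpy as np


def mean_shift(points, bandwidth, max_iter=100, tol=1e-4):
    """Gaussian-kernel mean shift on an (N, 3) array of xyz coordinates.

    Returns (modes, labels): the distinct density modes found and, for each
    input point, the index of the mode its trajectory converged to.
    Hypothetical helper for illustration; not the paper's module.
    """
    pts = np.asarray(points, dtype=np.float64)
    shifted = pts.copy()
    for _ in range(max_iter):
        # Squared distances from every current estimate to every original point: (N, N).
        d2 = ((shifted[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-d2 / (2.0 * bandwidth**2))  # Gaussian kernel weights
        # Mean-shift step: move each estimate to the kernel-weighted mean of the points.
        new = (w @ pts) / w.sum(axis=1, keepdims=True)
        moved = np.linalg.norm(new - shifted, axis=1).max()
        shifted = new
        if moved < tol:
            break

    # Converged estimates closer than half a bandwidth are treated as the same mode.
    modes = []
    labels = np.empty(len(pts), dtype=np.int64)
    for i, p in enumerate(shifted):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < bandwidth / 2:
                labels[i] = k
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return np.array(modes), labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    large = rng.normal(0.0, 0.1, size=(50, 3))  # densely sampled "large" object
    small = rng.normal(5.0, 0.1, size=(5, 3))   # sparsely sampled "small" object
    modes, labels = mean_shift(np.vstack([large, small]), bandwidth=1.0)
    print(len(modes), np.bincount(labels))      # number of modes, points per mode
```

The pairwise distance matrix makes this sketch O(N²) in memory, so it suits small blocks of points rather than whole scenes. The per-mode point counts from `np.bincount(labels)` are one simple, hypothetical cue for small objects: a mode supported by few points marks a compact, sparsely sampled structure.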

Key words: point cloud processing, semantic segmentation, mean shift, deep learning, small semantic object features

ZHU Hongmiao, ZHONG Guojie, ZHANG Yanci. Semantic segmentation of small-scale point clouds based on integration of mean shift and deep learning[J]. Journal of Graphics, 2025, 46(5): 998-1009.

References

[1]	HUANG W, LIANG H, LIN L, et al. A fast point cloud ground segmentation approach based on coarse-to-fine Markov random field[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(7): 7841-7854.
[2]	ZHANG Y K, YU W J, ZHAO X Z, et al. Interactive tree segmentation and modeling from ALS point clouds[J]. Journal of Graphics, 2021, 42(4): 599-607 (in Chinese).
[3]	NIU C Y, LIU Y J, LI Z M, et al. 3D object recognition and model segmentation based on point cloud data[J]. Journal of Graphics, 2019, 40(2): 274-281 (in Chinese).
[4]	ZHU X G, ZHOU H, WANG T, et al. Cylindrical and asymmetrical 3D convolution networks for LiDAR segmentation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 9939-9948.
[5]	YAN X, GAO J T, LI J, et al. Sparse single sweep LiDAR point cloud segmentation via learning contextual shape priors from scene completion[C]// The 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 3101-3109.
[6]	XU C F, WU B C, WANG Z N, et al. SqueezeSegV3: spatially-adaptive convolution for efficient point-cloud segmentation[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 1-19.
[7]	QI C R, SU H, MO K C, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 77-85.
[8]	QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[EB/OL]. [2024-04-20]. http://arxiv.org/pdf/1706.02413.
[9]	THOMAS H, QI C R, DESCHAUD J E, et al. KPConv: flexible and deformable convolution for point clouds[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 6410-6419.
[10]	TANG H T, LIU Z J, ZHAO S Y, et al. Searching efficient 3D architectures with sparse point-voxel convolution[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 685-702.
[11]	TAO A, DUAN Y Q, WEI Y, et al. SegGroup: seg-level supervision for 3D instance and semantic segmentation[J]. IEEE Transactions on Image Processing, 2022, 31: 4952-4965.
[12]	GRAHAM B, ENGELCKE M, VAN DER MAATEN L. 3D semantic segmentation with submanifold sparse convolutional networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018.
[13]	FENG M T, ZHANG L, LIN X F, et al. Point attention network for semantic segmentation of 3D point clouds[J]. Pattern Recognition, 2020, 107: 107446.
[14]	KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. [2024-04-20]. https://dblp.uni-trier.de/db/conf/iclr/iclr2017.html#KipfW17.
[15]	ZOU Y, YU Z D, VIJAYA KUMAR B V K, et al. Domain adaptation for semantic segmentation via class-balanced self-training[EB/OL]. [2024-04-20]. http://arxiv.org/abs/1810.07911?context=cs.LG.
[16]	HOU H Y, SHEN M Y, HSU C C, et al. Ensemble fusion for small object detection[C]// The 18th International Conference on Machine Vision and Applications. New York: IEEE Press, 2023: 1-6.
[17]	XIE Y X, TIAN J J, ZHU X X. Linking points with labels in 3D: a review of point cloud semantic segmentation[J]. IEEE Geoscience and Remote Sensing Magazine, 2020, 8(4): 38-59.
[18]	HU Q Y, YANG B, XIE L H, et al. RandLA-Net: efficient semantic segmentation of large-scale point clouds[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 11108-11117.
[19]	WANG L, HUANG Y C, HOU Y L, et al. Graph attention convolution for point cloud semantic segmentation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 10296-10305.
[20]	WANG Y, SUN Y B, LIU Z W, et al. Dynamic graph CNN for learning on point clouds[J]. ACM Transactions on Graphics (TOG), 2019, 38(5): 146.
[21]	LI Y Y, BU R, SUN M C, et al. PointCNN: convolution on X-transformed points[C]// The 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018: 828-838.
[22]	FAN S Q, DONG Q L, ZHU F H, et al. SCF-Net: learning spatial contextual features for large-scale point cloud segmentation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 14499-14508.
[23]	LEI H, AKHTAR N, MIAN A. Spherical kernel for efficient graph convolution on 3D point clouds[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3664-3680.
[24]	ZHOU B L, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 2921-2929.
[25]	ZHAO H S, JIANG L, JIA J Y, et al. Point transformer[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 16239-16248.
[26]	WU X Y, LAO Y X, JIANG L, et al. Point transformer V2: grouped vector attention and partition-based pooling[EB/OL]. [2024-04-17]. https://doi.org/10.48550/arXiv.2210.05666.
[27]	WU X Y, JIANG L, WANG P S, et al. Point transformer V3: simpler, faster, stronger[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 4840-4851.
[28]	BI Y X, LIU P, SHI J L, et al. A multi-modal fusion 3D semantic segmentation method[C]// 2023 3rd International Conference on Electronic Information Engineering and Computer Science. New York: IEEE Press, 2023: 542-545.
[29]	CARDACE A, CONTI A, RAMIREZ P Z, et al. Boosting multi-modal unsupervised domain adaptation for LiDAR semantic segmentation by self-supervised depth completion[J]. IEEE Access, 2023, 11: 85155-85164.
[30]	DU S Q, WANG W X, GUO R Z, et al. AsymFormer: asymmetrical cross-modal representation learning for mobile platform real-time RGB-D semantic segmentation[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2024: 7608-7615.
[31]	XU R, WUNSCH D. Survey of clustering algorithms[J]. IEEE Transactions on Neural Networks, 2005, 16(3): 645-678.
[32]	STAUFFER C, GRIMSON W E L. Adaptive background mixture models for real-time tracking[C]// 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 1999: 246-252.
[33]	LLOYD S. Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28(2): 129-137.
[34]	ESTER M, KRIEGEL H P, SANDER J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise[C]// The 2nd International Conference on Knowledge Discovery and Data Mining. Palo Alto: AAAI Press, 1996: 226-231.
[35]	COMANICIU D, RAMESH V, MEER P. Real-time tracking of non-rigid objects using mean shift[C]// IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2000: 142-149.
[36]	MELZER T. Non-parametric segmentation of ALS point clouds using mean shift[J]. Journal of Applied Geodesy, 2007, 1(3): 159-170.
[37]	ZHANG Z X, ZHANG L Q, TONG X H, et al. A multilevel point-cluster-based discriminative feature for ALS point cloud classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(6): 3309-3321.
[38]	YUE W L, LU J G, ZHOU W H, et al. A new plane segmentation method of point cloud based on mean shift and RANSAC[C]// 2018 Chinese Control and Decision Conference. New York: IEEE Press, 2018: 1658-1663.
[39]	ZHANG Z X, ZHANG L Q, TONG X H, et al. Discriminative-dictionary-learning-based multilevel point-cluster features for ALS point-cloud classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(12): 7309-7322.
[40]	CHEN C, LI G B, XU R J, et al. ClusterNet: deep hierarchical cluster network with rigorously rotation-invariant representation for point cloud analysis[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4994-5002.
[41]	SALGADO-UGARTE I H, PÉREZ-HERNÁNDEZ M A. Exploring the use of variable bandwidth kernel density estimators[J]. The Stata Journal, 2003, 3(2): 133-147.
[42]	SILVERMAN B W. Density estimation for statistics and data analysis[M]. New York: Routledge, 1998: 95-119.
[43]	WU D Y, DING Y, ZHANG M F, et al. Multi-features refinement and aggregation for medical brain segmentation[J]. IEEE Access, 2020, 8: 57483-57496.
[44]	ARMENI I, SENER O, ZAMIR A R, et al. 3D semantic parsing of large-scale indoor spaces[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 1534-1543.
[45]	COMANICIU D, MEER P. Mean shift analysis and applications[C]// The 7th IEEE International Conference on Computer Vision. New York: IEEE Press, 1999: 1197-1203.

Hardware/Software	Experimental environment
CPU	12th Gen Intel(R) Core(TM) i5-12600KF 3.70 GHz
Memory	16 GB
GPU	NVIDIA GeForce RTX 3070
Operating system	Windows 11
Editor/IDE	Visual Studio Code
Languages and libraries	Python, C++, CUDA, PyTorch, TensorFlow, NumPy

Method	Dataset	Epochs	Batch size	Number of samples
PointNet++^[8]	S3DIS	32	10	4096
PointNet^[7]	SCU	20	4	1024
K-means+PointNet^[7]	SCU	20	4	1024
DBSCAN^[34]+PointNet^[7]	SCU	20	4	1024
EM^[31]+PointNet^[7]	SCU	20	4	1024
Ours+PointNet++^[8]	S3DIS	32	10	4096
Ours+PointNet^[7]	SCU	20	4	1024

Method	mIoU	Table (5.5)	Chair (2.9)	Sofa (1.7)	Ceiling (18.9)	Floor (15.4)	Wall (28.8)	Column (2.8)	Window (7.3)	Door (69.5)	Bookcase (16.3)	Board (2.5)	Clutter (9.0)
KPConv rigid^[9]	65.4	80.2	90.1	66.4	92.6	97.3	81.4	16.5	54.5	69.5	74.6	63.7	58.1
RandLA^[18]	63.0	77.2	85.2	71.5	92.4	96.7	80.6	18.3	61.3	43.3	71.0	69.2	52.3
PointCNN^[21]	57.3	74.4	80.6	31.7	92.3	62.1	79.4	17.6	22.8	62.1	66.7	62.1	56.7
SCF-Net^[22]	63.3	72.2	81.1	62.1	93.2	95.4	78.1	43.8	51.2	60.4	70.7	65.8	56.4
SPH3D^[23]	59.5	79.9	86.9	33.2	93.3	97.1	81.1	33.2	45.8	43.8	71.5	54.1	53.7
Ours	63.7	84.9	90.4	74.3	93.7	96.7	78.3	21.8	54.7	64.8	61.1	54.7	52.8
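The table lists mIoU next to 12 per-class IoU columns, while S3DIS defines 13 classes; beam has no column. As a sketch of the arithmetic: per-class IoU is TP / (TP + FP + FN) computed from a confusion matrix, and mIoU is the unweighted mean of those values over classes. For the "Ours" row, averaging the 12 listed values gives 69.0, whereas dividing their sum by 13, that is, assuming the omitted beam class scored 0, gives the reported 63.7. The helper `per_class_iou` and the list `ours_iou` are illustrative names, not part of the paper.

```python
import numpy as np


def per_class_iou(conf):
    """IoU_c = TP / (TP + FP + FN) from a (C, C) confusion matrix (rows = ground truth)."""
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp  # predicted as class c but labeled otherwise
    fn = conf.sum(axis=1) - tp  # labeled class c but predicted otherwise
    denom = tp + fp + fn
    # Classes absent from both labels and predictions get IoU 0 instead of NaN.
    return np.divide(tp, denom, out=np.zeros_like(tp), where=denom > 0)


# Per-class IoU (%) of the "Ours" row, in the table's column order:
# Table, Chair, Sofa, Ceiling, Floor, Wall, Column, Window, Door, Bookcase, Board, Clutter.
ours_iou = [84.9, 90.4, 74.3, 93.7, 96.7, 78.3, 21.8, 54.7, 64.8, 61.1, 54.7, 52.8]

print(round(sum(ours_iou) / 12, 1))  # mean over the 12 listed classes -> 69.0
print(round(sum(ours_iou) / 13, 1))  # over 13 classes, omitted one as 0 -> 63.7
```

The same check applied to other rows need not match exactly, since reported mIoU values are usually averaged from unrounded per-class scores.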