DEMF-Net: dual-branch feature enhancement and multi-scale fusion for semantic segmentation of large-scale point clouds

doi:10.11996/JG.j.2095-302X.2025020259

Abstract

Large-scale point cloud semantic segmentation is a critical task in 3D vision, with broad applications in autonomous driving, robotic navigation, smart city construction, and virtual reality. However, existing methods rely on down-sampling and exhibit large disparities between multi-scale features, so they lose much of their ability to capture fine-grained details and local structures. This weakened preservation of local features reduces semantic segmentation accuracy. To address these issues, a semantic segmentation framework, DEMF-Net, was proposed, which integrates dual-branch feature enhancement and multi-scale fusion strategies. The network incorporates a dual-branch enhanced aggregation module that jointly encodes point cloud attribute information and semantic features from the local neighborhood. The resulting bilateral features are embedded into the corresponding original features, improving the model's ability to capture local details with higher fidelity. Furthermore, a multi-scale feature fusion module was introduced to reduce the semantic gap between features at different scales. This module fuses adjacent multi-scale features into a global feature representation that synthesizes information across all encoding layers. This design strengthens the model's global context awareness and integrates upper and lower encoding layers, thereby improving feature recognition. Comprehensive experiments were conducted on two widely used point cloud datasets, SensatUrban and S3DIS, to validate the proposed approach. DEMF-Net achieved a mean Intersection over Union (mIoU) of 61.6% and 66.7%, respectively, outperforming the compared state-of-the-art methods in mIoU.
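The abstract describes the two modules only at a high level. The NumPy sketch below illustrates one plausible reading of them and is not the authors' implementation. It assumes brute-force k-nearest-neighbor neighborhoods, a geometric branch that encodes the relative offsets and distances of neighbors, a semantic branch that encodes neighbor feature differences, max-pooling over each neighborhood, and residual addition of both branches to the original features. Adjacent-scale fusion is approximated by nearest-neighbor upsampling of a coarser level followed by summation. The function names (`knn_indices`, `shared_mlp`, `dual_branch_aggregate`, `fuse_adjacent_scales`) and the single-layer, randomly initialized weights are hypothetical.

```python
import numpy as np

def knn_indices(xyz, k):
    """Brute-force k nearest neighbors of every point (each point includes itself)."""
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(axis=-1)  # (N, N)
    return np.argsort(d2, axis=1)[:, :k]                            # (N, k)

def shared_mlp(x, w):
    """One shared linear layer + ReLU on the last axis (hypothetical stand-in)."""
    return np.maximum(x @ w, 0.0)

def dual_branch_aggregate(xyz, feats, k, w_geo, w_sem):
    """Hypothetical dual-branch local aggregation.

    Geometric branch: relative offsets and distances of the k neighbors.
    Semantic branch: feature differences between each point and its neighbors.
    Each branch is max-pooled over the neighborhood, and both are added back
    to the original per-point features.
    """
    idx = knn_indices(xyz, k)                                   # (N, k)
    rel = xyz[idx] - xyz[:, None, :]                            # (N, k, 3)
    dist = np.linalg.norm(rel, axis=-1, keepdims=True)          # (N, k, 1)
    geo = shared_mlp(np.concatenate([rel, dist], axis=-1), w_geo).max(axis=1)  # (N, C)
    sem = shared_mlp(feats[idx] - feats[:, None, :], w_sem).max(axis=1)        # (N, C)
    return feats + geo + sem

def fuse_adjacent_scales(xyz_fine, feats_fine, xyz_coarse, feats_coarse):
    """Hypothetical adjacent-scale fusion: copy each fine point's nearest coarse
    feature (nearest-neighbor upsampling) and add it to the fine feature."""
    d2 = ((xyz_fine[:, None, :] - xyz_coarse[None, :, :]) ** 2).sum(axis=-1)
    return feats_fine + feats_coarse[d2.argmin(axis=1)]

# Toy example: 64 random points with 8-dim features, 16 neighbors.
rng = np.random.default_rng(0)
N, C, k = 64, 8, 16
xyz = rng.random((N, 3))
feats = rng.standard_normal((N, C))
w_geo = 0.1 * rng.standard_normal((4, C))
w_sem = 0.1 * rng.standard_normal((C, C))
enhanced = dual_branch_aggregate(xyz, feats, k, w_geo, w_sem)
sub = rng.choice(N, N // 4, replace=False)  # random down-sampling to a coarser level
fused = fuse_adjacent_scales(xyz, enhanced, xyz[sub], enhanced[sub])
print(enhanced.shape, fused.shape)  # (64, 8) (64, 8)
```

The toy coarse level simply reuses the enhanced features of a random subset, and only two adjacent levels are fused; a real encoder would compute each level with its own layers and chain the fusion across all of them. Brute-force k-NN also needs O(N²) memory, so it only suits toy inputs; large-scale pipelines typically use a KD-tree or GPU neighbor search instead.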

Key words: three-dimensional vision, semantic segmentation, large-scale point cloud, urban scene, feature encoding

CLC Number: TP391

LI Zhihuan, NING Xiaojuan, LV Zhiyong, SHI Zhenghao, JIN Haiyan, WANG Yinghui, ZHOU Wenming. DEMF-Net: dual-branch feature enhancement and multi-scale fusion for semantic segmentation of large-scale point clouds[J]. Journal of Graphics, 2025, 46(2): 259-269.

Figures/Tables 11

References

[1]	LI Z Y, CHEN Z H, LI A, et al. Unsupervised domain adaptation for monocular 3D object detection via self-training[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 245-262.
[2]	YAN X, GAO J T, ZHENG C D, et al. 2DPASS: 2D priors assisted semantic segmentation on LiDAR point clouds[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 677-695.
[3]	HAN X, DONG Z, YANG B S. A point-based deep learning network for semantic segmentation of MLS point clouds[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 175: 199-214.
[4]	WAN J, XIE Z, XU Y Y, et al. DGANet: a dilated graph attention-based network for local feature extraction on 3D point clouds[J]. Remote Sensing, 2021, 13(17): 3484.
[5]	SHAO J, ZHANG W M, SHEN A J, et al. Seed point set-based building roof extraction from airborne LiDAR point clouds using a top-down strategy[J]. Automation in Construction, 2021, 126: 103660.
[6]	YANG G Q, XUE F Y, ZHANG Q, et al. UrbanBIS: a large-scale benchmark for fine-grained urban building instance segmentation[C]// ACM SIGGRAPH 2023 Conference Proceedings. New York: ACM, 2023: 16.
[7]	MARSOCCI V, COLETTA V, RAVANELLI R, et al. New trends in urban change detection: detecting 3D changes from bitemporal optical images[EB/OL]. [2024-06-22]. https://meetingorganizer.copernicus.org/EGU23/EGU23-13357.html.
[8]	崔振东, 李宗民, 杨树林, 等. 基于语义分割引导的三维目标检测[J]. 图学学报, 2022, 43(6): 1134-1142.
	CUI Z D, LI Z M, YANG S L, et al. 3D object detection based on semantic segmentation guidance[J]. Journal of Graphics, 2022, 43(6): 1134-1142 (in Chinese).
[9]	王江安, 庞大为, 黄乐, 等. 基于多尺度特征递归卷积的稠密点云重建网络[J]. 图学学报, 2022, 43(5): 875-883.
	WANG J A, PANG D W, HUANG L, et al. Dense point cloud reconstruction network using multi-scale feature recursive convolution[J]. Journal of Graphics, 2022, 43(5): 875-883 (in Chinese).
[10]	QI C R, SU H, MO K C, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 77-85.
[11]	HU Q Y, YANG B, XIE L H, et al. RandLA-Net: efficient semantic segmentation of large-scale point clouds[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 11105-11114.
[12]	BOULCH A, GUERRY J, LE SAUX B, et al. SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks[J]. Computers & Graphics, 2018, 71: 189-198.
[13]	TATARCHENKO M, PARK J, KOLTUN V, et al. Tangent convolutions for dense prediction in 3D[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 3887-3896.
[14]	JARITZ M, GU J Y, SU H. Multi-view PointNet for 3D scene understanding[C]// 2019 IEEE/CVF International Conference on Computer Vision Workshop. New York: IEEE Press, 2019: 3995-4003.
[15]	ZHOU W G, JIANG X, LIU Y H. MVPointNet: multi-view network for 3D object based on point cloud[J]. IEEE Sensors Journal, 2019, 19(24): 12145-12152.
[16]	HUANG J, YOU S Y. Point cloud labeling using 3D convolutional neural network[C]// The 23rd International Conference on Pattern Recognition. New York: IEEE Press, 2016: 2670-2675.
[17]	WANG P S, LIU Y, GUO Y X, et al. O-CNN: octree-based convolutional neural networks for 3D shape analysis[J]. ACM Transactions on Graphics, 2017, 36(4): 72.
[18]	TCHAPMI L, CHOY C, ARMENI I, et al. SEGCloud: semantic segmentation of 3D point clouds[C]// 2017 International Conference on 3D Vision. New York: IEEE Press, 2017: 537-547.
[19]	LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 3431-3440.
[20]	MENG H Y, GAO L, LAI Y K, et al. VV-Net: voxel VAE net with group convolutions for point cloud segmentation[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 8499-8507.
[21]	QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]// The 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 5105-5114.
[22]	WANG Y, SUN Y B, LIU Z W, et al. Dynamic graph CNN for learning on point clouds[J]. ACM Transactions on Graphics, 2019, 38(5): 146.
[23]	THOMAS H, QI C R, DESCHAUD J E, et al. KPConv: flexible and deformable convolution for point clouds[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 6410-6419.
[24]	LANDRIEU L, SIMONOVSKY M. Large-scale point cloud semantic segmentation with superpoint graphs[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 4558-4567.
[25]	QIU S, ANWAR S, BARNES N. Semantic segmentation for real point cloud scenes via bilateral augmentation and adaptive fusion[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 1757-1767.
[26]	ZENG Z Y, XU Y Y, XIE Z, et al. Large-scale point cloud semantic segmentation via local perception and global descriptor vector[J]. Expert Systems with Applications, 2024, 246: 123269.
[27]	ZENG Z Y, XU Y Y, XIE Z, et al. LEARD-Net: semantic segmentation for large-scale point cloud scene[J]. International Journal of Applied Earth Observation and Geoinformation, 2022, 112: 102953.
[28]	HU Q Y, YANG B, KHALID S, et al. Towards semantic segmentation of urban-scale 3D point clouds: a dataset, benchmarks and challenges[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 4975-4985.
[29]	GRAHAM B, ENGELCKE M, VAN DER MAATEN L. 3D semantic segmentation with submanifold sparse convolutional networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 9224-9232.
[30]	SHUAI H, XU X, LIU Q S. Backward attentive fusing network with local aggregation classifier for 3D point cloud semantic segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 4973-4984.
[31]	XU Y Y, TANG W, ZENG Z Y, et al. NeiEA-NET: semantic segmentation of large-scale point cloud scene via neighbor enhancement and aggregation[J]. International Journal of Applied Earth Observation and Geoinformation, 2023, 119: 103285.
[32]	LI H C, GUAN H Y, MA L F, et al. MVPNet: a multi-scale voxel-point adaptive fusion network for point cloud semantic segmentation in urban scenes[J]. International Journal of Applied Earth Observation and Geoinformation, 2023, 122: 103391.
[33]	LI Y Y, BU R, SUN M C, et al. PointCNN: convolution on Χ-transformed points[C]// The 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018: 828-838.
[34]	FAN S Q, DONG Q L, ZHU F H, et al. SCF-Net: learning spatial contextual features for large-scale point cloud segmentation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 14499-14508.

Comparison with other methods on SensatUrban: OA, mIoU, and per-class IoU (%)
Method	OA	mIoU	Ground	Vegetation	Building	Wall	Bridge	Parking	Rail	Traffic road	Street furniture	Car	Footpath	Bike	Water
PointNet^[10]	80.8	23.7	67.9	89.5	80.1	0.0	0.0	3.9	0.0	31.6	0.0	35.0	0.0	0.0	0.0
PointNet++^[21]	84.3	32.9	72.5	94.2	84.8	2.7	2.1	25.8	0.0	31.5	11.4	38.8	7.1	0.0	56.9
TangentConv^[13]	76.9	33.3	71.5	91.4	75.9	35.2	0.0	45.3	0.0	26.7	19.2	67.6	0.0	0.0	0.0
SPGraph^[24]	85.3	37.3	69.9	94.6	88.9	32.8	12.6	15.8	15.5	30.6	22.9	56.4	0.5	0.0	44.2
SparseConv^[29]	88.7	42.7	74.1	97.9	94.2	63.3	7.5	24.2	0.0	30.1	34.0	74.4	0.0	0.0	54.8
KPConv^[23]	93.2	57.6	87.1	98.9	95.3	74.4	28.7	41.4	0.0	55.9	54.4	85.7	40.4	0.0	86.3
RandLA-Net^[11]	89.8	52.7	80.1	98.1	91.6	48.9	40.6	51.6	0.0	56.7	33.2	80.1	32.6	0.0	71.3
BAF-LAC^[30]	91.5	54.1	84.4	98.4	94.1	57.2	27.6	42.5	15.0	51.6	39.5	78.1	40.1	0.0	75.2
BAAF-Net^[25]	91.8	56.1	83.3	98.2	94.0	54.2	51.0	57.0	0.0	60.4	40.0	81.3	41.6	0.0	68.0
NeiEA-Net^[31]	91.7	57.0	83.3	98.1	93.4	50.1	61.3	57.8	0.0	60.0	41.6	82.4	42.1	0.0	71.0
MVP-Net^[32]	93.3	59.4	85.1	98.5	95.9	66.6	57.5	52.7	0.0	61.9	49.7	81.8	43.9	0.0	78.2
LACV-Net^[26]	93.2	61.3	85.5	98.4	95.6	61.9	58.6	64.0	28.5	62.8	45.4	81.9	42.4	4.8	67.7
DEMF-Net	92.8	61.6	85.4	98.4	95.1	59.5	57.4	60.5	30.8	59.1	45.2	81.2	41.2	10.3	76.1
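mIoU is the unweighted mean of per-class IoU, where IoU_c = TP_c / (TP_c + FP_c + FN_c). The sketch below is a generic implementation of this standard definition, not the authors' evaluation script. It computes IoU from a confusion matrix and checks that averaging the 13 per-class values of DEMF-Net and LACV-Net^[26] from the table above reproduces their reported mIoU. The helper names `per_class_iou` and `mean_iou` are hypothetical.

```python
def per_class_iou(conf):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c) from a square confusion matrix,
    where conf[i][j] counts points of true class i predicted as class j."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # true class c, predicted as another class
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, true class is another one
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
    return ious

def mean_iou(ious):
    """Unweighted mean: every class counts equally, however many points it has."""
    return sum(ious) / len(ious)

# Per-class IoU (%) copied from the SensatUrban table (13 classes, same column order).
demf_net = [85.4, 98.4, 95.1, 59.5, 57.4, 60.5, 30.8, 59.1, 45.2, 81.2, 41.2, 10.3, 76.1]
lacv_net = [85.5, 98.4, 95.6, 61.9, 58.6, 64.0, 28.5, 62.8, 45.4, 81.9, 42.4, 4.8, 67.7]
print(round(mean_iou(demf_net), 1), round(mean_iou(lacv_net), 1))  # 61.6 61.3
```

Because every class carries equal weight, a rare class such as bike (10.3% for DEMF-Net versus 4.8% for LACV-Net) moves mIoU as much as a dominant class such as ground, which helps explain how DEMF-Net can lead in mIoU while trailing LACV-Net in OA.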

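Comparison with other methods on S3DIS: OA, mAcc, mIoU, and per-class IoU (%)
Method	OA	mAcc	mIoU	Ceiling	Floor	Wall	Beam	Column	Window	Door	Table	Chair	Sofa	Bookcase	Board	Clutter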
PointNet^[10]	-	49.0	41.1	88.8	97.3	69.8	0.1	3.9	46.3	10.8	59.0	52.6	5.9	40.3	26.4	33.2
PointNet++^[21]	-	-	47.8	90.3	95.6	69.3	0.1	13.8	26.7	44.1	64.3	70.0	27.8	47.8	30.8	38.1
SegCloud^[18]	-	57.4	48.9	90.1	96.1	69.9	0.0	18.4	38.4	23.1	70.4	75.9	40.9	58.4	13.0	41.6
TangentConv^[13]	-	62.2	52.6	90.5	97.7	74.0	0.0	20.7	39.0	31.3	77.5	69.4	57.3	38.5	48.8	39.8
PointCNN^[33]	85.9	63.9	57.3	92.3	98.2	79.4	0.0	17.6	28.8	62.1	70.4	80.6	39.7	66.7	62.1	56.7
SPGraph^[24]	86.4	66.5	58.0	89.4	96.9	78.1	0.0	42.8	48.9	61.6	84.7	75.4	69.8	52.6	2.1	52.5
RandLA-Net^[11]	87.6	70.6	62.7	92.6	97.9	81.2	0.0	21.8	60.9	43.4	77.6	86.8	64.6	70.0	66.0	52.2
SCF-Net^[34]	87.2	71.8	63.7	90.8	97.0	80.9	0.0	19.9	60.7	44.6	79.4	87.9	76.1	71.5	68.8	50.4
BAAF-Net^[25]	88.2	73.0	64.4	93.7	97.7	82.1	0.0	33.1	61.7	51.1	79.2	86.6	62.4	69.8	64.9	54.6
BAF-LAC^[30]	-	-	65.1	92.7	98.1	81.5	0.0	34.2	61.0	44.8	78.5	87.5	76.3	70.2	68.4	52.8
NeiEA-Net^[31]	88.5	74.4	66.1	92.9	97.4	83.3	0.0	34.9	61.8	55.3	78.8	86.7	77.1	69.5	67.9	54.2
DEMF-Net	88.3	76.0	66.7	92.3	98.0	82.1	0.1	35.0	63.4	48.7	80.0	89.7	81.5	71.5	71.6	53.3
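A per-class comparison shows where the mIoU gain on S3DIS comes from. The snippet below copies the per-class IoU of DEMF-Net and NeiEA-Net^[31] from the table above, with class names following the column order of that table, and lists the classes on which DEMF-Net scores higher or lower. The `compare` helper is hypothetical.

```python
classes = ["ceiling", "floor", "wall", "beam", "column", "window", "door",
           "table", "chair", "sofa", "bookcase", "board", "clutter"]
# Per-class IoU (%) copied from the S3DIS table.
demf_net = [92.3, 98.0, 82.1, 0.1, 35.0, 63.4, 48.7, 80.0, 89.7, 81.5, 71.5, 71.6, 53.3]
neiea_net = [92.9, 97.4, 83.3, 0.0, 34.9, 61.8, 55.3, 78.8, 86.7, 77.1, 69.5, 67.9, 54.2]

def compare(a, b, names):
    """Split classes by whether method a has higher or lower IoU than method b."""
    better = [n for n, x, y in zip(names, a, b) if x > y]
    worse = [n for n, x, y in zip(names, a, b) if x < y]
    return better, worse

better, worse = compare(demf_net, neiea_net, classes)
print(len(better), worse)                        # 9 ['ceiling', 'wall', 'door', 'clutter']
print(round(sum(demf_net) / len(demf_net), 1))   # 66.7
```

DEMF-Net improves on 9 of the 13 classes, with the largest margins on sofa (+4.4) and board (+3.7), while door drops by 6.6 points.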

Ablation study of the DEA, DRM, and MFF modules
Model	DEA	DRM	MFF	mIoU (%)
Baseline	-	-	-	62.7
A1	√	-	-	65.2
A2	√	√	-	65.9
A3	-	-	√	64.3
DEMF-Net	√	√	√	66.7
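The ablation table can be read as absolute mIoU gains over the baseline. In the snippet below, DEA and MFF are assumed to denote the dual-branch enhanced aggregation and multi-scale feature fusion modules named in the abstract; DRM is not expanded in this excerpt, so it is left as an abbreviation. The `gains_over` helper is hypothetical.

```python
# mIoU (%) copied from the ablation table.
ablation = {
    "Baseline": 62.7,
    "A1": 65.2,        # DEA
    "A2": 65.9,        # DEA + DRM
    "A3": 64.3,        # MFF
    "DEMF-Net": 66.7,  # DEA + DRM + MFF
}

def gains_over(results, reference):
    """Absolute mIoU change of every entry relative to `reference`, rounded to 0.1."""
    base = results[reference]
    return {name: round(value - base, 1) for name, value in results.items()}

gains = gains_over(ablation, "Baseline")
drm_step = round(ablation["A2"] - ablation["A1"], 1)        # adding DRM on top of DEA
mff_step = round(ablation["DEMF-Net"] - ablation["A2"], 1)  # adding MFF on top of DEA + DRM
print(gains, drm_step, mff_step)
```

MFF alone adds 1.6 points over the baseline but only 0.8 on top of DEA and DRM, so the module contributions are not additive, which suggests that they partly recover the same information.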