Journal of Graphics ›› 2025, Vol. 46 ›› Issue (3): 510-519. DOI: 10.11996/JG.j.2095-302X.2025030510
HUANG Zhiyong, SHE Yali, HUA Xifeng, XIANG Mengli, YANG Chenlong, DING Tuojun
Received:
2024-08-17
Accepted:
2024-12-27
Published:
2025-06-30
Online:
2025-06-13
First author:
HUANG Zhiyong (1979-), associate professor, Ph.D. His main research interests cover computer vision and computer graphics. E-mail: hzy@hzy.org.cn
Supported by:
Abstract:
To address the challenges of sparse-view 3D reconstruction, in particular the reconstruction holes and loss of accuracy caused by an insufficient number of Gaussian ellipsoids, a depth-constrained 3D Gaussian splatting (3DGS) method for sparse-view reconstruction, DCSplat, was proposed. It uses depth constraints to adaptively complete the point cloud required for 3DGS initialization and introduces a stochastic structural similarity loss, enabling fast, high-quality reconstruction from sparse-view images. Its core is a proposed feed-forward neural network that refines the sparse point cloud produced by structure-from-motion (SfM). First, depth is predicted from the input images with a pretrained monocular depth estimation network. Second, a projection matrix built from the camera parameters projects the sparse point cloud onto the images, associating each point's z-value with the predicted depth at its pixel; a deep network mapping pixel depth values to point-cloud z-values is then constructed and trained to optimize and complete the point cloud required by 3DGS. Third, to overcome the limitations of the point-wise optimization loss in 3DGS, a stochastic structural similarity loss is introduced; it treats the multiple Gaussians corresponding to a group of pixels as a whole, accounts for the point-cloud structure globally, and promotes more coherent and accurate reconstruction. Experiments on the LLFF, DTU, and MipNeRF360 benchmarks show that DCSplat matches or surpasses existing methods on the key metrics of peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and learned perceptual image patch similarity (LPIPS), effectively improving reconstruction quality. In addition, by completing the point cloud under depth constraints and exploiting depth information from global to local scales, the method delivers consistent gains across multiple metrics and shows promising application potential.
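The point-cloud completion step described in the abstract — projecting the SfM points with the camera parameters, pairing each point's camera-space z-value with the monocular depth predicted at its pixel, and fitting a small network that maps depth to z — can be illustrated with the minimal sketch below. It is not the paper's implementation: the pinhole projection, the `DepthToZ` MLP, and the function names `project_points` and `fit_depth_to_z` are assumptions introduced here for clarity.

```python
import numpy as np
import torch
import torch.nn as nn

def project_points(points_w, K, R, t):
    """Project Nx3 world-space points into an image with a pinhole model.

    K: 3x3 intrinsics, R: 3x3 rotation, t: 3-vector translation (world-to-camera).
    Returns pixel coordinates (u, v) and the camera-space depth z of each point.
    """
    pts_cam = points_w @ R.T + t            # world -> camera coordinates
    z = pts_cam[:, 2]                       # camera-space z-value of each point
    uv = pts_cam @ K.T                      # apply intrinsics
    uv = uv[:, :2] / uv[:, 2:3]             # perspective divide -> pixel coords
    return uv, z

class DepthToZ(nn.Module):
    """Tiny MLP mapping a monocular depth prediction to a point-cloud z-value.

    A stand-in for the paper's point-cloud generation network; the real
    architecture is not specified here.
    """
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, d):
        return self.net(d)

def fit_depth_to_z(points_w, K, R, t, mono_depth, iters=1000):
    """Pair each projected SfM point with the monocular depth at its pixel,
    then fit the MLP so it can later densify the cloud from depth alone."""
    uv, z = project_points(points_w, K, R, t)
    h, w = mono_depth.shape
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    d = torch.tensor(mono_depth[v, u], dtype=torch.float32).unsqueeze(1)
    z = torch.tensor(z, dtype=torch.float32).unsqueeze(1)

    model = DepthToZ()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(iters):
        opt.zero_grad()
        loss = torch.mean((model(d) - z) ** 2)   # regress z-values from depth
        loss.backward()
        opt.step()
    return model
```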
CLC number:
HUANG Zhiyong, SHE Yali, HUA Xifeng, XIANG Mengli, YANG Chenlong, DING Tuojun. DCSplat: Gaussian splatting with depth information constraints under sparse viewpoints[J]. Journal of Graphics, 2025, 46(3): 510-519.
Table 1 Experimental results of different methods on the LLFF dataset

| Model | 3-views SSIM↑ | 3-views PSNR↑ | 3-views LPIPS↓ | 6-views SSIM↑ | 6-views PSNR↑ | 6-views LPIPS↓ | 9-views SSIM↑ | 9-views PSNR↑ | 9-views LPIPS↓ |
|---|---|---|---|---|---|---|---|---|---|
| 3DGS | 0.447 | 14.975 | 0.428 | 0.620 | 18.888 | 0.300 | 0.680 | 24.072 | 0.258 |
| SparseNeRF | 0.613 | 19.311 | 0.341 | 0.743 | 22.980 | 0.262 | 0.784 | 24.183 | 0.236 |
| RegNeRF | 0.677 | 19.038 | 0.358 | 0.809 | 23.004 | 0.240 | 0.849 | 24.475 | 0.216 |
| DNGaussian | 0.651 | 18.638 | 0.316 | 0.605 | 20.019 | 0.350 | 0.682 | 22.131 | 0.348 |
| DCSplat | 0.678 | 19.054 | 0.302 | 0.733 | 22.077 | 0.220 | 0.758 | 23.191 | 0.195 |
Fig. 2 Comparison of experimental results on the LLFF dataset ((a) 3DGS; (b) SparseNeRF; (c) RegNeRF; (d) DNGaussian; (e) Ours; (f) GT)
Table 2 Experimental results of different methods on the DTU dataset

| Model | 3-views SSIM↑ | 3-views PSNR↑ | 3-views LPIPS↓ | 6-views SSIM↑ | 6-views PSNR↑ | 6-views LPIPS↓ | 9-views SSIM↑ | 9-views PSNR↑ | 9-views LPIPS↓ |
|---|---|---|---|---|---|---|---|---|---|
| 3DGS | 0.467 | 12.800 | 0.482 | 0.543 | 17.535 | 0.346 | 0.550 | 17.975 | 0.324 |
| SparseNeRF | 0.448 | 14.249 | 0.479 | 0.525 | 16.561 | 0.391 | 0.495 | 17.895 | 0.435 |
| RegNeRF | 0.455 | 12.391 | 0.533 | 0.568 | 16.222 | 0.469 | 0.577 | 17.373 | 0.471 |
| DCSplat | 0.474 | 14.549 | 0.440 | 0.544 | 17.223 | 0.361 | 0.556 | 17.775 | 0.326 |
Table 3 Experimental results of different methods on the MipNeRF360 dataset

| Model | 12-views SSIM↑ | 12-views PSNR↑ | 12-views LPIPS↓ |
|---|---|---|---|
| 3DGS | 0.441 | 15.384 | 0.506 |
| Mip-NeRF 360 | 0.446 | 17.104 | 0.575 |
| SparseGS | 0.489 | 16.689 | 0.484 |
| DCSplat | 0.501 | 17.800 | 0.410 |
Fig. 5 Comparison images using the point cloud generation network ((a) Sparse point cloud (with 5 568 points) before using the point cloud generation network; (b) Dense point cloud (with 21 136 points) after using the point cloud generation network)
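Figure 5 shows the sparse SfM cloud being densified from 5 568 to 21 136 points by the point cloud generation network. As a rough illustration only — not the paper's actual network — new points can be back-projected from randomly sampled pixels using the monocular depth map and a fitted depth-to-z mapping (such as the hypothetical `DepthToZ` MLP sketched after the abstract):

```python
import numpy as np
import torch

def densify_point_cloud(mono_depth, K, R, t, depth_to_z, n_new=16000, seed=0):
    """Back-project randomly sampled pixels into 3D to densify a sparse cloud.

    mono_depth: HxW monocular depth map of one training view (numpy array).
    K, R, t:    pinhole intrinsics and world-to-camera extrinsics.
    depth_to_z: fitted mapping from predicted depth to camera-space z.
    Returns an (n_new, 3) array of new world-space points.
    """
    h, w = mono_depth.shape
    rng = np.random.default_rng(seed)
    u = rng.integers(0, w, n_new)
    v = rng.integers(0, h, n_new)

    d = torch.tensor(mono_depth[v, u], dtype=torch.float32).unsqueeze(1)
    with torch.no_grad():
        z = depth_to_z(d).squeeze(1).numpy()        # predicted depth -> metric z

    # Unproject each pixel with the pinhole model: x_cam = z * K^{-1} [u, v, 1]^T
    pix = np.stack([u, v, np.ones_like(u)], axis=0).astype(np.float64)
    rays = np.linalg.inv(K) @ pix                    # 3 x n_new camera-space rays
    pts_cam = (rays * z).T                           # scale each ray by its z
    pts_world = (pts_cam - t) @ R                    # camera -> world (R orthonormal)
    return pts_world
```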
Table 4 The impact of each module on performance

| Point cloud generation network | Depth-constrained sampling | Stochastic structural loss | SSIM↑ | PSNR↑ | LPIPS↓ |
|---|---|---|---|---|---|
| √ | | | 0.594 | 17.693 | 0.376 |
| √ | √ | | 0.657 | 18.798 | 0.315 |
| √ | √ | √ | 0.678 | 19.054 | 0.302 |
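The "stochastic structural loss" ablated in Table 4 is, per the abstract, a stochastic structural similarity loss that treats groups of pixels (and the Gaussians behind them) as a whole rather than optimizing each pixel independently, in the spirit of S3IM [35]. A minimal sketch of such a loss follows; the window size, patch size, and number of sampling rounds are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2, win=4):
    """Plain SSIM between two 1xCxHxW tensors using an average-pooling window."""
    mu_x = F.avg_pool2d(x, win, stride=1)
    mu_y = F.avg_pool2d(y, win, stride=1)
    sigma_x = F.avg_pool2d(x * x, win, stride=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, win, stride=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, win, stride=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).mean()

def stochastic_ssim_loss(rendered, target, patch=64, rounds=10):
    """S3IM-style loss: sample random pixels, tile them into a pseudo-patch,
    and penalize 1 - SSIM between the rendered and target patches.

    rendered, target: (N, 3) tensors of colors for the same N training pixels.
    """
    n = rendered.shape[0]
    idx = torch.randint(0, n, (rounds * patch * patch,), device=rendered.device)
    r = rendered[idx].T.reshape(1, 3, patch, rounds * patch)
    t = target[idx].T.reshape(1, 3, patch, rounds * patch)
    return 1.0 - ssim(r, t)
```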
[1] | SCHÖNBERGER J L, FRAHM J M. Structure-from-motion revisited[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4104-4113. |
[2] | ZHOU J Y, ZHANG Q T, FENG J Q. Hybrid-structure based multi-view 3D scene reconstruction[J]. Journal of Graphics, 2024, 45(1): 199-208 (in Chinese). |
[3] | GAO K, GAO Y N, HE H J, et al. NeRF: neural radiance field in 3D vision, a comprehensive review[EB/OL]. [2023-11-30]. https://arxiv.org/abs/2210.00379. |
[4] | TEWARI A, THIES J, MILDENHALL B, et al. Advances in neural rendering[J]. Computer Graphics Forum, 2022, 41(2): 703-735. |
[5] | MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2021, 65(1): 99-106. |
[6] | SOMRAJ N, SOUNDARARAJAN R. ViP-NeRF: visibility prior for sparse input neural radiance fields[C]// ACM SIGGRAPH 2023 Conference Proceedings. New York: ACM, 2023: 71. |
[7] | WANG G C, CHEN Z X, LOY C C, et al. SparseNeRF: distilling depth ranking for few-shot novel view synthesis[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 9031-9042. |
[8] | CHEN A P, XU Z X, ZHAO F Q, et al. MVSNeRF: fast generalizable radiance field reconstruction from multi-view stereo[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 14104-14113. |
[9] | CHIBANE J, BANSAL A, LAZOVA V, et al. Stereo radiance fields (SRF): learning view synthesis for sparse views of novel scenes[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7907-7916. |
[10] | LIU Y, PENG S D, LIU L J, et al. Neural rays for occlusion- aware image-based rendering[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 7814-7823. |
[11] | TREVITHICK A, YANG B. GRF: learning a general radiance field for 3D representation and rendering[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 15162-15172. |
[12] | YU A, YE V, TANCIK M, et al. pixelNeRF: neural radiance fields from one or few images[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 4576-4585. |
[13] | DENG K L, LIU A, ZHU J Y, et al. Depth-supervised NeRF: fewer views and faster training for free[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12872-12881. |
[14] | ROESSLE B, BARRON J T, MILDENHALL B, et al. Dense depth priors for neural radiance fields from sparse input views[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12882-12891. |
[15] | NIEMEYER M, BARRON J T, MILDENHALL B, et al. RegNeRF: regularizing neural radiance fields for view synthesis from sparse inputs[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5470-5480. |
[16] | KIM M, SEO S, HAN B. InfoNeRF: ray entropy minimization for few-shot neural volume rendering[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12902-12911. |
[17] | KERBL B, KOPANAS G, LEIMKUEHLER T, et al. 3D Gaussian splatting for real-time radiance field rendering[J]. ACM Transactions on Graphics (TOG), 2023, 42(4): 139. |
[18] | XIONG H L, MUTTUKURU S, UPADHYAY R, et al. SparseGS: real-time 360° sparse view synthesis using Gaussian splatting[EB/OL]. [2023-11-30]. https://arxiv.org/abs/2312.00206. |
[19] | FAN Z W, CONG W Y, WEN K R, et al. InstantSplat: unbounded sparse-view pose-free Gaussian splatting in 40 seconds[EB/OL]. [2024-06-30]. https://arxiv.org/html/2403.20309v1. |
[20] | PALIWAL A, YE W, XIONG J H, et al. CoherentGS: sparse novel view synthesis with coherent 3D Gaussians[C]// The 18th European Conference on Computer Vision. Cham: Springer, 2024: 19-37. |
[21] | CHEN Y D, XU H F, ZHENG C X, et al. MVSplat: efficient 3D Gaussian splatting from sparse multi-view images[C]// The 18th European Conference on Computer Vision. Cham: Springer, 2024: 370-386. |
[22] | LI J H, ZHANG J W, BAI X, et al. DNGaussian: optimizing sparse-view 3D Gaussian radiance fields with global-local depth normalization[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 20775-20785. |
[23] | LIU G Y, HU R Z, LIU L G. 3D Gaussian splatting semantic segmentation and editing based on 2D feature distillation[J]. Journal of Graphics, 2025, 46(2): 312-321 (in Chinese). |
[24] | CHUNG J, OH J, LEE K M. Depth-regularized optimization for 3D Gaussian splatting in few-shot images[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 811-820. |
[25] | ZHU Z H, FAN Z W, JIANG Y F, et al. FSGS: real-time few-shot view synthesis using Gaussian splatting[C]// The 18th European Conference on Computer Vision. Cham: Springer, 2024: 145-163. |
[26] | XIONG H L, MUTTUKURU S, UPADHYAY R, et al. SparseGS: real-time 360° sparse view synthesis using Gaussian splatting[EB/OL]. [2024-05-13]. https://arxiv.org/abs/2312.00206. |
[27] | HUANG S S, ZOU Z X, ZHANG Y C, et al. SC-NeuS: consistent neural surface reconstruction from sparse and noisy views[EB/OL]. [2024-06-17]. https://ojs.aaai.org/index.php/AAAI/article/view/28010. |
[28] | ZOU Z X, CHENG W H, CAO Y P, et al. Sparse3D: distilling multiview-consistent diffusion for object reconstruction from sparse views[EB/OL]. [2024-06-17]. https://ojs.aaai.org/index.php/AAAI/article/view/28626. |
[29] | TRUONG P, RAKOTOSAONA M J, MANHARDT F, et al. SPARF: neural radiance fields from sparse and noisy poses[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 4190-4200. |
[30] | XIAO Y X, XUE N, WU T F, et al. Level-S2fM: structure from motion on neural level set of implicit surfaces[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 17205-17214. |
[31] | BARRON J T, MILDENHALL B, VERBIN D, et al. Mip-NeRF 360: unbounded anti-aliased neural radiance fields[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5460-5469. |
[32] | MILDENHALL B, SRINIVASAN P P, ORTIZ-CAYON R, et al. Local light field fusion: practical view synthesis with prescriptive sampling guidelines[J]. ACM Transactions on Graphics (TOG), 2019, 38(4): 29. |
[33] | JENSEN R, DAHL A, VOGIATZIS G, et al. Large scale multi-view stereopsis evaluation[C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2014: 406-413. |
[34] | KE B X, OBUKHOV A, HUANG S Y, et al. Repurposing diffusion-based image generators for monocular depth estimation[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 9492-9502. |
[35] | XIE Z K, YANG X D, YANG Y J, et al. S3IM: stochastic structural similarity and its unreasonable effectiveness for neural fields[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 17978-17988. |
[36] | WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612. |
[37] | ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 586-595. |