Journal of Graphics ›› 2025, Vol. 46 ›› Issue (3): 510-519. DOI: 10.11996/JG.j.2095-302X.2025030510
HUANG Zhiyong, SHE Yali, HUA Xifeng, XIANG Mengli, YANG Chenlong, DING Tuojun
Received:
2024-08-17
Accepted:
2024-12-27
Published:
2025-06-30
Online:
2025-06-13
First author:
HUANG Zhiyong (1979-), associate professor, Ph.D. His main research interests cover computer vision and computer graphics. E-mail: hzy@hzy.org.cn
Supported by:
Abstract:
To address the challenges of sparse-view 3D reconstruction, in particular the reconstruction holes and loss of accuracy caused by an insufficient number of Gaussian ellipsoids, a depth-constrained 3D Gaussian splatting (3DGS) method for sparse-view reconstruction, DCSplat, was proposed. It uses depth constraints to adaptively complete the point cloud required for 3DGS initialization and introduces a stochastic structural similarity loss, enabling fast, high-quality reconstruction from sparse-view images. Its core is a proposed feed-forward neural network that refines the sparse point cloud produced by structure-from-motion (SfM). First, depth is predicted from the input images with a pretrained monocular depth estimation network. Second, a projection matrix built from the camera parameters projects the sparse point cloud onto the images, associating each point's z-value with the predicted depth at its pixel; a deep network mapping pixel depth values to point-cloud z-values is then constructed and trained to optimize and complete the point cloud required by 3DGS. Third, to overcome the limitations of the point-wise optimization loss in 3DGS, a stochastic structural similarity loss is introduced; it treats the multiple Gaussians corresponding to a group of pixels as a whole, accounts for the point-cloud structure globally, and promotes more coherent and accurate reconstruction. Experiments on the LLFF, DTU, and MipNeRF360 benchmarks show that DCSplat matches or surpasses existing methods on the key metrics of peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and learned perceptual image patch similarity (LPIPS), effectively improving reconstruction quality. In addition, by completing the point cloud under depth constraints and exploiting depth information from global to local scales, the method delivers consistent gains across multiple metrics and shows promising application potential.
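The point-cloud completion step described in the abstract — projecting the SfM points with the camera parameters, pairing each point's camera-space z-value with the monocular depth predicted at its pixel, and fitting a small network that maps depth to z — can be illustrated with the minimal sketch below. It is not the paper's implementation: the pinhole projection, the `DepthToZ` MLP, and the function names `project_points` and `fit_depth_to_z` are assumptions introduced here for clarity.

```python
import numpy as np
import torch
import torch.nn as nn

def project_points(points_w, K, R, t):
    """Project Nx3 world-space points into an image with a pinhole model.

    K: 3x3 intrinsics, R: 3x3 rotation, t: 3-vector translation (world-to-camera).
    Returns pixel coordinates (u, v) and the camera-space depth z of each point.
    """
    pts_cam = points_w @ R.T + t            # world -> camera coordinates
    z = pts_cam[:, 2]                       # camera-space z-value of each point
    uv = pts_cam @ K.T                      # apply intrinsics
    uv = uv[:, :2] / uv[:, 2:3]             # perspective divide -> pixel coords
    return uv, z

class DepthToZ(nn.Module):
    """Tiny MLP mapping a monocular depth prediction to a point-cloud z-value.

    A stand-in for the paper's point-cloud generation network; the real
    architecture is not specified here.
    """
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, d):
        return self.net(d)

def fit_depth_to_z(points_w, K, R, t, mono_depth, iters=1000):
    """Pair each projected SfM point with the monocular depth at its pixel,
    then fit the MLP so it can later densify the cloud from depth alone."""
    uv, z = project_points(points_w, K, R, t)
    h, w = mono_depth.shape
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    d = torch.tensor(mono_depth[v, u], dtype=torch.float32).unsqueeze(1)
    z = torch.tensor(z, dtype=torch.float32).unsqueeze(1)

    model = DepthToZ()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(iters):
        opt.zero_grad()
        loss = torch.mean((model(d) - z) ** 2)   # regress z-values from depth
        loss.backward()
        opt.step()
    return model
```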
CLC number:
HUANG Zhiyong, SHE Yali, HUA Xifeng, XIANG Mengli, YANG Chenlong, DING Tuojun. DCSplat: Gaussian splatting with depth information constraints under sparse viewpoints[J]. Journal of Graphics, 2025, 46(3): 510-519.
Table 1 Experimental results of different methods on the LLFF dataset

| Model | 3-views SSIM↑ | 3-views PSNR↑ | 3-views LPIPS↓ | 6-views SSIM↑ | 6-views PSNR↑ | 6-views LPIPS↓ | 9-views SSIM↑ | 9-views PSNR↑ | 9-views LPIPS↓ |
|---|---|---|---|---|---|---|---|---|---|
| 3DGS | 0.447 | 14.975 | 0.428 | 0.620 | 18.888 | 0.300 | 0.680 | 24.072 | 0.258 |
| SparseNeRF | 0.613 | 19.311 | 0.341 | 0.743 | 22.980 | 0.262 | 0.784 | 24.183 | 0.236 |
| RegNeRF | 0.677 | 19.038 | 0.358 | 0.809 | 23.004 | 0.240 | 0.849 | 24.475 | 0.216 |
| DNGaussian | 0.651 | 18.638 | 0.316 | 0.605 | 20.019 | 0.350 | 0.682 | 22.131 | 0.348 |
| DCSplat | 0.678 | 19.054 | 0.302 | 0.733 | 22.077 | 0.220 | 0.758 | 23.191 | 0.195 |
Fig. 2 Comparison of experimental results on the LLFF dataset ((a) 3DGS; (b) SparseNeRF; (c) RegNeRF; (d) DNGaussian; (e) Ours; (f) GT)
Table 2 Experimental results of different methods on the DTU dataset

| Model | 3-views SSIM↑ | 3-views PSNR↑ | 3-views LPIPS↓ | 6-views SSIM↑ | 6-views PSNR↑ | 6-views LPIPS↓ | 9-views SSIM↑ | 9-views PSNR↑ | 9-views LPIPS↓ |
|---|---|---|---|---|---|---|---|---|---|
| 3DGS | 0.467 | 12.800 | 0.482 | 0.543 | 17.535 | 0.346 | 0.550 | 17.975 | 0.324 |
| SparseNeRF | 0.448 | 14.249 | 0.479 | 0.525 | 16.561 | 0.391 | 0.495 | 17.895 | 0.435 |
| RegNeRF | 0.455 | 12.391 | 0.533 | 0.568 | 16.222 | 0.469 | 0.577 | 17.373 | 0.471 |
| DCSplat | 0.474 | 14.549 | 0.440 | 0.544 | 17.223 | 0.361 | 0.556 | 17.775 | 0.326 |
Table 3 Experimental results of different methods on the MipNeRF360 dataset

| Model | 12-views SSIM↑ | 12-views PSNR↑ | 12-views LPIPS↓ |
|---|---|---|---|
| 3DGS | 0.441 | 15.384 | 0.506 |
| Mip-NeRF 360 | 0.446 | 17.104 | 0.575 |
| SparseGS | 0.489 | 16.689 | 0.484 |
| DCSplat | 0.501 | 17.800 | 0.410 |
Fig. 5 Comparison images using the point cloud generation network ((a) Sparse point cloud (with 5 568 points) before using the point cloud generation network; (b) Dense point cloud (with 21 136 points) after using the point cloud generation network)
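Figure 5 shows the sparse SfM cloud being densified from 5 568 to 21 136 points by the point cloud generation network. As a rough illustration only — not the paper's actual network — new points can be back-projected from randomly sampled pixels using the monocular depth map and a fitted depth-to-z mapping (such as the hypothetical `DepthToZ` MLP sketched after the abstract):

```python
import numpy as np
import torch

def densify_point_cloud(mono_depth, K, R, t, depth_to_z, n_new=16000, seed=0):
    """Back-project randomly sampled pixels into 3D to densify a sparse cloud.

    mono_depth: HxW monocular depth map of one training view (numpy array).
    K, R, t:    pinhole intrinsics and world-to-camera extrinsics.
    depth_to_z: fitted mapping from predicted depth to camera-space z.
    Returns an (n_new, 3) array of new world-space points.
    """
    h, w = mono_depth.shape
    rng = np.random.default_rng(seed)
    u = rng.integers(0, w, n_new)
    v = rng.integers(0, h, n_new)

    d = torch.tensor(mono_depth[v, u], dtype=torch.float32).unsqueeze(1)
    with torch.no_grad():
        z = depth_to_z(d).squeeze(1).numpy()        # predicted depth -> metric z

    # Unproject each pixel with the pinhole model: x_cam = z * K^{-1} [u, v, 1]^T
    pix = np.stack([u, v, np.ones_like(u)], axis=0).astype(np.float64)
    rays = np.linalg.inv(K) @ pix                    # 3 x n_new camera-space rays
    pts_cam = (rays * z).T                           # scale each ray by its z
    pts_world = (pts_cam - t) @ R                    # camera -> world (R orthonormal)
    return pts_world
```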
Table 4 The impact of each module on performance

| Point cloud generation network | Depth-constrained sampling | Stochastic structural loss | SSIM↑ | PSNR↑ | LPIPS↓ |
|---|---|---|---|---|---|
| √ | | | 0.594 | 17.693 | 0.376 |
| √ | √ | | 0.657 | 18.798 | 0.315 |
| √ | √ | √ | 0.678 | 19.054 | 0.302 |
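The "stochastic structural loss" ablated in Table 4 is, per the abstract, a stochastic structural similarity loss that treats groups of pixels (and the Gaussians behind them) as a whole rather than optimizing each pixel independently, in the spirit of S3IM [35]. A minimal sketch of such a loss follows; the window size, patch size, and number of sampling rounds are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2, win=4):
    """Plain SSIM between two 1xCxHxW tensors using an average-pooling window."""
    mu_x = F.avg_pool2d(x, win, stride=1)
    mu_y = F.avg_pool2d(y, win, stride=1)
    sigma_x = F.avg_pool2d(x * x, win, stride=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, win, stride=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, win, stride=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).mean()

def stochastic_ssim_loss(rendered, target, patch=64, rounds=10):
    """S3IM-style loss: sample random pixels, tile them into a pseudo-patch,
    and penalize 1 - SSIM between the rendered and target patches.

    rendered, target: (N, 3) tensors of colors for the same N training pixels.
    """
    n = rendered.shape[0]
    idx = torch.randint(0, n, (rounds * patch * patch,), device=rendered.device)
    r = rendered[idx].T.reshape(1, 3, patch, rounds * patch)
    t = target[idx].T.reshape(1, 3, patch, rounds * patch)
    return 1.0 - ssim(r, t)
```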
[1] | SCHÖNBERGER J L, FRAHM J M. Structure-from-motion revisited[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4104-4113. |
[2] | ZHOU J Y, ZHANG Q T, FENG J Q. Hybrid-structure based multi-view 3D scene reconstruction[J]. Journal of Graphics, 2024, 45(1): 199-208 (in Chinese). |
[3] | GAO K, GAO Y N, HE H J, et al. NeRF: neural radiance field in 3D vision, a comprehensive review[EB/OL]. [2023-11-30]. https://arxiv.org/abs/2210.00379. |
[4] | TEWARI A, THIES J, MILDENHALL B, et al. Advances in neural rendering[J]. Computer Graphics Forum, 2022, 41(2): 703-735. |
[5] | MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2021, 65(1): 99-106. |
[6] | SOMRAJ N, SOUNDARARAJAN R. ViP-NeRF: visibility prior for sparse input neural radiance fields[C]// ACM SIGGRAPH 2023 Conference Proceedings. New York: ACM, 2023: 71. |
[7] | WANG G C, CHEN Z X, LOY C C, et al. SparseNeRF: distilling depth ranking for few-shot novel view synthesis[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 9031-9042. |
[8] | CHEN A P, XU Z X, ZHAO F Q, et al. MVSNeRF: fast generalizable radiance field reconstruction from multi-view stereo[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 14104-14113. |
[9] | CHIBANE J, BANSAL A, LAZOVA V, et al. Stereo radiance fields (SRF): learning view synthesis for sparse views of novel scenes[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7907-7916. |
[10] | LIU Y, PENG S D, LIU L J, et al. Neural rays for occlusion- aware image-based rendering[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 7814-7823. |
[11] | TREVITHICK A, YANG B. GRF: learning a general radiance field for 3D representation and rendering[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 15162-15172. |
[12] | YU A, YE V, TANCIK M, et al. pixelNeRF: neural radiance fields from one or few images[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 4576-4585. |
[13] | DENG K L, LIU A, ZHU J Y, et al. Depth-supervised NeRF: fewer views and faster training for free[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12872-12881. |
[14] | ROESSLE B, BARRON J T, MILDENHALL B, et al. Dense depth priors for neural radiance fields from sparse input views[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12882-12891. |
[15] | NIEMEYER M, BARRON J T, MILDENHALL B, et al. RegNeRF: regularizing neural radiance fields for view synthesis from sparse inputs[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5470-5480. |
[16] | KIM M, SEO S, HAN B. InfoNeRF: ray entropy minimization for few-shot neural volume rendering[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12902-12911. |
[17] | KERBL B, KOPANAS G, LEIMKUEHLER T, et al. 3D Gaussian splatting for real-time radiance field rendering[J]. ACM Transactions on Graphics (TOG), 2023, 42(4): 139. |
[18] | XIONG H L, MUTTUKURU S, UPADHYAY R, et al. SparseGS: real-time 360° sparse view synthesis using Gaussian splatting[EB/OL]. [2023-11-30]. https://arxiv.org/abs/2312.00206. |
[19] | FAN Z W, CONG W Y, WEN K R, et al. InstantSplat: unbounded sparse-view pose-free Gaussian splatting in 40 seconds[EB/OL]. [2024-06-30]. https://arxiv.org/html/2403.20309v1. |
[20] | PALIWAL A, YE W, XIONG J H, et al. CoherentGS: sparse novel view synthesis with coherent 3D Gaussians[C]// The 18th European Conference on Computer Vision. Cham: Springer, 2024: 19-37. |
[21] | CHEN Y D, XU H F, ZHENG C X, et al. MVSplat: efficient 3D Gaussian splatting from sparse multi-view images[C]// The 18th European Conference on Computer Vision. Cham: Springer, 2024: 370-386. |
[22] | LI J H, ZHANG J W, BAI X, et al. DNGaussian: optimizing sparse-view 3D Gaussian radiance fields with global-local depth normalization[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 20775-20785. |
[23] | LIU G Y, HU R Z, LIU L G. 3D Gaussian splatting semantic segmentation and editing based on 2D feature distillation[J]. Journal of Graphics, 2025, 46(2): 312-321 (in Chinese). |
[24] | CHUNG J, OH J, LEE K M. Depth-regularized optimization for 3D Gaussian splatting in few-shot images[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 811-820. |
[25] | ZHU Z H, FAN Z W, JIANG Y F, et al. FSGS: real-time few-shot view synthesis using Gaussian splatting[C]// The 18th European Conference on Computer Vision. Cham: Springer, 2024: 145-163. |
[26] | XIONG H L, MUTTUKURU S, UPADHYAY R, et al. SparseGS: real-time 360° sparse view synthesis using Gaussian splatting[EB/OL]. [2024-05-13]. https://arxiv.org/abs/2312.00206. |
[27] | HUANG S S, ZOU Z X, ZHANG Y C, et al. SC-NeuS: consistent neural surface reconstruction from sparse and noisy views[EB/OL]. [2024-06-17]. https://ojs.aaai.org/index.php/AAAI/article/view/28010. |
[28] | ZOU Z X, CHENG W H, CAO Y P, et al. Sparse3D: distilling multiview-consistent diffusion for object reconstruction from sparse views[EB/OL]. [2024-06-17]. https://ojs.aaai.org/index.php/AAAI/article/view/28626. |
[29] | TRUONG P, RAKOTOSAONA M J, MANHARDT F, et al. SPARF: neural radiance fields from sparse and noisy poses[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 4190-4200. |
[30] | XIAO Y X, XUE N, WU T F, et al. Level-S2fM: structure from motion on neural level set of implicit surfaces[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 17205-17214. |
[31] | BARRON J T, MILDENHALL B, VERBIN D, et al. Mip-NeRF 360: unbounded anti-aliased neural radiance fields[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5460-5469. |
[32] | MILDENHALL B, SRINIVASAN P P, ORTIZ-CAYON R, et al. Local light field fusion: practical view synthesis with prescriptive sampling guidelines[J]. ACM Transactions on Graphics (TOG), 2019, 38(4): 29. |
[33] | JENSEN R, DAHL A, VOGIATZIS G, et al. Large scale multi-view stereopsis evaluation[C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2014: 406-413. |
[34] | KE B X, OBUKHOV A, HUANG S Y, et al. Repurposing diffusion-based image generators for monocular depth estimation[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 9492-9502. |
[35] | XIE Z K, YANG X D, YANG Y J, et al. S3IM: stochastic structural similarity and its unreasonable effectiveness for neural fields[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 17978-17988. |
[36] | WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612. |
[37] | ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 586-595. |