Journal of Graphics ›› 2026, Vol. 47 ›› Issue (1): 29-38. DOI: 10.11996/JG.j.2095-302X.2026010029
Corresponding author: ZHANG Lin, E-mail: zhanglin@bupt.edu.cn
PAN Yuxuan1, JIN Rui1, LIU Yu1, ZHANG Lin1,2
Received:2025-04-29
Accepted:2025-06-28
Published:2026-02-28
Online:2026-03-16
Abstract:
Existing multi-view stereo research relies on depth estimation algorithms to build a mapping between the physical and digital worlds and thereby obtain stereoscopic representations. Neural networks trained with supervised learning can deliver accurate, high-fidelity 3D reconstruction. However, visual reconstruction of natural scenes remains challenging, because depth priors are unavailable and the captured images have wide fields of view. This study applies an unsupervised learning network and semantically optimized neural radiance field (NeRF) rendering to estimate depth from naturally captured multi-view images without any prior information. First, an unsupervised network generates preliminary depth information for the multi-view images without reference data; then, in a separate NeRF model, a diffusion model is used to construct a surface-semantic rendering loss that yields a fine-grained 3D representation. Experiments on benchmark datasets show that, compared with other state-of-the-art schemes, the method improves the overall reconstruction metric by 24.6% on average; in generalization tests on a wide-baseline dataset, it reduces the reconstruction error of existing methods by up to 40.8%.
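The core self-supervision signal in unsupervised MVS training of this kind is photometric consistency: pixels of a reference view, back-projected with the estimated depth and re-projected into a source view, should look the same in both images. The sketch below is an illustrative NumPy implementation of that check, not the authors' code; the pinhole-camera warping, nearest-neighbour sampling, and plain L1 error are assumptions for clarity.

```python
import numpy as np

def reproject(depth, K_ref, K_src, R, t):
    """Project each reference pixel into the source view using its depth.

    depth: (H, W) depth map of the reference view.
    K_ref, K_src: 3x3 intrinsics; R, t: rotation/translation ref -> src.
    Returns (H, W, 2) pixel coordinates in the source image.
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW
    rays = np.linalg.inv(K_ref) @ pix                 # back-project to rays
    pts = rays * depth.reshape(1, -1)                 # 3D points in ref frame
    proj = K_src @ (R @ pts + t.reshape(3, 1))        # into the source camera
    uv = (proj[:2] / np.clip(proj[2], 1e-6, None)).T  # perspective divide
    return uv.reshape(H, W, 2)

def photometric_loss(ref_img, src_img, depth, K_ref, K_src, R, t):
    """Masked L1 photometric error between ref and the warped source view."""
    uv = reproject(depth, K_ref, K_src, R, t)
    u = np.round(uv[..., 0]).astype(int)
    v = np.round(uv[..., 1]).astype(int)
    H, W = depth.shape
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)   # pixels landing in-bounds
    warped = np.zeros_like(ref_img)
    warped[valid] = src_img[v[valid], u[valid]]       # nearest-neighbour sample
    return np.abs(ref_img - warped)[valid].mean()
```

Practical systems add robustness terms (SSIM, feature consistency, occlusion masking) on top of this raw L1 error; the identity-pose case, where the loss is exactly zero, is a convenient sanity check.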
PAN Yuxuan, JIN Rui, LIU Yu, ZHANG Lin. Generative model based unsupervised multi-view stereo network[J]. Journal of Graphics, 2026, 47(1): 29-38.
Table 1 Parameter optimization on the DTU dataset
| Parameter | Accuracy/mm | Completeness/mm | Overall/mm |
|---|---|---|---|
| N=3 | 0.352 | 0.276 | 0.314 |
| N=4 | 0.338 | 0.256 | 0.297 |
| N=5 | 0.337 | 0.256 | 0.295 |
| N=6 | 0.340 | 0.261 | 0.300 |
| N=7 | 0.357 | 0.284 | 0.321 |
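Table 1 varies N, the number of input views per reconstruction. MVSNet-style networks accept an arbitrary N because per-view features are aggregated with a variance-based cost that is symmetric in the views. A minimal sketch of that aggregation (illustrative, not the paper's implementation):

```python
import numpy as np

def variance_cost(features):
    """MVSNet-style cost aggregation across an arbitrary number of views.

    features: (N, C, H, W) feature maps warped onto one depth hypothesis.
    Returns a (C, H, W) cost slice: the per-channel variance across views,
    which is low where the warped views agree (i.e. the hypothesis is good).
    """
    mean = features.mean(axis=0, keepdims=True)
    return ((features - mean) ** 2).mean(axis=0)
```

Because the variance of identical inputs is zero, perfectly consistent views produce zero cost regardless of N, which is why the same network can be evaluated at N=3 through N=7 as in Table 1.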
Fig. 2 Point cloud reconstruction results on the DTU dataset ((a) Original image; (b) Depth estimation result; (c) Reconstruction result)
Table 2 Evaluation metrics on the DTU dataset
| Method | Accuracy/mm | Completeness/mm | Overall/mm |
|---|---|---|---|
| Colmap | 0.400 | 0.664 | 0.532 |
| MVSNet | 0.396 | 0.527 | 0.462 |
| M3VSNet | 0.636 | 0.531 | 0.583 |
| Unsup-MVS | 0.881 | 1.073 | 0.977 |
| RC-MVSNet | 0.396 | 0.295 | 0.345 |
| CL-MVSNet | 0.375 | 0.283 | 0.329 |
| RA-MVSNet | 0.326 | 0.268 | 0.297 |
| CT-MVSNet | 0.341 | 0.264 | 0.302 |
| ColNeRF | 0.384 | 0.378 | 0.381 |
| Ours | 0.337 | 0.256 | 0.295 |
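In the DTU protocol the overall score is the arithmetic mean of accuracy and completeness, and the rows above are consistent with that (e.g. RC-MVSNet: (0.396 + 0.295)/2 = 0.3455 ≈ 0.345). A trivial helper makes the relationship explicit; this reflects the standard DTU convention, which the excerpt does not state outright:

```python
def overall(accuracy, completeness):
    """DTU overall score: mean of accuracy and completeness (both in mm)."""
    return (accuracy + completeness) / 2
```

Lower is better for all three columns, so the mean is a simple way to trade off the two error directions (reconstruction-to-ground-truth and ground-truth-to-reconstruction distances).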
Table 3 Ablation study on the DTU dataset
| Losses | Accuracy/mm | Completeness/mm | Overall/mm |
|---|---|---|---|
| L_P | 0.432 | 0.349 | 0.391 |
| L_P + L_FV | 0.391 | 0.285 | 0.338 |
| L + L_NeRF | 0.337 | 0.256 | 0.295 |
Fig. 3 Visual results of the ablation study on the DTU dataset ((a) L_P; (b) L_P + L_FV; (c) L and L_NeRF; (d) Dataset reference result)
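The ablation adds loss terms cumulatively, so the full objective is presumably a weighted sum of the photometric, feature, and NeRF rendering terms. The sketch below only illustrates that composition; the weights `w_fv` and `w_nerf` are hypothetical, since the excerpt does not give the paper's actual weighting.

```python
def total_loss(l_p, l_fv=0.0, l_nerf=0.0, w_fv=1.0, w_nerf=1.0):
    """Combined training objective for the settings ablated in Table 3.

    l_p: photometric loss; l_fv: feature-level consistency loss;
    l_nerf: NeRF surface-semantic rendering loss. Weights are hypothetical.
    """
    return l_p + w_fv * l_fv + w_nerf * l_nerf
```

Setting the extra terms to zero recovers the `L_P`-only row of the ablation; enabling them one by one mirrors the other two rows.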
Table 4 Evaluation metrics on the Tanks and Temples dataset
| Method | Lighthouse | Panther | Train |
|---|---|---|---|
| Colmap | 56.43 | 46.97 | 42.04 |
| MVSNet | 50.79 | 50.86 | 34.69 |
| M3VSNet | 44.42 | 44.95 | 30.31 |
| Unsup-MVS | 42.03 | 44.00 | 36.45 |
| RC-MVSNet | 53.49 | 52.30 | 49.37 |
| CL-MVSNet | 60.02 | 59.97 | 52.28 |
| RA-MVSNet | 64.78 | 65.60 | 58.08 |
| CT-MVSNet | 62.60 | 64.83 | 58.68 |
| ColNeRF | 60.23 | 59.46 | 52.57 |
| Ours (D=128) | 61.17 | 61.20 | 53.14 |
| Ours (D=160) | 64.97 | 65.90 | 58.74 |
| Ours (D=192) | 64.89 | 65.85 | 58.71 |
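Table 4 varies D, the number of depth hypotheses in the plane-sweep cost volume: more planes refine the sampling but raise memory and compute, and the gain saturates between D=160 and D=192. A minimal sketch of uniform hypothesis sampling (inverse-depth sampling is also common in practice; the `d_min`/`d_max` bounds in the example are hypothetical, scene-dependent values):

```python
import numpy as np

def depth_hypotheses(d_min, d_max, D):
    """D evenly spaced depth planes for a plane-sweep cost volume.

    Table 4 evaluates D in {128, 160, 192}; d_min and d_max come from
    the scene's known or estimated depth range.
    """
    return np.linspace(d_min, d_max, D)
```

Each source view is warped onto every one of the D planes, so the cost volume scales linearly with D; this is the memory/accuracy trade-off the table quantifies.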
Fig. 5 Point cloud reconstruction results on the NERULN dataset ((a) Ours (full architecture); (b) Ablated variant (unsupervised MVS network only); (c) ColNeRF; (d) RC-MVSNet; (e) M3VSNet)
Table 5 Evaluation metrics on the NERULN dataset
| Method | Points | Faces | Reconstruction error/px | Processing time/s | Model size/MB |
|---|---|---|---|---|---|
| M3VSNet | 519 291 | 103 854 | 0.284 | 354.9 | 6320 |
| RC-MVSNet | 504 690 | 100 574 | 0.189 | 294.5 | 9189 |
| ColNeRF | 538 372 | 119 578 | 0.204 | 341.5 | 5964 |
| Ours (L only) | 530 249 | 118 988 | 0.202 | 286.7 | 5970 |
| Ours (full) | 560 560 | 130 609 | 0.168 | 321.3 | 8672 |
References
[1] LEE L H, BRAUD T, ZHOU P Y, et al. All one needs to know about metaverse: a complete survey on technological singularity, virtual ecosystem, and research agenda[J]. Foundations and Trends® in Human-Computer Interaction, 2024, 18(2/3): 100-337.
[2] YAO Y, LUO Z X, LI S W, et al. MVSNet: depth inference for unstructured multi-view stereo[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 785-801.
[3] HUANG B C, YI H W, HUANG C, et al. M3VSNet: unsupervised multi-metric multi-view stereo network[C]// 2021 IEEE International Conference on Image Processing. New York: IEEE Press, 2021: 3163-3167.
[4] LI J L, LU Z D, WANG Y Q, et al. DS-MVSNet: unsupervised multi-view stereo via depth synthesis[C]// The 30th ACM International Conference on Multimedia. New York: ACM, 2022: 5593-5601.
[5] XIONG K Q, PENG R, ZHANG Z, et al. CL-MVSNet: unsupervised multi-view stereo with dual-level contrastive learning[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 3746-3757.
[6] MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 405-421.
[7] WANG D L, DING Z J, YANG J, et al. Large scene reconstruction method based on voxel grid feature of NeRF[J]. Journal of Graphics, 2025, 46(3): 502-509 (in Chinese).
[8] ZAWISH M, DHAREJO F A, KHOWAJA S A, et al. AI and 6G into the Metaverse: fundamentals, challenges and future research trends[J]. IEEE Open Journal of the Communications Society, 2024, 5: 730-778.
[9] LIU X, LI Y, FENG S J, et al. Line extraction and representation algorithm for RGB-D data[J]. Journal of Graphics, 2025, 46(3): 542-550 (in Chinese).
[10] SCHÖNBERGER J L, ZHENG E L, FRAHM J M, et al. Pixelwise view selection for unstructured multi-view stereo[C]// The 14th European Conference on Computer Vision. Cham: Springer, 2016: 501-518.
[11] HEEP M, ZELL E. ShadowPatch: shadow based segmentation for reliable depth discontinuities in photometric stereo[J]. Computer Graphics Forum, 2022, 41(7): 635-646.
[12] LIANG J, WANG R J, PENG R, et al. High fidelity aggregated planar prior assisted PatchMatch multi-view stereo[C]// The 32nd ACM International Conference on Multimedia. New York: ACM, 2024: 3141-3150.
[13] TANG J Y, CAI Y G, GAO X S, et al. Generalized sampling of non-local textural clues multi-view stereo framework[C]// The 32nd ACM International Conference on Multimedia. New York: ACM, 2024: 11222-11225.
[14] XU H B, CHEN W T, SUN B G, et al. RobustMVS: single domain generalized deep multi-view stereo[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(10): 9181-9194.
[15] ZHU J, PENG B, LIU B Z, et al. Self-constructing stereo correspondences for unsupervised multi-view stereo[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(11): 10732-10742.
[16] JIANG J F, CAO M F, YI J, et al. DI-MVS: learning efficient multi-view stereo with depth-aware iterations[C]// 2024 IEEE International Conference on Acoustics, Speech and Signal Processing. New York: IEEE Press, 2024: 3180-3184.
[17] KHOT T, AGRAWAL S, TULSIANI S, et al. Learning unsupervised multi-view stereopsis via robust photometric consistency[EB/OL]. (2019-06-06)[2025-01-27]. https://arxiv.org/abs/1905.02706.
[18] RENDLE G, KRESKOWSKI A, FROEHLICH B. Volumetric avatar reconstruction with spatio-temporally offset RGBD cameras[C]// 2023 IEEE Conference Virtual Reality and 3D User Interfaces. New York: IEEE Press, 2023: 72-82.
[19] WANG S C, JIANG H, XIANG L. CT-MVSNet: efficient multi-view stereo with cross-scale transformer[C]// The 30th International Conference on Multimedia Modeling. Cham: Springer, 2024: 394-408.
[20] CHANG D, BOŽIČ A, ZHANG T, et al. RC-MVSNet: unsupervised multi-view stereo with neural rendering[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 665-680.
[21] DENG K L, LIU A, ZHU J Y, et al. Depth-supervised NeRF: fewer views and faster training for free[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12872-12881.
[22] TOSI F, TONIONI A, DE GREGORIO D, et al. NeRF-supervised deep stereo[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 855-866.
[23] SANTO H, OKURA F, MATSUSHITA Y. MVCPS-NeuS: multi-view constrained photometric stereo for neural surface reconstruction[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 20475-20484.
[24] ZHU D X, KONG H R, QIU Q, et al. Multi-view stereo network based on attention mechanism and neural volume rendering[J]. Electronics, 2023, 12(22): 4603.
[25] WEI Y, LIU S H, ZHOU J, et al. Depth-guided optimization of neural radiance fields for indoor multi-view stereo[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(9): 10835-10849.
[26] ITO S, MIURA K, ITO K, et al. Neural radiance field-inspired depth map refinement for accurate multi-view stereo[J]. Journal of Imaging, 2024, 10(3): 68.
[27] ZHU H X, CHEN Z B. CMC: few-shot novel view synthesis via cross-view multiplane consistency[C]// 2024 IEEE Conference Virtual Reality and 3D User Interfaces. New York: IEEE Press, 2024: 960-968.
[28] SCHÖNBERGER J L, FRAHM J M. Structure-from-motion revisited[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4104-4113.
[29] CAO T S, KREIS K, FIDLER S, et al. TexFusion: synthesizing 3D textures with text-guided image diffusion models[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 4146-4158.
[30] AANÆS H, JENSEN R R, VOGIATZIS G, et al. Large-scale data for multiple-view stereopsis[J]. International Journal of Computer Vision, 2016, 120(2): 153-168.
[31] ZHANG Y S, ZHU J K, LIN L X. Multi-view stereo representation revisit: region-aware MVSNet[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 17376-17385.
[32] KNAPITSCH A, PARK J, ZHOU Q Y, et al. Tanks and temples: benchmarking large-scale scene reconstruction[J]. ACM Transactions on Graphics, 2017, 36(4): 78.
[33] PAN Y X, LIU Y, ZHANG L. LiTrix: a lightweight live light field video scheme for metaverse stereoscopic applications[J]. IEEE Internet of Things Magazine, 2023, 6(2): 137-142.