Journal of Graphics ›› 2026, Vol. 47 ›› Issue (1): 29-38. DOI: 10.11996/JG.j.2095-302X.2026010029
• Image Processing and Computer Vision •
PAN Yuxuan1, JIN Rui1, LIU Yu1, ZHANG Lin1,2
Received: 2025-04-29
Accepted: 2025-06-28
Online: 2026-02-28
Published: 2026-03-16
Contact: ZHANG Lin
PAN Yuxuan, JIN Rui, LIU Yu, ZHANG Lin. Generative model based unsupervised multi-view stereo network[J]. Journal of Graphics, 2026, 47(1): 29-38.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2026010029
| Parameter | Accuracy/mm | Completeness/mm | Overall/mm |
|---|---|---|---|
| N=3 | 0.352 | 0.276 | 0.314 |
| N=4 | 0.338 | 0.256 | 0.297 |
| N=5 | 0.337 | 0.256 | 0.295 |
| N=6 | 0.340 | 0.261 | 0.300 |
| N=7 | 0.357 | 0.284 | 0.321 |
Table 1 Parameter optimization on the DTU dataset
| Method | Accuracy/mm | Completeness/mm | Overall/mm |
|---|---|---|---|
| Colmap | 0.400 | 0.664 | 0.532 |
| MVSNet | 0.396 | 0.527 | 0.462 |
| M3VSNet | 0.636 | 0.531 | 0.583 |
| Unsup-MVS | 0.881 | 1.073 | 0.977 |
| RC-MVSNet | 0.396 | 0.295 | 0.345 |
| CL-MVSNet | 0.375 | 0.283 | 0.329 |
| RA-MVSNet | 0.326 | 0.268 | 0.297 |
| CT-MVSNet | 0.341 | 0.264 | 0.302 |
| ColNeRF | 0.384 | 0.378 | 0.381 |
| Ours | 0.337 | 0.256 | 0.295 |
Table 2 Evaluation metrics on the DTU dataset
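For reference, the accuracy/completeness/overall numbers reported above follow the usual DTU convention: accuracy is the mean distance from the reconstructed point cloud to the ground truth, completeness is the mean distance in the opposite direction, and overall is their average (lower is better). A minimal brute-force sketch of this idea follows; the function name is ours, and the official DTU evaluation additionally applies observability masks and outlier thresholds that are omitted here:

```python
import numpy as np

def dtu_metrics(pred, gt):
    """Simplified DTU-style metrics between two point clouds.

    pred, gt: (N, 3) and (M, 3) arrays of 3D points.
    Returns (accuracy, completeness, overall), all as mean
    nearest-neighbor Euclidean distances.
    """
    # Pairwise distance matrix, shape (N, M).
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    acc = d.min(axis=1).mean()    # pred -> gt: how accurate each point is
    comp = d.min(axis=0).mean()   # gt -> pred: how much of gt is covered
    return acc, comp, 0.5 * (acc + comp)
```

For real clouds a k-d tree (e.g. `scipy.spatial.cKDTree`) replaces the quadratic distance matrix, but the definition of the three metrics is unchanged.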
| Loss configuration | Accuracy/mm | Completeness/mm | Overall/mm |
|---|---|---|---|
| L_P | 0.432 | 0.349 | 0.391 |
| L_P + L_FV | 0.391 | 0.285 | 0.338 |
| L + L_NeRF | 0.337 | 0.256 | 0.295 |
Table 3 Ablation study on the DTU dataset
| Method | Lighthouse | Panther | Train |
|---|---|---|---|
| Colmap | 56.43 | 46.97 | 42.04 |
| MVSNet | 50.79 | 50.86 | 34.69 |
| M3VSNet | 44.42 | 44.95 | 30.31 |
| Unsup-MVS | 42.03 | 44.00 | 36.45 |
| RC-MVSNet | 53.49 | 52.30 | 49.37 |
| CL-MVSNet | 60.02 | 59.97 | 52.28 |
| RA-MVSNet | 64.78 | 65.60 | 58.08 |
| CT-MVSNet | 62.60 | 64.83 | 58.68 |
| ColNeRF | 60.23 | 59.46 | 52.57 |
| Ours (D=128) | 61.17 | 61.20 | 53.14 |
| Ours (D=160) | 64.97 | 65.90 | 58.74 |
| Ours (D=192) | 64.89 | 65.85 | 58.71 |
Table 4 Evaluation metrics on the Tanks and Temples dataset
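Tanks and Temples scores of this kind are conventionally F-scores at a per-scene distance threshold τ (higher is better): precision is the fraction of reconstructed points lying within τ of the ground truth, recall is the fraction of ground-truth points within τ of the reconstruction, and the F-score is their harmonic mean. A simplified sketch, assuming the same brute-force nearest-neighbor setup as above and ignoring the benchmark's official per-scene thresholds and masking (function name is ours):

```python
import numpy as np

def f_score(pred, gt, tau):
    """Simplified Tanks-and-Temples-style F-score for two point clouds.

    pred, gt: (N, 3) and (M, 3) arrays; tau: distance threshold.
    """
    # Pairwise distance matrix, shape (N, M).
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = (d.min(axis=1) < tau).mean()  # pred points near gt
    recall = (d.min(axis=0) < tau).mean()     # gt points near pred
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the F-score is a harmonic mean, it rewards reconstructions that are simultaneously accurate (high precision) and complete (high recall); either one alone cannot compensate for the other.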
Fig. 5 Point cloud results on the NERULN dataset ((a) Proposed method (full); (b) Proposed method ablation (L only); (c) ColNeRF; (d) RC-MVSNet; (e) M3VSNet)
| Method | Points | Faces | Reconstruction error/px | Processing time/s | Model size/MB |
|---|---|---|---|---|---|
| M3VSNet | 519 291 | 103 854 | 0.284 | 354.9 | 6320 |
| RC-MVSNet | 504 690 | 100 574 | 0.189 | 294.5 | 9189 |
| ColNeRF | 538 372 | 119 578 | 0.204 | 341.5 | 5964 |
| Ours (L only) | 530 249 | 118 988 | 0.202 | 286.7 | 5970 |
| Ours (full) | 560 560 | 130 609 | 0.168 | 321.3 | 8672 |
Table 5 Evaluation metrics on the NERULN dataset
| [1] | LEE L H, BRAUD T, ZHOU P Y, et al. All one needs to know about metaverse: a complete survey on technological singularity, virtual ecosystem, and research agenda[J]. Foundations and Trends® in Human-Computer Interaction, 2024, 18(2/3): 100-337. |
| [2] | YAO Y, LUO Z X, LI S W, et al. MVSNet: depth inference for unstructured multi-view stereo[C]// The 15th European Conference on Computer Vision - ECCV 2018. Cham: Springer, 2018: 785-801. |
| [3] | HUANG B C, YI H W, HUANG C, et al. M3VSNET: unsupervised multi-metric multi-view stereo network[C]// 2021 IEEE International Conference on Image Processing. New York: IEEE Press, 2021: 3163-3167. |
| [4] | LI J L, LU Z D, WANG Y Q, et al. DS-MVSNet: unsupervised multi-view stereo via depth synthesis[C]// The 30th ACM International Conference on Multimedia. New York: ACM, 2022: 5593-5601. |
| [5] | XIONG K Q, PENG R, ZHANG Z, et al. CL-MVSNet: unsupervised multi-view stereo with dual-level contrastive learning[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 3746-3757. |
| [6] | MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 405-421. |
| [7] | WANG D L, DING Z J, YANG J, et al. Large scene reconstruction method based on voxel grid feature of NeRF[J]. Journal of Graphics, 2025, 46(3): 502-509 (in Chinese). |
| [8] | ZAWISH M, DHAREJO F A, KHOWAJA S A, et al. AI and 6G into the Metaverse: fundamentals, challenges and future research trends[J]. IEEE Open Journal of the Communications Society, 2024, 5: 730-778. |
| [9] | LIU X, LI Y, FENG S J, et al. Line extraction and representation algorithm for RGB-D data[J]. Journal of Graphics, 2025, 46(3): 542-550 (in Chinese). |
| [10] | SCHÖNBERGER J L, ZHENG E L, FRAHM J M, et al. Pixelwise view selection for unstructured multi-view stereo[C]// The 14th European Conference on Computer Vision. Cham: Springer, 2016: 501-518. |
| [11] |
HEEP M, ZELL E. ShadowPatch: shadow based segmentation for reliable depth discontinuities in photometric stereo[J]. Computer Graphics Forum, 2022, 41(7): 635-646.
DOI URL |
| [12] | LIANG J, WANG R J, PENG R, et al. High fidelity aggregated planar prior assisted PatchMatch multi-view stereo[C]// The 32nd ACM International Conference on Multimedia. New York: ACM, 2024: 3141-3150. |
| [13] | TANG J Y, CAI Y G, GAO X S, et al. Generalized sampling of non-local textural clues multi-view stereo framework[C]// The 32nd ACM International Conference on Multimedia. New York: ACM, 2024: 11222-11225. |
| [14] |
XU H B, CHEN W T, SUN B G, et al. RobustMVS: single domain generalized deep multi-view stereo[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(10): 9181-9194.
DOI URL |
| [15] | ZHU J, PENG B, LIU B Z, et al. Self-constructing stereo correspondences for unsupervised multi-view stereo[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(11): 10732-10742. |
| [16] | JIANG J F, CAO M F, YI J, et al. DI-MVS: learning efficient multi-view stereo with depth-aware iterations[C]// 2024 IEEE International Conference on Acoustics, Speech and Signal Processing. New York: IEEE Press, 2024: 3180-3184. |
| [17] | KHOT T, AGRAWAL S, TULSIANI S, et al. Learning unsupervised multi-view stereopsis via robust photometric consistency[EB/OL]. (2019-06-06)[2025-01-27]. https://arxiv.org/abs/1905.02706. |
| [18] | RENDLE G, KRESKOWSKI A, FROEHLICH B. Volumetric avatar reconstruction with spatio-temporally offset RGBD cameras[C]// 2023 IEEE Conference Virtual Reality and 3D User Interfaces. New York: IEEE Press, 2023: 72-82. |
| [19] | WANG S C, JIANG H, XIANG L. CT-MVSNet: efficient multi-view stereo with cross-scale transformer[C]// The 30th International Conference on Multimedia Modeling. Cham: Springer, 2024: 394-408. |
| [20] | CHANG D, BOŽIČ A, ZHANG T, et al. RC-MVSNet: unsupervised multi-view stereo with neural rendering[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 665-680. |
| [21] | DENG K L, LIU A, ZHU J Y, et al. Depth-supervised NeRF: fewer views and faster training for free[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12872-12881. |
| [22] | TOSI F, TONIONI A, DE GREGORIO D, et al. Nerf-supervised deep stereo[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 855-866. |
| [23] | SANTO H, OKURA F, MATSUSHITA Y. MVCPS-NeuS: multi-view constrained photometric stereo for neural surface reconstruction[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 20475-20484. |
| [24] |
ZHU D X, KONG H R, QIU Q, et al. Multi-view stereo network based on attention mechanism and neural volume rendering[J]. Electronics, 2023, 12(22): 4603.
DOI URL |
| [25] | WEI Y, LIU S H, ZHOU J, et al. Depth-guided optimization of neural radiance fields for indoor multi-view stereo[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(9): 10835-10849. |
| [26] | ITO S, MIURA K, ITO K, et al. Neural radiance field-inspired depth map refinement for accurate multi-view stereo[J]. Journal of Imaging, 2024, 10(3): 68. |
| [27] | ZHU H X, CHEN Z B. CMC: few-shot novel view synthesis via cross-view multiplane consistency[C]// 2024 IEEE Conference Virtual Reality and 3D User Interfaces. New York: IEEE Press, 2024: 960-968. |
| [28] | SCHÖNBERGER J L, FRAHM J M. Structure-from-motion revisited[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4104-4113. |
| [29] | CAO T S, KREIS K, FIDLER S, et al. TexFusion: synthesizing 3D textures with text-guided image diffusion models[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 4146-4158. |
| [30] |
AANÆS H, JENSEN R R, VOGIATZIS G, et al. Large-scale data for multiple-view stereopsis[J]. International Journal of Computer Vision, 2016, 120(2): 153-168.
DOI URL |
| [31] | ZHANG Y S, ZHU J K, LIN L X. Multi-view stereo representation revist: region-aware MVSNet[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 17376-17385. |
| [32] | KNAPITSCH A, PARK J, ZHOU Q Y, et al. Tanks and temples: benchmarking large-scale scene reconstruction[J]. ACM Transactions on Graphics, 2017, 36(4): 78. |
| [33] | PAN Y X, LIU Y, ZHANG L. LiTrix: a lightweight live light field video scheme for metaverse stereoscopic applications[J]. IEEE Internet of Things Magazine, 2023, 6(2): 137-142. |