
Journal of Graphics, 2026, Vol. 47, Issue (1): 29-38. DOI: 10.11996/JG.j.2095-302X.2026010029

• Image Processing and Computer Vision •

Generative model based unsupervised multi-view stereo network

PAN Yuxuan1, JIN Rui1, LIU Yu1, ZHANG Lin1,2

  1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. Beijing Big Data Center, Beijing 100086, China
  • Received: 2025-04-29 Accepted: 2025-06-28 Online: 2026-02-28 Published: 2026-03-16
  • Contact: ZHANG Lin
  • Supported by:
    National Key Research and Development Program of China (2023YFB2704500); Beijing Natural Science Foundation (4222033)

Abstract:

Existing multi-view stereo schemes use depth-estimation algorithms to build a stereo representation by establishing a mapping between the physical and digital worlds. Supervised learning-based neural networks have achieved accurate, high-fidelity 3D reconstruction. However, in-the-wild visual reconstruction remains challenging because rendered depth priors are unavailable and the images have wide baselines. A novel system was proposed to obtain optimized depth for naturally collected multi-view images without prior information by combining an unsupervised learning network with semantically optimized Neural Radiance Field (NeRF) rendering. First, preliminary depth information for in-the-wild multi-view images was produced without ground truth by unsupervised deep learning. Subsequently, in a separate NeRF module, a diffusion model was used to construct a surface semantic rendering loss, enabling a fine-grained volumetric representation. Experimental results on the benchmark dataset showed that the proposed system improved the overall metrics by an average of 24.6% compared with other state-of-the-art schemes. A novel in-the-wild wide-baseline dataset was also used to verify generalization performance, on which the proposed system reduced the reconstruction error by up to 40.8% relative to the compared methods.
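
As a rough illustration of how a NeRF module can yield a depth value for refinement, the sketch below computes the expected depth along one ray using the standard NeRF volume-rendering quadrature. It is an assumption-laden sketch, not the authors' implementation: the function name `render_depth`, its inputs, and the handling of the last sample spacing are hypothetical, and the diffusion-based surface semantic rendering loss described in the abstract is not modeled here.

```python
import math


def render_depth(sigmas, t_vals):
    """Expected depth along a single ray (standard NeRF quadrature).

    sigmas: non-negative volume densities at each sample (hypothetical input).
    t_vals: increasing sample distances along the ray (hypothetical input).
    Returns (expected_depth, accumulated_opacity).
    """
    if len(sigmas) != len(t_vals):
        raise ValueError("sigmas and t_vals must have the same length")

    depth = 0.0
    opacity = 0.0
    transmittance = 1.0  # probability that the ray reaches the current sample
    for i, (sigma, t) in enumerate(zip(sigmas, t_vals)):
        # Spacing to the next sample; the last sample reuses the previous
        # spacing (an assumption made only for this sketch).
        if i + 1 < len(t_vals):
            delta = t_vals[i + 1] - t
        elif i > 0:
            delta = t - t_vals[i - 1]
        else:
            delta = 1.0
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution of this sample
        depth += weight * t
        opacity += weight
        transmittance *= 1.0 - alpha
    return depth, opacity


# A nearly opaque surface at distance 3.0 dominates the rendered depth.
surface_depth, surface_opacity = render_depth([0.0, 0.0, 1000.0, 0.0],
                                              [1.0, 2.0, 3.0, 4.0])
```

In a pipeline of the kind the abstract describes, a depth rendered this way could be compared against the preliminary depth from the unsupervised network; how the two are actually coupled in the proposed system is not specified in the abstract.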

Key words: unsupervised deep learning, multi-view stereo, 3D reconstruction, neural radiance field, depth optimization

CLC Number: