
图学学报 ›› 2026, Vol. 47 ›› Issue (1): 111-119. DOI: 10.11996/JG.j.2095-302X.2026010111

• 图像处理与计算机视觉 •

基于特征点引导干扰物识别的神经辐射场重建

任皓, 李少波, 弓茂, 王博

  1. 内蒙古科技大学自动化与电气学院,内蒙古 包头 014010
  • 收稿日期:2025-05-30 接受日期:2025-09-08 出版日期:2026-02-28 发布日期:2026-03-16
  • 通讯作者:李少波,E-mail:12965874@qq.com
  • 基金资助:
    内蒙古自然科学基金(2022LHMS06002)

Neural radiance field reconstruction based on feature point-guided distractor identification

REN Hao, LI Shaobo, GONG Mao, WANG Bo

  1. School of Automation and Electrical Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, Inner Mongolia, China
  • Received:2025-05-30 Accepted:2025-09-08 Published:2026-02-28 Online:2026-03-16
  • Supported by:
    Inner Mongolia Natural Science Foundation(2022LHMS06002)

摘要:

针对神经辐射场(NeRF)在干扰物体影响下难以实现高质量三维重建的问题,提出一种基于运动恢复结构(SfM)与分割一切模型(SAM)协同优化的方法。以SfM重建过程中的SIFT算法为基础,利用动态场景中的几何不一致性进行特征点的识别与匹配,将未匹配的特征点视为动态干扰物,并以其作为点提示引导支持点引导分割的SAM模型完成动态遮挡物分割,生成静态场景掩膜。基于分割结果,使用掩膜感知体积渲染技术预测颜色,并建立由重建损失、结构一致性损失、对抗损失和自监督修补损失组成的四重损失函数。通过联合优化目标的方式约束被修补区域的颜色输出,经多次迭代训练后,实现多视角下被遮挡区域几何结构与外观的一致性修复,在保证辐射场完整性的同时实现遮挡物的消除。在公开动态场景数据集上的验证表明,采用掩膜体积渲染与联合优化后的重建效果相较于基线模型和主流遮挡物消除方法,峰值信噪比(PSNR)平均提升5.24 dB,学习感知图像块相似度(LPIPS)降低35%,为复杂动态环境下的三维重建提供了新范式。
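The feature-point-guided distractor step described above can be sketched as follows. This is a hypothetical, simplified illustration (not the paper's code): descriptors are matched across two views with a nearest-neighbour search and Lowe's ratio test, and keypoints that find no reliable match are treated as dynamic distractors, whose coordinates would then serve as point prompts for segmentation. The toy 4-D descriptors and all function names are illustrative assumptions.

```python
# Hypothetical sketch: unmatched SIFT-style keypoints -> distractor prompts.
# Real SIFT descriptors are 128-D; tiny 4-D vectors are used here for clarity.

def match_keypoints(desc_a, desc_b, ratio=0.75):
    """Return indices in desc_a that have a reliable match in desc_b
    according to Lowe's ratio test (best distance << second-best)."""
    matched = set()
    for i, d in enumerate(desc_a):
        # squared Euclidean distances to every descriptor in the other view
        dists = sorted(sum((x - y) ** 2 for x, y in zip(d, e)) for e in desc_b)
        if len(dists) >= 2 and dists[0] < ratio ** 2 * dists[1]:
            matched.add(i)  # passes the ratio test -> static scene point
    return matched

def distractor_prompts(keypoints, desc_a, desc_b):
    """Keypoints left unmatched are candidate dynamic-distractor prompts."""
    matched = match_keypoints(desc_a, desc_b)
    return [kp for i, kp in enumerate(keypoints) if i not in matched]

# Toy data: the first two descriptors have clear counterparts in view B,
# the third does not (a point on a moving object).
desc_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]]
desc_b = [[1.0, 0.1, 0.0, 0.0], [0.0, 1.0, 0.1, 0.0],
          [9.0, 0.0, 0.0, 9.0], [0.0, 9.0, 9.0, 0.0]]
keypoints = [(10, 20), (30, 40), (50, 60)]

prompts = distractor_prompts(keypoints, desc_a, desc_b)  # -> [(50, 60)]
```

In a full pipeline these prompt coordinates would be passed to a point-promptable segmenter such as SAM to grow the sparse points into dense occluder masks.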

关键词: 神经辐射场, 三维重建, 动态场景, 遮挡物消除, 计算机视觉

Abstract:

To address the challenge of achieving high-quality 3D reconstruction with Neural Radiance Fields (NeRF) under the influence of occluding objects, a method based on the collaborative optimization of Structure-from-Motion (SfM) and the Segment Anything Model (SAM) was proposed. Building upon the Scale-Invariant Feature Transform (SIFT) algorithm within the SfM reconstruction process, geometric inconsistencies in dynamic scenes were leveraged for feature point identification and matching. Unmatched feature points were treated as dynamic occluders and served as point prompts guiding the SAM model, which supports point-guided segmentation, to segment dynamic occluders and generate a static scene mask. Based on the segmentation results, mask-aware volumetric rendering was used to predict colors, and a quadruple loss function was established, comprising reconstruction loss, structural consistency loss, adversarial loss, and self-supervised inpainting loss. These objectives were jointly optimized to constrain the color output in inpainted regions. After iterative training, consistent restoration of the geometric structure and appearance of occluded areas across multiple viewpoints was achieved, removing occluders while preserving the integrity of the radiance field. Validation on public dynamic scene datasets demonstrated that mask-based volumetric rendering combined with joint optimization produced an average Peak Signal-to-Noise Ratio (PSNR) improvement of 5.24 dB over baseline models and mainstream occlusion removal methods, alongside a 35% reduction in Learned Perceptual Image Patch Similarity (LPIPS). This approach provides a new paradigm for 3D reconstruction in complex dynamic environments.
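The mask-aware rendering and joint objective described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: one ray is alpha-composited in the standard NeRF manner, the reconstruction loss is computed only over pixels inside the static-scene mask, and the four losses are combined with placeholder weights `lam` (the paper's actual weighting scheme is not given in the abstract).

```python
import math

def render_ray(sigmas, colors, deltas):
    """NeRF-style compositing along one ray:
    C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i."""
    transmittance, out = 1.0, 0.0
    for sigma, c, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this sample
        out += transmittance * alpha * c
        transmittance *= 1.0 - alpha            # light surviving past it
    return out

def masked_reconstruction_loss(pred, target, static_mask):
    """MSE over static pixels only; masked-out (occluded) pixels are ignored,
    so dynamic distractors do not corrupt the radiance field."""
    terms = [(p - t) ** 2 for p, t, m in zip(pred, target, static_mask) if m]
    return sum(terms) / max(len(terms), 1)

def total_loss(l_rec, l_struct, l_adv, l_inpaint, lam=(1.0, 0.1, 0.01, 0.1)):
    """Joint objective over the four losses named in the abstract;
    the weights lam are illustrative assumptions."""
    return (lam[0] * l_rec + lam[1] * l_struct
            + lam[2] * l_adv + lam[3] * l_inpaint)

# One ray with two samples, then a masked loss over three pixels
# (the third pixel is masked out as a dynamic distractor).
c = render_ray(sigmas=[0.5, 2.0], colors=[0.2, 0.9], deltas=[1.0, 1.0])
l = masked_reconstruction_loss([0.2, 0.4, 0.9], [0.2, 0.5, 0.1], [1, 1, 0])
```

In training, `l` would play the role of the reconstruction term fed into `total_loss` together with the structural-consistency, adversarial, and inpainting terms.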

Key words: neural radiance field, 3D reconstruction, dynamic scene, occlusion removal, computer vision
