
Journal of Graphics ›› 2025, Vol. 46 ›› Issue (3): 502-509. DOI: 10.11996/JG.j.2095-302X.2025030502

• Image Processing and Computer Vision •


Large scene reconstruction method based on voxel grid feature of NeRF

WANG Daolei, DING Zijian, YANG Jun, ZHENG Shaokai, ZHU Rui, ZHAO Wenbin

  1. College of Energy and Mechanical Engineering, Shanghai University of Electric Power, Shanghai 201306, China
  • Received: 2024-09-17 Accepted: 2024-12-16 Published: 2025-06-30 Online: 2025-06-13
  • Contact: ZHAO Wenbin (1978-), associate professor, Ph.D. His main research interests cover deep learning, image processing, and CAD/CAM. E-mail: zhaowenbin@shiep.edu.cn
  • First author: WANG Daolei (1981-), professor, Ph.D. His main research interests cover computer vision, image processing, and CAD/CAM. E-mail: alfredwdl@shiep.edu.cn


Abstract:

To address the problems of blurred rendering and missing details in neural radiance fields (NeRF) for large scenes, a rendering method suitable for large scenes was proposed, guided by voxel grid features that drive ray sampling. The method can effectively enhance the accuracy of 3D models, which is particularly important for large-scale scene reconstruction, and is applicable to scenarios such as architectural design and urban planning. Firstly, the scene to be reconstructed was partitioned into a grid: scene boundaries were allocated according to scene size, and the voxel units were refined. Secondly, tensor decomposition was applied to the information contained in the voxels, and features of the gridded scene were extracted; NeRF then concentrated its sampling according to the extracted features. Finally, the sampling results were fed into a neural network, where a multilayer perceptron (MLP) renderer converted the features into color and density information and synthesized rendered views from novel viewpoints. The method was validated on multiple datasets. The experimental results demonstrated that, compared with other methods, the proposed approach improved PSNR and SSIM by averages of approximately 11% and 12%, respectively, reduced LPIPS by approximately 15% on average, and produced clearly better visual quality.
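The pipeline the abstract outlines (per-voxel features, feature-guided concentration of ray samples, and an MLP-style renderer composited by standard NeRF volume rendering) can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: the grid size, nearest-voxel lookup, sampling heuristic, and the toy `toy_mlp` renderer are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

GRID = 16                       # voxels per axis; scene assumed inside [0, 1]^3
FEAT = 8                        # feature channels stored per voxel
voxel_feat = rng.normal(size=(GRID, GRID, GRID, FEAT)).astype(np.float32)

def lookup(points):
    """Nearest-voxel feature lookup for (N, 3) points in [0, 1]^3."""
    idx = np.clip((points * GRID).astype(int), 0, GRID - 1)
    return voxel_feat[idx[:, 0], idx[:, 1], idx[:, 2]]

def guided_samples(origin, direction, n_coarse=32, n_fine=64):
    """Score coarse samples by voxel-feature magnitude, then place
    extra fine samples where the score (a stand-in for 'the scene is
    interesting here') is high."""
    t = np.linspace(0.0, 1.0, n_coarse)
    pts = np.clip(origin + t[:, None] * direction, 0.0, 1.0)
    score = np.linalg.norm(lookup(pts), axis=1) + 1e-8
    cdf = np.cumsum(score / score.sum())
    u = rng.uniform(size=n_fine)
    t_fine = t[np.minimum(np.searchsorted(cdf, u), n_coarse - 1)]
    return np.sort(np.concatenate([t, t_fine]))

def toy_mlp(feats):
    """Toy stand-in for the MLP renderer: features -> (density, color)."""
    sigma = np.log1p(np.exp(feats[:, 0]))        # softplus -> nonnegative density
    rgb = 1.0 / (1.0 + np.exp(-feats[:, 1:4]))   # sigmoid -> colors in (0, 1)
    return sigma, rgb

def render_ray(origin, direction, mlp):
    """Standard NeRF compositing: alpha_i = 1 - exp(-sigma_i * delta_i),
    weighted by accumulated transmittance along the ray."""
    t = guided_samples(origin, direction)
    pts = np.clip(origin + t[:, None] * direction, 0.0, 1.0)
    sigma, rgb = mlp(lookup(pts))
    delta = np.diff(t, append=t[-1] + 1e-2)
    alpha = 1.0 - np.exp(-sigma * delta)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return ((trans * alpha)[:, None] * rgb).sum(axis=0)   # composited RGB
```

Rendering one ray, e.g. `render_ray(np.array([0.0, 0.5, 0.5]), np.array([1.0, 0.0, 0.0]), toy_mlp)`, returns a length-3 RGB color in [0, 1]; a full image repeats this per pixel, and in the real method the grid features additionally come from a tensor decomposition rather than a dense random array.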

Key words: neural radiance fields, large scene, 3D reconstruction, deep learning, image rendering
