Journal of Graphics ›› 2023, Vol. 44 ›› Issue (6): 1140-1148. DOI: 10.11996/JG.j.2095-302X.2023061140
FAN Teng, YANG Hao, YIN Wen, ZHOU Dong-ming
Received:
2023-06-27
Accepted:
2023-09-12
Online:
2023-12-31
Published:
2023-12-17
Contact:
ZHOU Dong-ming (1963-), professor, Ph.D. His main research interests cover image processing based on deep learning, biological information processing based on machine learning, and computer vision.
About author:
FAN Teng (1995-), master student. His main research interests cover computer graphics and image processing based on deep learning. E-mail: fanteng@mail.ynu.edu.cn
Abstract:
To address the blurring and aliasing that the neural radiance field (NeRF) produces in multi-scale view synthesis, this paper proposes a multi-scale neural radiance field (MS-NeRF) that fuses view features and viewpoint features at different scales as priors to improve the quality of synthesized target views. First, for target views at different scales, a multi-level wavelet convolutional neural network extracts target-view features, which serve as priors to supervise the network's synthesis of views of the target scene. Second, the sampling area of the rays cast from the viewpoint camera onto each pixel of the target view is enlarged, avoiding the blur and aliasing caused by sampling only a single ray per pixel. Finally, view features and viewpoint features at different scales are added during training to improve the network's generalization across scales, and a deep neural network with a progressive structure fits the mapping from view and viewpoint features to the target view. Experimental results show that, compared with related methods, MS-NeRF reduces training cost and improves the visual quality of synthesized target views.
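The abstract's second step, enlarging the per-pixel sampling area instead of casting a single center ray, can be illustrated with a short sketch. This is not the paper's released code; it is a minimal area-sampling illustration in NumPy, and the function and parameter names (`pixel_rays`, `render_pixel`, `n_samples`) are assumptions for exposition.

```python
import numpy as np

def pixel_rays(cam_origin, pixel_center, pixel_size, n_samples=4, rng=None):
    """Cast several jittered rays across one pixel's footprint.

    Instead of a single ray through the pixel center, ray targets are
    spread uniformly over the pixel's square footprint; averaging the
    colors rendered along these rays approximates area sampling and
    suppresses aliasing. Illustrative sketch only.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Random (x, y) offsets uniformly covering the pixel footprint.
    offsets = (rng.random((n_samples, 2)) - 0.5) * pixel_size
    targets = pixel_center + np.concatenate(
        [offsets, np.zeros((n_samples, 1))], axis=1)
    dirs = targets - cam_origin
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit directions
    return dirs

def render_pixel(ray_colors):
    """Average the per-ray colors into the final pixel value."""
    return np.mean(ray_colors, axis=0)
```

Averaging many jittered rays trades extra render cost per pixel for smoother results at distant scales, which is the trade-off the multi-scale setting targets.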
FAN Teng, YANG Hao, YIN Wen, ZHOU Dong-ming. Multi-scale view synthesis based on neural radiance field[J]. Journal of Graphics, 2023, 44(6): 1140-1148.
Fig. 4 Image quality comparisons between MipNeRF[13], BungeeNeRF[14] and MS-NeRF ((a)~(c) close-up view; (d)~(e) remote view)
| Method (Transamerica) | Stage Ⅰ PSNR↑ | Stage Ⅱ PSNR↑ | Stage Ⅲ PSNR↑ | Stage Ⅳ PSNR↑ | Avg PSNR↑ | Avg SSIM↑ | Avg LPIPS↓ |
|---|---|---|---|---|---|---|---|
| NeRF[5] | 22.71 | 22.81 | 22.97 | 21.58 | 22.64 | 0.69 | 0.59 |
| MipNeRF[13] | 23.25 | 23.37 | 22.70 | 21.56 | 20.22 | 0.46 | 0.60 |
| BungeeNeRF[14] | 23.36 | 23.37 | 23.11 | 23.57 | 22.61 | 0.67 | 0.46 |
| Ours | 24.19 | 23.99 | 24.18 | 24.75 | 23.53 | 0.74 | 0.39 |
Table 1 Evaluation metrics comparisons between NeRF[5], MipNeRF[13], BungeeNeRF[14] and MS-NeRF
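Table 1 reports PSNR, SSIM, and LPIPS. PSNR is a simple closed-form metric and can be computed directly from the mean squared error, as sketched below; SSIM and LPIPS require structural and learned-feature comparisons and are typically taken from libraries rather than written by hand. The helper name `psnr` here is illustrative, not from the paper's code.

```python
import numpy as np

def psnr(reference, rendered, max_val=255.0):
    """Peak signal-to-noise ratio in dB (higher is better).

    PSNR = 10 * log10(MAX^2 / MSE); identical images give +inf.
    """
    mse = np.mean((reference.astype(np.float64)
                   - rendered.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For example, a uniform one-level error on 8-bit images yields MSE = 1 and hence PSNR = 20·log10(255) ≈ 48.13 dB, which puts the ~22-25 dB scores in Table 1 in perspective.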
| Network | Avg PSNR↑ | Avg SSIM↑ | Avg LPIPS↓ |
|---|---|---|---|
| NeRF[5] | 28.99 | 0.86 | 0.18 |
| MipNeRF[13] | 28.26 | 0.80 | 0.20 |
| BungeeNeRF[14] | 28.78 | 0.84 | 0.18 |
| MS-NeRF | 29.05 | 0.84 | 0.18 |
Table 2 Evaluation metrics on the Blender Synthetic Ship dataset
| Network | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| NeRF[5] | 11.79 | 0.58 | 0.39 |
| NeRF (view features/residual blocks) | 19.84 | 0.81 | 0.21 |
| BungeeNeRF[14] | 22.61 | 0.66 | 0.45 |
| BungeeNeRF (view features) | 23.54 | 0.74 | 0.39 |
Table 3 Evaluation metrics with view features and residual blocks added
| Number of residual blocks | Stage Ⅰ PSNR↑ | Stage Ⅱ PSNR↑ | Stage Ⅲ PSNR↑ | Stage Ⅳ PSNR↑ |
|---|---|---|---|---|
| 2 | 22.63 | 23.49 | 23.75 | 24.12 |
| 3 | 23.09 | 24.00 | 24.03 | 24.18 |
| 4 | 23.54 | 24.19 | 24.19 | 24.75 |
Table 4 Evaluation metrics for different numbers of residual blocks (Transamerica)
[1] | SHUM H Y, HE L W. Rendering with concentric mosaics[C]// The 26th Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 1999: 299-306. |
[2] | DEBEVEC P, DOWNING G, BOLAS M, et al. Spherical light field environment capture for virtual reality using a motorized pan/tilt head and offset camera[EB/OL]. (2021-01-20) [2023-01-08]. http://dx.doi.org/10.1145/2787626.2787648. |
[3] | SZELISKI R, SHUM H Y. Creating full view panoramic image mosaics and environment maps[C]// The 24th Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 1997: 251-258. |
[4] | CHANG Y, GAI M. A review on neural radiance fields based view synthesis[J]. Journal of Graphics, 2021, 42(3): 376-384 (in Chinese). |
[5] | MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[C]// European Conference on Computer Vision. Cham: Springer, 2020: 405-421. |
[6] | MÜLLER T, EVANS A, SCHIED C, et al. Instant neural graphics primitives with a multiresolution hash encoding[J]. ACM Transactions on Graphics, 2022, 41(4): 1-15. |
[7] | REISER C, PENG S Y, LIAO Y Y, et al. KiloNeRF: speeding up neural radiance fields with thousands of tiny MLPs[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 14315-14325. |
[8] | TOLSTIKHIN I, HOULSBY N, KOLESNIKOV A, et al. MLP-mixer: an all-MLP architecture for vision[EB/OL]. [2023-01-08]. https://arxiv.org/abs/2105.01601.pdf. |
[9] | GARBIN S J, KOWALSKI M, JOHNSON M, et al. FastNeRF: high-fidelity neural rendering at 200FPS[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 14326-14335. |
[10] | LIU L J, GU J T, LIN K Z, et al. Neural sparse voxel fields[EB/OL]. [2023-01-08]. https://arxiv.org/abs/2007.11571. |
[11] | YU A, LI R L, TANCIK M, et al. PlenOctrees for real-time rendering of neural radiance fields[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 5732-5741. |
[12] | FRIDOVICH-KEIL S, YU A, TANCIK M, et al. Plenoxels: radiance fields without neural networks[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5491-5500. |
[13] | BARRON J T, MILDENHALL B, TANCIK M, et al. Mip-NeRF: a multiscale representation for anti-aliasing neural radiance fields[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 5835-5844. |
[14] | XIANGLI Y B, XU L N, PAN X G, et al. BungeeNeRF: progressive neural radiance field for extreme multi-scale scene rendering[C]// European Conference on Computer Vision. Cham: Springer, 2022: 106-122. |
[15] | YU A, YE V, TANCIK M, et al. pixelNeRF: neural radiance fields from one or few images[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 4576-4585. |
[16] | GUARNERA D, GUARNERA G C, GHOSH A, et al. BRDF representation and acquisition[J]. Computer Graphics Forum, 2016, 35(2): 625-650. |
[17] | ASMAIL C. Bidirectional scattering distribution function (BSDF): a systematized bibliography[J]. Journal of Research of the National Institute of Standards and Technology, 1991, 96(2): 215-223. |
[18] | RIBARDIÈRE M, BRINGIER B, SIMONOT L, et al. Microfacet BSDFs generated from NDFs and explicit microgeometry[J]. ACM Transactions on Graphics, 2019, 38(5): 143:1-143:15. |
[19] | WANG Q Q, WANG Z C, GENOVA K, et al. IBRNet: learning multi-view image-based rendering[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 4688-4697. |
[20] | CHEN A P, XU Z X, ZHAO F Q, et al. MVSNeRF: fast generalizable radiance field reconstruction from multi-view stereo[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 14104-14113. |
[21] | YAO Y, LUO Z X, LI S W, et al. MVSNet: depth inference for unstructured multi-view stereo[C]// European Conference on Computer Vision. Cham: Springer, 2018: 767-783. |
[22] | XU D J, JIANG Y F, WANG P H, et al. SinNeRF: training neural radiance fields on complex scenes from a single image[EB/OL]. [2023-01-13]. https://arxiv.org/abs/2204.00928.pdf. |
[23] | HUANG B C, YI H W, HUANG C, et al. M3VSNET: unsupervised multi-metric multi-view stereo network[C]// 2021 IEEE International Conference on Image Processing. New York: IEEE Press, 2021: 3163-3167. |
[24] | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. (2020-10-22) [2023-01-08]. https://arxiv.org/abs/2010.11929.pdf. |
[25] | XU L N, XIANGLI Y B, PENG S D, et al. Grid-guided neural radiance fields for large urban scenes[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 8296-8306. |
[26] | BARRON J T, MILDENHALL B, VERBIN D, et al. Zip-NeRF: anti-aliased grid-based neural radiance fields[EB/OL]. (2023-04-13) [2023-05-08]. https://arxiv.org/abs/2304.06706.pdf. |
[27] | LIU P J, ZHANG H Z, LIAN W, et al. Multi-level wavelet convolutional neural networks[J]. IEEE Access, 2019, 7: 74973-74985. |
[28] | SRIVASTAVA N, HINTON G E, KRIZHEVSKY A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15: 1929-1958. |
[29] | KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL]. [2023-01-13]. https://arxiv.org/pdf/1412.6980.pdf. |
[30] | HORÉ A, ZIOU D. Image quality metrics:PSNR vs. SSIM[C]// 2010 20th International Conference on Pattern Recognition. New York: IEEE Press, 2010: 2366-2369. |
[31] | WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612. |
[32] | ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 586-595. |