Journal of Graphics ›› 2024, Vol. 45 ›› Issue (1): 1-13. DOI: 10.11996/JG.j.2095-302X.2024010001
WANG Zhiru1, CHANG Yuan2, LU Peng3, PAN Chengwei1
Received: 2023-09-26
Accepted: 2023-12-11
Published: 2024-02-29
Online: 2024-02-29
First author: WANG Zhiru (2001-), male, master student. His main research interests cover computer graphics and deep learning. E-mail: 19241085@buaa.edu.cn
Corresponding author: PAN Chengwei (1989-), male, associate professor, PhD. His main research interests cover computer graphics and computer vision. E-mail: pancw@buaa.edu.cn
Supported by:
Abstract: In recent years, neural radiance fields (NeRF) have become an important research direction in computer graphics and computer vision. Owing to their highly realistic view synthesis, they have been widely applied to photorealistic rendering, virtual reality, human body modeling, urban mapping, and other fields. NeRF uses a neural network to learn an implicit representation of a 3D scene from a set of input images and synthesizes highly realistic images from novel viewpoints. However, both training and inference of the original NeRF model are slow, which makes it difficult to deploy in real-world applications. To accelerate NeRF, researchers have investigated scene modeling methods, ray sampling strategies, and other aspects. This line of work can be roughly divided into the following directions: baking models, combining NeRF with discrete representations, improving sampling efficiency, reducing MLP complexity with hash encoding, introducing scene generalization, introducing depth supervision, and decomposition methods. After reviewing the background of NeRF, this paper discusses and analyzes the advantages and characteristics of representative methods along each of these lines, and concludes with a summary of the progress in NeRF acceleration and an outlook on future work.
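As background for the acceleration directions listed above, the following is a minimal sketch of the discrete volume-rendering quadrature at the core of NeRF, i.e., C = Σᵢ Tᵢ(1 − exp(−σᵢδᵢ))cᵢ. This is PyTorch-style Python written purely for illustration; the function name and tensor shapes are assumptions, not code from any cited work.

```python
import torch

def composite(sigmas, rgbs, deltas):
    """Composite per-sample densities and colors along each ray into a pixel color.

    sigmas: (N_rays, N_samples)     volume density at each sample
    rgbs:   (N_rays, N_samples, 3)  radiance (color) at each sample
    deltas: (N_rays, N_samples)     distance between adjacent samples
    """
    # Opacity of each ray segment: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - torch.exp(-sigmas * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): exclusive cumulative product
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    # Per-sample contribution weights; most acceleration methods try to evaluate
    # fewer samples or cheaper networks while keeping these weights accurate.
    weights = trans * alphas
    return (weights[..., None] * rgbs).sum(dim=-2)  # (N_rays, 3)
```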
CLC number:
WANG Zhiru, CHANG Yuan, LU Peng, PAN Chengwei. A review on neural radiance fields acceleration[J]. Journal of Graphics, 2024, 45(1): 1-13.
Fig. 7 Different sampling approaches: (a) uniform sampling; (b) importance sampling; (c) sampling based on sparse voxels
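To make the contrast between panels (a) and (b) concrete, the sketch below draws depths along a ray either uniformly or by inverting the CDF of coarse-pass weights, so that fine samples concentrate where the scene actually contributes. It is a NumPy illustration with hypothetical helper names, not code from any of the surveyed systems.

```python
import numpy as np

def uniform_samples(near, far, n):
    """(a) Uniform sampling: evenly spaced depths in [near, far]."""
    return np.linspace(near, far, n)

def importance_samples(bin_edges, weights, n):
    """(b) Importance sampling: inverse-CDF draws from coarse-pass weights.

    bin_edges: (m+1,) depths bounding m coarse bins
    weights:   (m,)   per-bin contribution estimated by a coarse pass
    """
    pdf = weights / (weights.sum() + 1e-10)
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])
    u = np.random.rand(n)  # uniform draws in [0, 1)
    idx = np.clip(np.searchsorted(cdf, u, side="right") - 1, 0, len(weights) - 1)
    # Linear interpolation inside the selected bin
    t = (u - cdf[idx]) / np.maximum(cdf[idx + 1] - cdf[idx], 1e-10)
    return bin_edges[idx] + t * (bin_edges[idx + 1] - bin_edges[idx])
```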
| Method | PSNR↑ /dB | SSIM↑ | LPIPS↓ | Training iterations /k | Training time | Inference speed (relative) |
|---|---|---|---|---|---|---|
| Baseline NeRF[10] | 31.01 | 0.947 | 0.081 | 100~300 | >12 h | 1 |
| SNeRG[19] | 30.38 | 0.950 | 0.050 | 250 | >12 h | ~9 000 |
| PlenOctree[25] | 31.71 | 0.958 | 0.053 | 2 000 | >12 h | ~3 000 |
| NSVF[31] | 31.74 | 0.953 | 0.047 | 100~150 | - | ~10 |
| FastNeRF[21] | 29.97 | 0.941 | 0.053 | 300 | >12 h | ~4 000 |
| Plenoxels[23] | 31.71 | 0.958 | 0.049 | 128 | ~20 min | 45 |
| Instant-NGP[33] | 33.18 | - | - | 256 | ~5 min | - |
| MVSNeRF[34] | 27.07 | 0.931 | 0.163 | 10 | ~15 min | ~1 |
| DS-NeRF[42] | 24.90 | 0.72 | 0.34 | 150~200 | - | ~1 |
| TensoRF[44] | 33.14 | 0.963 | - | 30 | 17 min | ~100 |
| KiloNeRF[45] | 31.00 | 0.95 | 0.03 | 1 750 | >12 h | ~2 000 |
| 3D-Gaussian[29] | 33.32 | - | - | 30 | 1 h | ~550 |
Table 1 Comparison of some of the NeRF models mentioned in the paper on the NeRF synthetic dataset
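For reference on the quality columns, PSNR is the usual log-scale measure of mean squared error in decibels (higher is better), while SSIM and LPIPS measure structural and perceptual similarity. A minimal NumPy sketch of PSNR, assuming images normalized to [0, 1] (the helper name is illustrative):

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB, as reported in Table 1."""
    mse = np.mean((np.asarray(pred, np.float64) - np.asarray(gt, np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```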
[1] | SCHÖNBERGER J L, FRAHM J M. Structure-from-motion revisited[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4104-4113. |
[2] | SEITZ S M, CURLESS B, DIEBEL J, et al. A comparison and evaluation of multi-view stereo reconstruction algorithms[C]// 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2006: 519-528. |
[3] | LOMBARDI S, SIMON T, SARAGIH J, et al. Neural volumes: learning dynamic renderable volumes from images[EB/OL]. [2023-08-27]. http://arxiv.org/abs/1906.07751.pdf. |
[4] | NIEMEYER M, MESCHEDER L, OECHSLE M, et al. Differentiable volumetric rendering: learning implicit 3D representations without 3D supervision[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 3501-3512. |
[5] | GENOVA K, COLE F, SUD A, et al. Local deep implicit functions for 3D shape[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 4856-4865. |
[6] | PARK J J, FLORENCE P, STRAUB J, et al. DeepSDF: learning continuous signed distance functions for shape representation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 165-174. |
[7] | CHEN W Z, GAO J, LING H, et al. Learning to predict 3D objects with an interpolation-based differentiable renderer[EB/OL]. [2023-08-27]. http://arxiv.org/abs/1908.01210.pdf. |
[8] | CHEN W Z, GAO J, LING H, et al. Learning to predict 3D objects with an interpolation-based differentiable renderer[EB/OL]. [2023-08-27]. http://arxiv.org/abs/1908.01210.pdf. |
[9] | LOPER M M, BLACK M J. OpenDR: an approximate differentiable renderer[M]// FLEET D, PAJDLA T, SCHIELE B, et al., Eds. Computer Vision - ECCV 2014. Cham: Springer International Publishing, 2014: 154-169. |
[10] | MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[C]// European Conference on Computer Vision. Cham: Springer, 2020: 405-421. |
[11] | CORONA-FIGUEROA A, FRAWLEY J, TAYLOR S B, et al. MedNeRF: medical neural radiance fields for reconstructing 3D-aware CT-projections from a single X-ray[C]// 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society. New York: IEEE Press, 2022: 3843-3848. |
[12] | ZHAO F Q, YANG W, ZHANG J K, et al. HumanNeRF: efficiently generated human radiance field from sparse inputs[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 7733-7743. |
[13] | ZHANG J K, LIU X H, YE X Y, et al. Editable free-viewpoint video using a layered neural representation[J]. ACM Transactions on Graphics, 2021, 40(4): 149:1-149:18. |
[14] | ZHU Z H, PENG S Y, LARSSON V, et al. NICE-SLAM: neural implicit scalable encoding for SLAM[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12776-12786. |
[15] | LI Z P, LI L, ZHU J K. READ: large-scale neural scene rendering for autonomous driving[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(2): 1522-1529. |
[16] | TANCIK M, CASSER V, YAN X C, et al. Block-NeRF: scalable large scene neural view synthesis[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 8238-8248. |
[17] | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010. |
[18] | ZHANG K, RIEGLER G, SNAVELY N, et al. NeRF++: analyzing and improving neural radiance fields[EB/OL]. [2023-08-27]. http://arxiv.org/abs/2010.07492.pdf. |
[19] | HEDMAN P, SRINIVASAN P P, MILDENHALL B, et al. Baking neural radiance fields for real-time view synthesis[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 5855-5864. |
[20] | REISER C, SZELISKI R, VERBIN D, et al. MERF: memory-efficient radiance fields for real-time view synthesis in unbounded scenes[J]. ACM Transactions on Graphics, 2023, 42(4): 89:1-89:12. |
[21] | GARBIN S J, KOWALSKI M, JOHNSON M, et al. FastNeRF: high-fidelity neural rendering at 200FPS[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 14326-14335. |
[22] | WADHWANI K, KOJIMA T. SqueezeNeRF: further factorized FastNeRF for memory-efficient inference[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2022: 2716-2724. |
[23] | FRIDOVICH-KEIL S, YU A, TANCIK M, et al. Plenoxels: radiance fields without neural networks[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5491-5500. |
[24] | XU Q G, XU Z X, PHILIP J, et al. Point-NeRF: point-based neural radiance fields[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5428-5438. |
[25] | YU A, LI R L, TANCIK M, et al. PlenOctrees for real-time rendering of neural radiance fields[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 5732-5741. |
[26] | CHEN Z Q, FUNKHOUSER T, HEDMAN P, et al. MobileNeRF: exploiting the polygon rasterization pipeline for efficient neural field rendering on mobile architectures[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 16569-16578. |
[27] | WIZADWONGSA S, PHONGTHAWEE P, YENPHRAPHAI J, et al. NeX: real-time view synthesis with neural basis expansion[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 8530-8539. |
[28] | TUCKER R, SNAVELY N. Single-view view synthesis with multiplane images[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 548-557. |
[29] | KERBL B, KOPANAS G, LEIMKUEHLER T, et al. 3D Gaussian splatting for real-time radiance field rendering[J]. ACM Transactions on Graphics, 2023, 42(4): 139:1-139:14. |
[30] | KNAPITSCH A, PARK J, ZHOU Q Y, et al. Tanks and temples: benchmarking large-scale scene reconstruction[J]. ACM Transactions on Graphics, 2017, 36(4): 78:1-78:13. |
[31] | LIU L J, GU J T, LIN K Z, et al. Neural sparse voxel fields[EB/OL]. [2023-08-27]. https://arxiv.org/abs/2007.11571. |
[32] | HU T, LIU S, CHEN Y L, et al. EfficientNeRF - efficient neural radiance fields[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12892-12901. |
[33] | MÜLLER T, EVANS A, SCHIED C, et al. Instant neural graphics primitives with a multiresolution hash encoding[J]. ACM Transactions on Graphics, 2022, 41(4): 1-15. |
[34] | CHEN A P, XU Z X, ZHAO F Q, et al. MVSNeRF: fast generalizable radiance field reconstruction from multi-view stereo[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 14104-14113. |
[35] | YAO Y, LUO Z X, LI S W, et al. MVSNet: depth inference for unstructured multi-view stereo[C]// European Conference on Computer Vision. Cham: Springer, 2018: 785-801. |
[36] | JENSEN R, DAHL A, VOGIATZIS G, et al. Large scale multi-view stereopsis evaluation[C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2014: 406-413. |
[37] | ZHANG X S, BI S, SUNKAVALLI K, et al. NeRFusion: fusing radiance fields for large-scale scene reconstruction[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5439-5448. |
[38] | DAI A, CHANG A X, SAVVA M, et al. ScanNet: richly-annotated 3D reconstructions of indoor scenes[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2432-2443. |
[39] | WANG Q Q, WANG Z C, GENOVA K, et al. IBRNet: learning multi-view image-based rendering[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 4688-4697. |
[40] | LIN H T, PENG S D, XU Z, et al. Efficient neural radiance fields for interactive free-viewpoint video[C]// SA '22: SIGGRAPH Asia 2022 Conference Papers. New York: ACM, 2022: 1-9. |
[41] | ZHU H Y. X-NeRF: explicit neural radiance field for multi-scene 360° insufficient RGB-D views[C]// 2023 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2023: 5755-5764. |
[42] | DENG K L, LIU A, ZHU J Y, et al. Depth-supervised NeRF: fewer views and faster training for free[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12872-12881. |
[43] | WEI Y, LIU S H, RAO Y M, et al. NerfingMVS: guided optimization of neural radiance fields for indoor multi-view stereo[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 5590-5599. |
[44] | CHEN A P, XU Z X, GEIGER A, et al. TensoRF: tensorial radiance fields[C]// European Conference on Computer Vision. Cham: Springer, 2022: 333-350. |
[45] | REISER C, PENG S Y, LIAO Y Y, et al. KiloNeRF: speeding up neural radiance fields with thousands of tiny MLPs[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 14315-14325. |
[46] | CAO A, JOHNSON J. HexPlane: a fast representation for dynamic scenes[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 130-141. |
[47] | PUMAROLA A, CORONA E, PONS-MOLL G, et al. D-NeRF: neural radiance fields for dynamic scenes[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 10313-10322. |
[48] | FRIDOVICH-KEIL S, MEANTI G, WARBURG F R, et al. K-planes: explicit radiance fields in space, time, and appearance[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 12479-12488. |
[49] | JANG H, KIM D. D-TensoRF: tensorial radiance fields for dynamic scenes[EB/OL]. [2023-08-27]. http://arxiv.org/abs/2212.02375.pdf. |
[50] | SHAO R Z, ZHENG Z R, TU H Z, et al. Tensor4D: efficient neural 4D decomposition for high-fidelity dynamic reconstruction and rendering[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 16632-16642. |
[51] | PENG S D, YAN Y Z, SHUAI Q, et al. Representing volumetric videos as dynamic MLP maps[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 4252-4262. |
[52] | SHADE J, GORTLER S, HE L W, et al. Layered depth images[C]// The 25th Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 1998: 231-242. |