The next best view navigation technology based on RGB features

doi:10.11996/JG.j.2095-302X.2025030551

Journal of Graphics, 2025, Vol. 46, Issue (3): 551-557. DOI: 10.11996/JG.j.2095-302X.2025030551

• Image Processing and Computer Vision •

The next best view navigation technology based on RGB features

ZHOU Zheng, DAI Yaqiao, YI Renjiao, LAN Long, ZHU Chenyang

School of Computer Science, National University of Defense Technology, Changsha, Hunan 410000, China

Received: 2024-08-23 Accepted: 2025-03-03 Published: 2025-06-30 Online: 2025-06-13
Contact: ZHU Chenyang (1990-), associate professor, Ph.D. His main research interests include computer graphics and computer vision. E-mail: zhuchenyang07@nudt.edu.cn
First author: ZHOU Zheng (1997-), master's student. His main research interest is computer graphics. E-mail: zhouzheng@nudt.edu.cn
Supported by:
National Natural Science Foundation of China (62325221); National Natural Science Foundation of China (62132021); National Natural Science Foundation of China (62372457); Young Elite Scientists Sponsorship Program by CAST (2023QNRC001); Natural Science Foundation of Hunan Province of China (2021RC3071); Natural Science Foundation of Hunan Province of China (2022RC1104); NUDT Research Grants (ZK22-52); State Key Laboratory of High Performance Computing Foundation (2023KJWHPCL02)

Abstract:

Neural radiance fields (NeRF) perform very well at reconstructing 3D scenes from 2D images. Trained on 2D images alone, a NeRF can recover the 3D structure of a scene and render high-quality novel views. Although NeRF is highly effective for 3D scene reconstruction, it trains slowly and has long inference times, and its reconstruction quality is closely tied to the quality of the training samples. To achieve high-quality 3D reconstruction with NeRF when sample quality is low, two sets of NeRFs with different hash encodings are used to learn the same scene, and the gap between the information gain of candidate views is evaluated to guide view sampling. A new next best view (NBV) navigation framework based on RGB features is proposed. The framework is highly robust to sparse training data, captures the next best view with high information gain through RGB feature evaluation, and optimizes NeRF training, so that novel view synthesis quality improves with a minimal number of additional views. With the optimized NeRF training process, network convergence is approximately 10 times faster and GPU memory usage is reduced by 39.8%. Extensive experiments verify the effectiveness and robustness of the proposed model.

Keywords: neural radiance field, hash encoding, sparse reconstruction, information gain, active learning
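
The abstract describes the view-selection principle only at a high level. As a rough illustration, and not the authors' implementation, the sketch below scores each candidate view by how much two independently trained models disagree on its rendered RGB image, then picks the view with the largest disagreement. The function names, the mean-squared-difference score, and the toy renderers standing in for the two hash-encoded NeRFs are all assumptions made for this example.

```python
import numpy as np

def rgb_disagreement(img_a, img_b):
    """Mean squared RGB difference between two renders of the same view.

    Assumption: disagreement between two models is used here as a simple
    stand-in for the information gain of a candidate view.
    """
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("renders must have the same shape")
    return float(np.mean((a - b) ** 2))

def select_next_best_view(candidate_poses, render_a, render_b):
    """Return (index, scores) of the candidate where the two models disagree most.

    render_a and render_b are hypothetical callables mapping a pose to an
    H x W x 3 RGB array, e.g. two NeRFs trained with different hash encodings.
    """
    scores = [rgb_disagreement(render_a(p), render_b(p)) for p in candidate_poses]
    return int(np.argmax(scores)), scores

def make_toy_renderer(gain):
    """Toy renderer: a constant 4 x 4 RGB image whose brightness is pose * gain."""
    def render(pose):
        return np.full((4, 4, 3), pose * gain)
    return render

# Toy usage: the two "models" agree at pose 0.0 and diverge as the pose grows,
# so the last candidate has the largest disagreement.
poses = [0.0, 0.5, 1.0]
best, scores = select_next_best_view(
    poses, make_toy_renderer(1.0), make_toy_renderer(0.8)
)
print("next best view index:", best)
```

In a real pipeline the selected view would be captured, added to the training set, and both models retrained before the next selection round; that loop is omitted here.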

CLC number: TP391

ZHOU Zheng, DAI Yaqiao, YI Renjiao, LAN Long, ZHU Chenyang. The next best view navigation technology based on RGB features[J]. Journal of Graphics, 2025, 46(3): 551-557.

References

[1]	MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2021, 65(1): 99-106.
[2]	SCHÖNBERGER J L, FRAHM J M. Structure-from-motion revisited[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4104-4113.
[3]	GOESELE M, CURLESS B, SEITZ S M. Multi-view stereo revisited[C]// 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2006: 2402-2409.
[4]	CHANG Y, GAI M. A review on neural radiance fields based view synthesis[J]. Journal of Graphics, 2021, 42(3): 376-384 (in Chinese).
[5]	DONG X T, MA X, PAN C W, et al. A review of neural radiance fields for outdoor large scenes[J]. Journal of Graphics, 2024, 45(4): 631-649 (in Chinese).
[6]	MÜLLER T, EVANS A, SCHIED C, et al. Instant neural graphics primitives with a multiresolution hash encoding[J]. ACM Transactions on Graphics, 2022, 41(4): 1-15.
[7]	BAJCSY R, ALOIMONOS Y, TSOTSOS J K. Revisiting active perception[J]. Autonomous Robots, 2018, 42(2): 177-196.
[8]	LIU M, SHI Y F, ZHENG L T, et al. Recurrent 3D attentional networks for end-to-end active object recognition[J]. Computational Visual Media, 2019, 5(1): 91-104.
[9]	ISLER S, SABZEVARI R, DELMERICO J, et al. An information gain formulation for active volumetric 3D reconstruction[C]// 2016 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2016: 3477-3484.
[10]	BIRCHER A, KAMEL M, ALEXIS K, et al. Receding horizon “next-best-view” planner for 3D exploration[C]// 2016 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2016: 1462-1468.
[11]	ZAENKER T, SMITT C, MCCOOL C, et al. Viewpoint planning for fruit size and position estimation[C]// 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE Press, 2021: 3271-3277.
[12]	ZENG R, ZHAO W, LIU Y J. PC-NBV: a point cloud based deep network for efficient next best view planning[C]// 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE Press, 2020: 7050-7057.
[13]	SONG S, JO S. Surface-based exploration for autonomous 3D modeling[C]// 2018 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2018: 4319-4326.
[14]	WU Q Y, MANOCHA D, WANG J, et al. NeoNav: improving the generalization of visual navigation via generating next expected observations[C]// The Thirty-Fourth AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 10001-10008.
[15]	PAN X R, LAI Z H, SONG S J, et al. ActiveNeRF: learning where to see with uncertainty estimation[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 230-246.
[16]	JIN L R, CHEN X Y L, RÜCKIN J, et al. NeU-NBV: next best view planning using uncertainty estimation in image-based neural rendering[C]// 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE Press, 2023: 11305-11312.
[17]	KAJIYA J T, VON HERZEN B P. Ray tracing volume densities[J]. ACM SIGGRAPH Computer Graphics, 1984, 18(3): 165-174.
[18]	SHEN J X, RUIZ A, AGUDO A, et al. Stochastic neural radiance fields: quantifying uncertainty in implicit 3D representations[C]// 2021 International Conference on 3D Vision. New York: IEEE Press, 2021: 972-981.
[19]	MARTIN-BRUALLA R, RADWAN N, SAJJADI M S M, et al. NeRF in the wild: neural radiance fields for unconstrained photo collections[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7210-7219.
[20]	RAN Y L, ZENG J, HE S B, et al. NeurAR: neural uncertainty for autonomous 3D reconstruction with implicit neural representations[J]. IEEE Robotics and Automation Letters, 2023, 8(2): 1125-1132.
[21]	LEE S, CHEN L, WANG J H, et al. Uncertainty guided policy for active robotic 3D reconstruction using neural radiance fields[J]. IEEE Robotics and Automation Letters, 2022, 7(4): 12070-12077.
[22]	ZHAN H Y, ZHENG J Y, XU Y, et al. ActiveRMAP: radiance field for active mapping and planning[EB/OL]. [2024-06-23]. https://ar5iv.labs.arxiv.org/html/2211.12656, 2022.
[23]	SITZMANN V, ZOLLHÖFER M, WETZSTEIN G. Scene representation networks: continuous 3D-structure-aware neural scene representations[C]// The 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 101.
[24]	MILDENHALL B, SRINIVASAN P P, ORTIZ-CAYON R, et al. Local light field fusion: practical view synthesis with prescriptive sampling guidelines[J]. ACM Transactions on Graphics, 2019, 38(4): 1-14.

		Synthetic scenes
Setting	Method	PSNR	SSIM	LPIPS
Setting 1	SRN^[23]	22.26	0.846	0.170
Setting 1	LLFF^[24]	24.88	0.911	0.114
Setting 1	ActiveNeRF^[15]	30.45	0.954	0.072
Setting 1	Ours	29.65	0.986	0.038
Setting 2	ActiveNeRF^[15]	28.07	0.931	0.064
Setting 2	Ours	28.14	0.986	0.050
Setting 3	ActiveNeRF^[15]	21.77	0.954	0.118
Setting 3	Ours	23.15	0.972	0.146

Method	Iterations/K	Training time/h	GPU memory usage/GB
ActiveNeRF^[15]	500	26.0	5.45
Ours	50	2.5	3.28
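
The efficiency figures quoted in the abstract can be checked against this table with simple arithmetic. The short sketch below is only a sanity check on the reported numbers; the variable names are illustrative.

```python
# Values copied from the efficiency table (ActiveNeRF vs. the proposed method).
baseline = {"iterations_k": 500, "train_hours": 26.0, "gpu_mem_gb": 5.45}
ours = {"iterations_k": 50, "train_hours": 2.5, "gpu_mem_gb": 3.28}

# How many times fewer iterations and hours the proposed method needs.
iteration_ratio = baseline["iterations_k"] / ours["iterations_k"]
time_ratio = baseline["train_hours"] / ours["train_hours"]

# Relative reduction in GPU memory usage, as a percentage of the baseline.
mem_reduction_pct = (
    (baseline["gpu_mem_gb"] - ours["gpu_mem_gb"]) / baseline["gpu_mem_gb"] * 100
)

print(f"iterations: {iteration_ratio:.1f}x fewer")
print(f"training time: {time_ratio:.1f}x shorter")
print(f"GPU memory: {mem_reduction_pct:.1f}% lower")
```

Both the iteration count and the training time give a factor of roughly 10, and the memory figures reproduce the 39.8% reduction stated in the abstract.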

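		Synthetic scenes
Setting	Sampling strategy	PSNR	SSIM	LPIPS
Setting 1	Random sampling	27.81	0.937	0.047
Setting 1	Farthest view sampling	27.51	0.929	0.058
Setting 1	Ours	28.14	0.986	0.050
Setting 2	Random sampling	14.58	0.908	0.462
Setting 2	Farthest view sampling	14.77	0.900	0.404
Setting 2	Ours	23.15	0.972	0.146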
