The next best view navigation technology based on RGB features

doi:10.11996/JG.j.2095-302X.2025030551

Abstract

Neural radiance fields (NeRF) perform well at reconstructing 3D scenes from 2D images: trained on 2D images alone, a NeRF can recover the 3D structure of a scene and render novel views with high quality. However, NeRF suffers from slow training and long inference times, and the quality of its 3D reconstruction depends closely on the quality of the training samples. To achieve high-quality NeRF reconstruction when sample quality is low, two NeRFs with different hash encodings were trained on the same scene and used to evaluate the differences in information gain among candidate views, which in turn guided view sampling. On this basis, a new next-best-view (NBV) navigation framework based on RGB features was proposed. The framework was robust to sparse training data, captured the next best view with high information gain through RGB feature evaluation, and optimized NeRF training, improving novel view synthesis with a minimal number of additional views. With the optimized training process, network convergence was approximately 10 times faster and memory usage was reduced by 39.8%. Extensive experiments verified the effectiveness and robustness of the proposed model.
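The view-selection idea can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the code below assumes one plausible proxy: render each candidate view with both models and treat the mean squared RGB difference between the two renders as the information-gain score. The function names `rgb_disagreement` and `select_next_best_view` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def rgb_disagreement(render_a, render_b) -> float:
    """Mean squared RGB difference between two renders of the same view.

    Assumption: disagreement between the two models stands in for the
    information gain of the view; the paper's actual metric may differ.
    """
    a = np.asarray(render_a, dtype=np.float64)
    b = np.asarray(render_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("renders must have the same shape")
    return float(np.mean((a - b) ** 2))

def select_next_best_view(renders_a, renders_b) -> int:
    """Return the index of the candidate view where the two models disagree most."""
    scores = [rgb_disagreement(a, b) for a, b in zip(renders_a, renders_b)]
    return int(np.argmax(scores))

# Toy data: three 8x8 RGB "renders" from model A, and model B's renders
# shifted by a constant offset per view (a stand-in for real model output).
rng = np.random.default_rng(0)
base = rng.random((3, 8, 8, 3))
offsets = [0.01, 0.20, 0.05]
renders_a = [base[i] for i in range(3)]
renders_b = [base[i] + offsets[i] for i in range(3)]

print(select_next_best_view(renders_a, renders_b))  # prints 1
```

In this toy case the second candidate has the largest offset, so it is chosen as the next view to capture and add to the training set.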

Keywords: neural radiance field, hash encoding, sparse reconstruction, information gain, active learning

CLC Number: TP391

ZHOU Zheng, DAI Yaqiao, YI Renjiao, LAN Long, ZHU Chenyang. The next best view navigation technology based on RGB features[J]. Journal of Graphics, 2025, 46(3): 551-557.

Figures/Tables 6

References 24

[1]	MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2021, 65(1): 99-106.
[2]	SCHÖNBERGER J L, FRAHM J M. Structure-from-motion revisited[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4104-4113.
[3]	GOESELE M, CURLESS B, SEITZ S M. Multi-view stereo revisited[C]// 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2006: 2402-2409.
[4]	CHANG Y, GAI M. A review on neural radiance fields based view synthesis[J]. Journal of Graphics, 2021, 42(3): 376-384 (in Chinese).
[5]	DONG X T, MA X, PAN C W, et al. A review of neural radiance fields for outdoor large scenes[J]. Journal of Graphics, 2024, 45(4): 631-649 (in Chinese).
[6]	MÜLLER T, EVANS A, SCHIED C, et al. Instant neural graphics primitives with a multiresolution hash encoding[J]. ACM Transactions on Graphics, 2022, 41(4): 1-15.
[7]	BAJCSY R, ALOIMONOS Y, TSOTSOS J K. Revisiting active perception[J]. Autonomous Robots, 2018, 42(2): 177-196.
[8]	LIU M, SHI Y F, ZHENG L T, et al. Recurrent 3D attentional networks for end-to-end active object recognition[J]. Computational Visual Media, 2019, 5(1): 91-104.
[9]	ISLER S, SABZEVARI R, DELMERICO J, et al. An information gain formulation for active volumetric 3D reconstruction[C]// 2016 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2016: 3477-3484.
[10]	BIRCHER A, KAMEL M, ALEXIS K, et al. Receding horizon “next-best-view” planner for 3D exploration[C]// 2016 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2016: 1462-1468.
[11]	ZAENKER T, SMITT C, MCCOOL C, et al. Viewpoint planning for fruit size and position estimation[C]// 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE Press, 2021: 3271-3277.
[12]	ZENG R, ZHAO W, LIU Y J. PC-NBV: a point cloud based deep network for efficient next best view planning[C]// 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE Press, 2020: 7050-7057.
[13]	SONG S, JO S. Surface-based exploration for autonomous 3D modeling[C]// 2018 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2018: 4319-4326.
[14]	WU Q Y, MANOCHA D, WANG J, et al. NeoNav: improving the generalization of visual navigation via generating next expected observations[C]// The Thirty-Fourth AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 10001-10008.
[15]	PAN X R, LAI Z H, SONG S J, et al. ActiveNeRF: learning where to see with uncertainty estimation[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 230-246.
[16]	JIN L R, CHEN X Y L, RÜCKIN J, et al. NeU-NBV: next best view planning using uncertainty estimation in image-based neural rendering[C]// 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE Press, 2023: 11305-11312.
[17]	KAJIYA J T, VON HERZEN B P. Ray tracing volume densities[J]. ACM SIGGRAPH Computer Graphics, 1984, 18(3): 165-174.
[18]	SHEN J X, RUIZ A, AGUDO A, et al. Stochastic neural radiance fields: quantifying uncertainty in implicit 3D representations[C]// 2021 International Conference on 3D Vision. New York: IEEE Press, 2021: 972-981.
[19]	MARTIN-BRUALLA R, RADWAN N, SAJJADI M S M, et al. NeRF in the wild: neural radiance fields for unconstrained photo collections[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7210-7219.
[20]	RAN Y L, ZENG J, HE S B, et al. NeurAR: neural uncertainty for autonomous 3D reconstruction with implicit neural representations[J]. IEEE Robotics and Automation Letters, 2023, 8(2): 1125-1132.
[21]	LEE S, CHEN L, WANG J H, et al. Uncertainty guided policy for active robotic 3D reconstruction using neural radiance fields[J]. IEEE Robotics and Automation Letters, 2022, 7(4): 12070-12077.
[22]	ZHAN H Y, ZHENG J Y, XU Y, et al. ActiveRMAP: radiance field for active mapping and planning[EB/OL]. (2022) [2024-06-23]. https://ar5iv.labs.arxiv.org/html/2211.12656.
[23]	SITZMANN V, ZOLLHÖFER M, WETZSTEIN G. Scene representation networks: continuous 3D-structure-aware neural scene representations[C]// The 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 101.
[24]	MILDENHALL B, SRINIVASAN P P, ORTIZ-CAYON R, et al. Local light field fusion: practical view synthesis with prescriptive sampling guidelines[J]. ACM Transactions on Graphics, 2019, 38(4): 1-14.

		Synthetic scenes
Setting	Method	PSNR	SSIM	LPIPS
Setting 1	SRN^[23]	22.26	0.846	0.170
Setting 1	LLFF^[24]	24.88	0.911	0.114
Setting 1	ActiveNeRF^[15]	30.45	0.954	0.072
Setting 1	Ours	29.65	0.986	0.038
Setting 2	ActiveNeRF^[15]	28.07	0.931	0.064
Setting 2	Ours	28.14	0.986	0.050
Setting 3	ActiveNeRF^[15]	21.77	0.954	0.118
Setting 3	Ours	23.15	0.972	0.146

Method	Iterations (K)	Training time (h)	GPU memory (GB)
ActiveNeRF^[15]	500	26.0	5.45
Ours	50	2.5	3.28
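
The efficiency figures quoted in the abstract (roughly 10 times faster convergence, 39.8% less memory) can be checked against this table with simple arithmetic. The snippet below is only a worked calculation on the tabulated values; the variable names are illustrative.

```python
# Values copied from the efficiency table: iterations (K), training time (h), GPU memory (GB).
baseline = {"iterations_k": 500, "train_hours": 26.0, "memory_gb": 5.45}  # ActiveNeRF
ours = {"iterations_k": 50, "train_hours": 2.5, "memory_gb": 3.28}

iteration_ratio = baseline["iterations_k"] / ours["iterations_k"]
time_ratio = baseline["train_hours"] / ours["train_hours"]
memory_reduction_pct = 100 * (1 - ours["memory_gb"] / baseline["memory_gb"])

print(f"{iteration_ratio:.1f}x fewer iterations")        # prints 10.0x fewer iterations
print(f"{time_ratio:.1f}x shorter training time")        # prints 10.4x shorter training time
print(f"{memory_reduction_pct:.1f}% less GPU memory")    # prints 39.8% less GPU memory
```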

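		Synthetic scenes
Setting	Sampling strategy	PSNR	SSIM	LPIPS
Setting 1	Random sampling	27.81	0.937	0.047
Setting 1	Farthest view sampling	27.51	0.929	0.058
Setting 1	Ours	28.14	0.986	0.050
Setting 2	Random sampling	14.58	0.908	0.462
Setting 2	Farthest view sampling	14.77	0.900	0.404
Setting 2	Ours	23.15	0.972	0.146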
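
For orientation, the two baseline sampling strategies in this comparison can be sketched as follows. This page does not define them, so the code assumes that farthest view sampling picks the candidate camera position whose nearest already-selected position is farthest away, and that random sampling draws a candidate uniformly. The function names and the toy camera positions are hypothetical.

```python
import numpy as np

def farthest_view_index(selected_positions, candidate_positions) -> int:
    """Index of the candidate farthest from its nearest already-selected view.

    Assumption: views are compared by Euclidean distance between camera
    positions; the paper's baseline may use a different distance.
    """
    selected = np.asarray(selected_positions, dtype=np.float64)      # shape (S, 3)
    candidates = np.asarray(candidate_positions, dtype=np.float64)   # shape (C, 3)
    # Pairwise distances between every candidate and every selected view: shape (C, S).
    dists = np.linalg.norm(candidates[:, None, :] - selected[None, :, :], axis=-1)
    return int(np.argmax(dists.min(axis=1)))

def random_view_index(num_candidates: int, rng: np.random.Generator) -> int:
    """Index of a candidate drawn uniformly at random."""
    return int(rng.integers(num_candidates))

# Toy camera positions: two views already in the training set, three candidates.
selected = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
candidates = [[0.9, 0.1, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]

print(farthest_view_index(selected, candidates))  # prints 1
```

Neither baseline looks at what the model has learned so far, which is the contrast the table draws with a strategy guided by information gain.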