Journal of Graphics ›› 2024, Vol. 45 ›› Issue (5): 1030-1039.DOI: 10.11996/JG.j.2095-302X.2024051030
• Computer Graphics and Virtual Reality •
Received: 2024-07-04
Revised: 2024-08-10
Online: 2024-10-31
Published: 2024-10-31
Contact: SONG Ying
First author: ZHU Jie (1998-), master student. His main research interest covers photorealistic graphics. E-mail: 1904867640@qq.com
ZHU Jie, SONG Ying. A free viewpoint synthesis method based on differentiable rendering[J]. Journal of Graphics, 2024, 45(5): 1030-1039.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024051030
Name | Specification / Model |
---|---|
CPU | Intel(R) Xeon(R) Gold 6226R CPU @ 2.90 GHz |
GPU | NVIDIA GeForce RTX 3090 24 GB |
Operating system | Ubuntu 20.04.1 LTS |
PyTorch | 1.13.1 |
NVDIFFRAST | 0.3.1 |
Tiny-cuda-nn | 1.7 |
Table 1 Experimental environment
Name | Value |
---|---|
lr_pos | 3e-2 |
lr_material | 1e-2 |
lr_light | 1e-2 |
lr_pose | 5e-3 |
lr_intrinsic | 1e-2 |
lr_exposure | 1e-4 |
Table 2 Learning rate settings
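The per-quantity learning rates in Table 2 map naturally onto PyTorch optimizer parameter groups. The sketch below is a minimal illustration of that setup, not the authors' code: the parameter tensors are hypothetical placeholders standing in for the optimized geometry, material, lighting, pose, intrinsic, and exposure variables.

```python
import torch

# Hypothetical placeholder tensors for the six optimized quantities.
params = {name: torch.nn.Parameter(torch.zeros(4))
          for name in ["pos", "material", "light", "pose", "intrinsic", "exposure"]}

# Learning rates from Table 2, one per parameter group.
lrs = {"pos": 3e-2, "material": 1e-2, "light": 1e-2,
       "pose": 5e-3, "intrinsic": 1e-2, "exposure": 1e-4}

# One Adam optimizer with a separate group (and learning rate) per quantity.
optimizer = torch.optim.Adam(
    [{"params": [params[name]], "lr": lr} for name, lr in lrs.items()])

for group, name in zip(optimizer.param_groups, lrs):
    print(name, group["lr"])
```

Grouping all quantities into one optimizer keeps a single `optimizer.step()` per iteration while still honoring the very different step sizes (e.g. 3e-2 for vertex positions vs. 1e-4 for exposure).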
Scene | PSNR↑ | SSIM↑ | LPIPS↓ |
---|---|---|---|
GoldCape(NVDIFFREC) | 24.027 | 0.857 | 0.110 |
GoldCape(Ours) | 23.128 | 0.824 | 0.133 |
EthiopianHead(NVDIFFREC) | 25.738 | 0.915 | 0.109 |
EthiopianHead(Ours) | 25.854 | 0.923 | 0.096 |
Gnome(NVDIFFREC) | 15.699 | 0.783 | 0.217 |
Gnome(Ours) | 24.185 | 0.863 | 0.143 |
Statue(NVDIFFREC) | 18.464 | 0.820 | 0.187 |
Statue(Ours) | 20.746 | 0.845 | 0.162 |
MotherChild(NVDIFFREC) | 17.369 | 0.914 | 0.121 |
MotherChild(Ours) | 27.665 | 0.954 | 0.061 |
Table 3 Quantitative comparisons with NVDIFFREC per scene
Method | PSNR↑ | SSIM↑ | LPIPS↓ |
---|---|---|---|
NeRD | 22.508 | 0.829 | 0.159 |
NeROIC | 25.776 | 0.892 | 0.132 |
NVDIFFREC | 20.259 | 0.858 | 0.149 |
Ours | 24.316 | 0.882 | 0.119 |
Table 4 Quantitative comparisons of real-world scenes
Method | PSNR↑ | SSIM↑ | LPIPS↓ |
---|---|---|---|
NeRD | 25.573 | 0.895 | 0.116 |
NVDIFFREC | 26.046 | 0.936 | 0.083 |
Ours | 25.580 | 0.926 | 0.103 |
Table 5 Quantitative comparisons of synthetic scenes
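Of the three metrics reported in Tables 3-5, PSNR has the simplest closed form: 10·log10(MAX²/MSE), where MAX is the peak pixel value. A minimal NumPy sketch of this computation (for illustration only; it is not the authors' evaluation code, and the 4×4 test images are arbitrary):

```python
import numpy as np

def psnr(img_a: np.ndarray, img_b: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images valued in [0, max_val]."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)      # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(a, b), 3))   # 10*log10(1/0.01) -> 20.0
```

SSIM and LPIPS are not reproduced here: SSIM requires windowed local statistics, and LPIPS depends on features from a pretrained network, so both are better taken from established implementations.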
[1] | MUNKBERG J, CHEN W Z, HASSELGREN J, et al. Extracting triangular 3D models, materials, and lighting from images[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 8270-8280. |
[2] | HORRY Y, ANJYO K I, ARAI K. Tour into the picture: using a spidery mesh interface to make animation from a single image[C]// The 24th Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 1997: 225-232. |
[3] | OH B M, CHEN M, DORSEY J, et al. Image-based modeling and photo editing[C]// The 28th Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 2001: 433-442. |
[4] | ZHANG L, DUGAS-PHOCION G, SAMSON J S, et al. Single-view modelling of free-form scenes[J]. The Journal of Visualization and Computer Animation, 2002, 13(4): 225-235. |
[5] | KHOLGADE N, SIMON T, EFROS A, et al. 3D object manipulation in a single photograph using stock 3D models[J]. ACM Transactions on graphics (TOG), 2014, 33(4): 127. |
[6] | MCMILLAN L. An image-based approach to three-dimensional computer graphics[M]. Chapel Hill: University of North Carolina at Chapel Hill, 1997: 30-59. |
[7] | SUTHERLAND I E, SPROULL R F, SCHUMACKER R A. A characterization of ten hidden-surface algorithms[J]. ACM Computing Surveys (CSUR), 1974, 6(1): 1-55. |
[8] | LEE P J, EFFENDI . Nongeometric distortion smoothing approach for depth map preprocessing[J]. IEEE Transactions on Multimedia, 2011, 13(2): 246-254. |
[9] | CHEN S E, WILLIAMS L. View interpolation for image synthesis[C]// The 20th Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 1993: 279-288. |
[10] | YANG J Z, LIU Z K, YU N H, et al. An image warping method based on control points and its applications[J]. Journal of Image and Graphics, 2001, 6A(11): 1070-1074 (in Chinese). |
[11] | CHEN S E. QuickTime VR: an image-based approach to virtual environment navigation[C]// The 22nd Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 1995: 29-38. |
[12] | SHUM H Y, HE L W. Rendering with concentric mosaics[C]// The 26th Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 1999: 299-306. |
[13] | GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// The 27th International Conference on Neural Information Processing Systems. New York: ACM, 2014: 2672-2680. |
[14] | CHENG H, WANG S, LI M, et al. A review of neural radiance field for autonomous driving scene[J]. Journal of Graphics, 2023, 44(6): 1091-1103 (in Chinese). |
[15] | WANG Z R, CHANG Y, LU P, et al. A review on neural radiance fields acceleration[J]. Journal of Graphics, 2024, 45(1): 1-13 (in Chinese). |
[16] | CHOI J, JUNG D, LEE T, et al. TMO: textured mesh acquisition of objects with a mobile device by using differentiable rendering[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 16674-16684. |
[17] | WU T, WANG J Q, PAN X G, et al. Voxurf: voxel-based efficient and accurate neural surface reconstruction[EB/OL]. (2023-08-13) [2024-06-06]. https://dblp.uni-trier.de/db/conf/iclr/iclr2023.html#WuWPXTLL23. |
[18] | XU Q G, XU Z X, PHILIP J, et al. Point-NeRF: point-based neural radiance fields[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5428-5438. |
[19] | HU T, XU X G, LIU S, et al. Point2pix: photo-realistic point cloud rendering via neural radiance fields[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 8349-8358. |
[20] | MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2021, 65(1): 99-106. |
[21] | RÜCKERT D, FRANKE L, STAMMINGER M. ADOP: approximate differentiable one-pixel point rendering[J]. ACM Transactions on Graphics (TOG), 2022, 41(4): 99. |
[22] | MESCHEDER L, OECHSLE M, NIEMEYER M, et al. Occupancy networks: learning 3D reconstruction in function space[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4455-4465. |
[23] | FENG Q, LIU Y B, LAI Y K, et al. FOF: learning fourier occupancy field for monocular real-time human reconstruction[C]// The 36th International Conference on Neural Information Processing Systems. New York: ACM, 2022: 537. |
[24] | JIANG H C, XU Y M, ZENG Y H, et al. OpenOcc: open vocabulary 3D scene reconstruction via occupancy representation[EB/OL]. (2024-05-18) [2024-06-06]. https://arxiv.org/abs/2403.11796. |
[25] | SHIM J, KANG C, JOO K. Diffusion-based signed distance fields for 3D shape generation[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 20887-20897. |
[26] | YENAMANDRA T, TEWARI A, YANG N, et al. FIRe: fast inverse rendering using directional and signed distance functions[C]// IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2024: 3065-3075. |
[27] | LIU W X, WU Y W, RUAN S P, et al. Marching-primitives: shape abstraction from signed distance function[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 8771-8780. |
[28] | SHEN T C, GAO J, YIN K X, et al. Deep marching tetrahedra: a hybrid representation for high-resolution 3D shape synthesis[C]// The 35th International Conference on Neural Information Processing Systems. New York: ACM, 2021: 466. |
[29] | LAINE S, HELLSTEN J, KARRAS T, et al. Modular primitives for high-performance differentiable rendering[J]. ACM Transactions on Graphics (TOG), 2020, 39(6): 194. |
[30] | ENGEL J, KOLTUN V, CREMERS D. Direct sparse odometry[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(3): 611-625. |
[31] | RÜCKERT D, STAMMINGER M. Snake-SLAM: efficient global visual inertial SLAM using decoupled nonlinear optimization[C]// 2021 International Conference on Unmanned Aircraft Systems. New York: IEEE Press, 2021: 219-228. |
[32] | BOSS M, BRAUN R, JAMPANI V, et al. NeRD: neural reflectance decomposition from image collections[C]// IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 12664-12674. |
[33] | KUANG Z F, OLSZEWSKI K, CHAI M L, et al. NeROIC: neural rendering of objects from online image collections[J]. ACM Transactions on Graphics (TOG), 2022, 41(4): 56. |