Journal of Graphics ›› 2024, Vol. 45 ›› Issue (5): 1030-1039. DOI: 10.11996/JG.j.2095-302X.2024051030
Received: 2024-07-04
Revised: 2024-08-10
Published: 2024-10-31
Online: 2024-10-31
Contact: SONG Ying (1981-), associate professor, Ph.D. Her main research interests cover photorealistic graphics, intelligent graphics computing, etc. E-mail: ysong@zstu.edu.cn
First author: ZHU Jie (1998-), master student. His main research interest covers photorealistic graphics. E-mail: 1904867640@qq.com
Abstract: Free-viewpoint synthesis in uncontrolled environments is easily affected by highly variable illumination conditions, camera parameters, and other factors. To address this problem, an approximately differentiable deferred inverse rendering pipeline (ADDIRP) is proposed, which incorporates a physically based camera model into the deferred inverse rendering pipeline to accurately simulate the optical imaging process of a camera. First, a photometric camera model and a geometric camera model are created from the input images and their corresponding poses: the photometric camera model is represented by learnable parameters such as exposure and white balance, while the geometric camera model is represented by learnable intrinsic and extrinsic parameters. Second, each component of the pipeline is optimized with an image-space loss between the rendered image and the target image, making the deferred inverse rendering pipeline robust to complex, varying illumination and casually captured images. Finally, the pipeline produces 3D content reconstructions compatible with traditional graphics engines. Experimental results show that, compared with existing methods, ADDIRP achieves better performance on real-world datasets; on synthetic datasets, it delivers superior perceptual visual consistency while maintaining comparable synthesis quality.
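To make the photometric camera model concrete, the following PyTorch sketch illustrates the idea of per-image learnable exposure and white balance applied to a rendered image before the image-space loss. It is an illustrative reconstruction under stated assumptions, not the paper's implementation: the class name, the log-exposure/RGB-gain parameterization, and the L1 loss are all assumptions.

```python
import torch

class PhotometricCamera(torch.nn.Module):
    """Hypothetical per-image photometric model: learnable exposure and
    white-balance gains applied to a linear-space rendering."""
    def __init__(self, num_images: int):
        super().__init__()
        self.log_exposure = torch.nn.Parameter(torch.zeros(num_images))
        self.wb_gain = torch.nn.Parameter(torch.ones(num_images, 3))

    def forward(self, rendered: torch.Tensor, idx: int) -> torch.Tensor:
        # rendered: (H, W, 3) linear radiance from the deferred renderer.
        out = rendered * torch.exp(self.log_exposure[idx]) * self.wb_gain[idx]
        return out.clamp(min=0.0)

# Image-space loss between the camera-modulated rendering and the target;
# gradients flow to the camera parameters (and, in the full pipeline,
# to geometry, materials, and lighting as well).
camera = PhotometricCamera(num_images=100)
rendered = torch.rand(256, 256, 3, requires_grad=True)  # stand-in for renderer output
target = torch.rand(256, 256, 3)
loss = torch.nn.functional.l1_loss(camera(rendered, idx=0), target)
loss.backward()
```

A geometric camera model would analogously expose intrinsics and extrinsics as learnable tensors, so that the same image-space loss also refines the poses.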
ZHU Jie, SONG Ying. A free viewpoint synthesis method based on differentiable rendering[J]. Journal of Graphics, 2024, 45(5): 1030-1039.
Table 1 Experimental environment

| Name | Specification or model |
|---|---|
| CPU | Intel(R) Xeon(R) Gold 6226R CPU @ 2.90 GHz |
| GPU | NVIDIA GeForce RTX 3090 24 GB |
| Operating system | Ubuntu 20.04.1 LTS |
| PyTorch | 1.13.1 |
| NVDIFFRAST | 0.3.1 |
| Tiny-cuda-nn | 1.7 |
Table 2 Learning rate settings

| Name | Value |
|---|---|
| lr_pos | 3e-2 |
| lr_material | 1e-2 |
| lr_light | 1e-2 |
| lr_pose | 5e-3 |
| lr_intrinsic | 1e-2 |
| lr_exposure | 1e-4 |
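With PyTorch (the framework listed in Table 1), per-component learning rates like these are typically assigned through parameter groups in a single optimizer. The sketch below mirrors the values of Table 2; the parameter tensors are hypothetical placeholders for the pipeline's actual components, and the choice of Adam is an assumption.

```python
import torch

# Hypothetical placeholders for the pipeline's optimizable components.
pos = torch.nn.Parameter(torch.zeros(1000, 3))          # vertex positions
material = torch.nn.Parameter(torch.zeros(64, 64, 3))   # material textures
light = torch.nn.Parameter(torch.zeros(6, 16, 16, 3))   # environment lighting
pose = torch.nn.Parameter(torch.zeros(100, 6))          # camera extrinsics
intrinsic = torch.nn.Parameter(torch.zeros(4))          # camera intrinsics
exposure = torch.nn.Parameter(torch.zeros(100))         # photometric parameters

# One optimizer with the per-component learning rates of Table 2.
optimizer = torch.optim.Adam([
    {"params": [pos], "lr": 3e-2},
    {"params": [material], "lr": 1e-2},
    {"params": [light], "lr": 1e-2},
    {"params": [pose], "lr": 5e-3},
    {"params": [intrinsic], "lr": 1e-2},
    {"params": [exposure], "lr": 1e-4},
])
```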
Table 3 Quantitative comparisons with NVDIFFREC per scene

| Scene | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| GoldCape (NVDIFFREC) | 24.027 | 0.857 | 0.110 |
| GoldCape (Ours) | 23.128 | 0.824 | 0.133 |
| EthiopianHead (NVDIFFREC) | 25.738 | 0.915 | 0.109 |
| EthiopianHead (Ours) | 25.854 | 0.923 | 0.096 |
| Gnome (NVDIFFREC) | 15.699 | 0.783 | 0.217 |
| Gnome (Ours) | 24.185 | 0.863 | 0.143 |
| Statue (NVDIFFREC) | 18.464 | 0.820 | 0.187 |
| Statue (Ours) | 20.746 | 0.845 | 0.162 |
| MotherChild (NVDIFFREC) | 17.369 | 0.914 | 0.121 |
| MotherChild (Ours) | 27.665 | 0.954 | 0.061 |
Table 4 Quantitative comparisons of real-world scenes

| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| NeRD | 22.508 | 0.829 | 0.159 |
| NeROIC | 25.776 | 0.892 | 0.132 |
| NVDIFFREC | 20.259 | 0.858 | 0.149 |
| Ours | 24.316 | 0.882 | 0.119 |
Table 5 Quantitative comparisons of synthetic scenes

| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| NeRD | 25.573 | 0.895 | 0.116 |
| NVDIFFREC | 26.046 | 0.936 | 0.083 |
| Ours | 25.580 | 0.926 | 0.103 |
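For reference, the PSNR, SSIM, and LPIPS scores reported in Tables 3-5 can be computed with standard packages. The snippet below is a conventional evaluation sketch using scikit-image and the lpips package, not the paper's evaluation code; the function name and preprocessing are assumptions.

```python
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net="alex")  # perceptual metric network

def evaluate(pred: np.ndarray, gt: np.ndarray):
    """pred, gt: (H, W, 3) float arrays in [0, 1]. Returns (PSNR, SSIM, LPIPS)."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    # LPIPS expects NCHW tensors scaled to [-1, 1].
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    lp = lpips_fn(to_t(pred), to_t(gt)).item()
    return psnr, ssim, lp
```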
[1] MUNKBERG J, CHEN W Z, HASSELGREN J, et al. Extracting triangular 3D models, materials, and lighting from images[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 8270-8280.
[2] HORRY Y, ANJYO K I, ARAI K. Tour into the picture: using a spidery mesh interface to make animation from a single image[C]// The 24th Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 1997: 225-232.
[3] OH B M, CHEN M, DORSEY J, et al. Image-based modeling and photo editing[C]// The 28th Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 2001: 433-442.
[4] ZHANG L, DUGAS-PHOCION G, SAMSON J S, et al. Single-view modelling of free-form scenes[J]. The Journal of Visualization and Computer Animation, 2002, 13(4): 225-235.
[5] KHOLGADE N, SIMON T, EFROS A, et al. 3D object manipulation in a single photograph using stock 3D models[J]. ACM Transactions on Graphics (TOG), 2014, 33(4): 127.
[6] MCMILLAN L. An image-based approach to three-dimensional computer graphics[M]. Chapel Hill: University of North Carolina at Chapel Hill, 1997: 30-59.
[7] SUTHERLAND I E, SPROULL R F, SCHUMACKER R A. A characterization of ten hidden-surface algorithms[J]. ACM Computing Surveys (CSUR), 1974, 6(1): 1-55.
[8] LEE P J, EFFENDI. Nongeometric distortion smoothing approach for depth map preprocessing[J]. IEEE Transactions on Multimedia, 2011, 13(2): 246-254.
[9] CHEN S E, WILLIAMS L. View interpolation for image synthesis[C]// The 20th Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 1993: 279-288.
[10] YANG J Z, LIU Z K, YU N H, et al. An image warping method based on control points and its applications[J]. Journal of Image and Graphics, 2001, 6A(11): 1070-1074 (in Chinese).
[11] CHEN S E. QuickTime VR: an image-based approach to virtual environment navigation[C]// The 22nd Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 1995: 29-38.
[12] SHUM H Y, HE L W. Rendering with concentric mosaics[C]// The 26th Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 1999: 299-306.
[13] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// The 27th International Conference on Neural Information Processing Systems. New York: ACM, 2014: 2672-2680.
[14] CHENG H, WANG S, LI M, et al. A review of neural radiance field for autonomous driving scene[J]. Journal of Graphics, 2023, 44(6): 1091-1103 (in Chinese).
[15] WANG Z R, CHANG Y, LU P, et al. A review on neural radiance fields acceleration[J]. Journal of Graphics, 2024, 45(1): 1-13 (in Chinese).
[16] CHOI J, JUNG D, LEE T, et al. TMO: textured mesh acquisition of objects with a mobile device by using differentiable rendering[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 16674-16684.
[17] WU T, WANG J Q, PAN X G, et al. Voxurf: voxel-based efficient and accurate neural surface reconstruction[EB/OL]. (2023-08-13) [2024-06-06]. https://dblp.uni-trier.de/db/conf/iclr/iclr2023.html#WuWPXTLL23.
[18] XU Q G, XU Z X, PHILIP J, et al. Point-NeRF: point-based neural radiance fields[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5428-5438.
[19] HU T, XU X G, LIU S, et al. Point2Pix: photo-realistic point cloud rendering via neural radiance fields[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 8349-8358.
[20] MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2021, 65(1): 99-106.
[21] RÜCKERT D, FRANKE L, STAMMINGER M. ADOP: approximate differentiable one-pixel point rendering[J]. ACM Transactions on Graphics (TOG), 2022, 41(4): 99.
[22] MESCHEDER L, OECHSLE M, NIEMEYER M, et al. Occupancy networks: learning 3D reconstruction in function space[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4455-4465.
[23] FENG Q, LIU Y B, LAI Y K, et al. FOF: learning Fourier occupancy field for monocular real-time human reconstruction[C]// The 36th International Conference on Neural Information Processing Systems. New York: ACM, 2022: 537.
[24] JIANG H C, XU Y M, ZENG Y H, et al. OpenOcc: open vocabulary 3D scene reconstruction via occupancy representation[EB/OL]. (2024-05-18) [2024-06-06]. https://arxiv.org/abs/2403.11796.
[25] SHIM J, KANG C, JOO K. Diffusion-based signed distance fields for 3D shape generation[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 20887-20897.
[26] YENAMANDRA T, TEWARI A, YANG N, et al. FIRe: fast inverse rendering using directional and signed distance functions[C]// IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2024: 3065-3075.
[27] LIU W X, WU Y W, RUAN S P, et al. Marching-Primitives: shape abstraction from signed distance function[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 8771-8780.
[28] SHEN T C, GAO J, YIN K X, et al. Deep marching tetrahedra: a hybrid representation for high-resolution 3D shape synthesis[C]// The 35th International Conference on Neural Information Processing Systems. New York: ACM, 2021: 466.
[29] LAINE S, HELLSTEN J, KARRAS T, et al. Modular primitives for high-performance differentiable rendering[J]. ACM Transactions on Graphics (TOG), 2020, 39(6): 194.
[30] ENGEL J, KOLTUN V, CREMERS D. Direct sparse odometry[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(3): 611-625.
[31] RÜCKERT D, STAMMINGER M. Snake-SLAM: efficient global visual inertial SLAM using decoupled nonlinear optimization[C]// 2021 International Conference on Unmanned Aircraft Systems. New York: IEEE Press, 2021: 219-228.
[32] BOSS M, BRAUN R, JAMPANI V, et al. NeRD: neural reflectance decomposition from image collections[C]// IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 12664-12674.
[33] KUANG Z F, OLSZEWSKI K, CHAI M L, et al. NeROIC: neural rendering of objects from online image collections[J]. ACM Transactions on Graphics (TOG), 2022, 41(4): 56.