Journal of Graphics ›› 2024, Vol. 45 ›› Issue (1): 14-25. DOI: 10.11996/JG.j.2095-302X.2024010014
Received: 2023-08-13
Accepted: 2023-10-31
Published: 2024-02-29
Online: 2024-02-29
Corresponding author: MU Taijiang (1989-), male, assistant researcher, Ph.D. His main research interests cover computer graphics, visual media learning, and scene reconstruction and understanding.
First author: HUANG Jiahui (1997-), male, Ph.D. His main research interests cover computer graphics and 3D vision. E-mail: huangjh.work@outlook.com
Abstract: 3D reconstruction aims to recover a digital 3D representation of an observed scene from sensor input. It is an important research direction in computer graphics and computer vision, with applications in visualization, simulation, route planning, and many other tasks. Compared with static scenes, dynamic scenes introduce an additional temporal dimension: the reconstruction task must not only recover the detailed geometry of every frame, but also capture how targets evolve over time and how they relate to one another for downstream analysis, which makes algorithm design considerably more challenging. However, the discussion of dynamic scene reconstruction in the research community is still at an early stage, and a systematic summary of existing methods is lacking. To fill this gap and further inspire algorithm design, this survey collects and organizes the latest dynamic 3D scene reconstruction techniques, gives a general definition of the dynamic 3D scene reconstruction problem and its common solution framework, reviews existing techniques in terms of dynamic 3D representations and optimization frameworks, and discusses reconstruction methods and processing strategies for special structured scenes. Finally, it introduces the relevant datasets, analyzes and summarizes the open problems in dynamic 3D scene reconstruction, and offers an outlook on future work.
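One widely used formulation among the surveyed methods factors a dynamic scene into a static canonical representation plus a time-conditioned deformation field that warps each observed frame into that canonical space. A minimal toy sketch of this pattern (the rigid translation and unit-sphere geometry below are invented purely for illustration, not taken from any cited method):

```python
import numpy as np

def deform_to_canonical(x: np.ndarray, t: float) -> np.ndarray:
    """Toy deformation field D(x, t): undo a rigid translation that grows with time."""
    return x - np.array([0.5 * t, 0.0, 0.0])

def canonical_sdf(x: np.ndarray) -> float:
    """Static canonical geometry: signed distance to a unit sphere at the origin."""
    return float(np.linalg.norm(x) - 1.0)

def dynamic_sdf(x: np.ndarray, t: float) -> float:
    """Query the dynamic scene at time t by first warping into canonical space."""
    return canonical_sdf(deform_to_canonical(x, t))
```

Methods in this family jointly optimize the parameters of both the canonical representation and the deformation field so that the rendered or fused observations match the sensor input at every frame.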
HUANG Jiahui, MU Taijiang. A survey of dynamic 3D scene reconstruction[J]. Journal of Graphics, 2024, 45(1): 14-25.
Fig. 1 Illustration of different dynamic 3D representations ((a) Voxel grid and deformation field; (b) Spatio-temporal point cloud; (c) Neural implicit field)
Table 1 Performance comparison of different dynamic 3D reconstruction methods

| Method | Venue | PSNR |
| --- | --- | --- |
| Neural 3D Video[49] | CVPR 2022 | 29.6 |
| NeRFPlayer[51] | TVCG 2023 | 30.7 |
| StreamRF[52] | NeurIPS 2022 | 28.3 |
| HyperReel[50] | CVPR 2023 | 31.1 |
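PSNR figures like those reported above are derived from the per-pixel mean squared error between a rendered image and the ground truth. A minimal sketch, assuming images are float arrays normalized to [0, 1]:

```python
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means the rendering is closer to ground truth."""
    mse = float(np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For example, a uniform per-pixel error of 0.1 on a [0, 1] image yields an MSE of 0.01 and hence a PSNR of 20 dB.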
Fig. 3 Qualitative results of dynamic reconstruction methods (the first two rows are from HyperReel[50], and the last row is from NeRFPlayer[51], with the ground truth on the left and the method's result on the right; all images are taken from the original papers)
Fig. 4 Results of ClusterVO[71] in indoor and outdoor multi-body scenes (upper row: motion segmentation and reconstructed dynamic map of an indoor scene with two moving water bottles; lower row: segmentation and reconstructed dynamic map of an outdoor driving scene)
[1] | NEWCOMBE R A, FOX D, SEITZ S M. DynamicFusion: reconstruction and tracking of non-rigid scenes in real-time[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 343-352. |
[2] | GAO W, TEDRAKE R. SurfelWarp: efficient non-volumetric single view dynamic reconstruction[EB/OL]. [2023-07-19]. https://arxiv.org/abs/1904.13073. |
[3] | PARK S, SON M, JANG S, et al. Temporal interpolation is all you need for dynamic neural radiance fields[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 4212-4221. |
[4] | GUO K W, XU F, YU T, et al. Real-time geometry, albedo, and motion reconstruction using a single RGB-D camera[J]. ACM Transactions on Graphics, 2017, 36(3): 32:1-32:13. |
[5] | YU T, GUO K W, XU F, et al. BodyFusion: real-time capture of human motion and surface geometry using a single depth camera[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 910-919. |
[6] | BOŽIČ A, ZOLLHÖFER M, THEOBALT C, et al. DeepDeform: learning non-rigid RGB-D reconstruction with semi-supervised data[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 7000-7010. |
[7] | TEED Z, DENG J. DROID-SLAM: deep visual SLAM for monocular, stereo, and RGB-D cameras[EB/OL]. [2023-07-19]. https://arxiv.org/abs/2108.10869.pdf. |
[8] | HUANG J H, HUANG S S, SONG H X, et al. DI-fusion: online implicit 3D reconstruction with deep priors[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 8928-8937. |
[9] | ZHU Z H, PENG S Y, LARSSON V, et al. NICE-SLAM: neural implicit scalable encoding for SLAM[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12776-12786. |
[10] | ZHANG J, HENEIN M, MAHONY R, et al. VDO-SLAM: a visual dynamic object-aware SLAM system[EB/OL]. [2023-07-19]. https://arxiv.org/abs/2005.11052.pdf. |
[11] | BESCOS B, FÁCIL J M, CIVERA J, et al. DynaSLAM: tracking, mapping, and inpainting in dynamic scenes[J]. IEEE Robotics and Automation Letters, 2018, 3(4): 4076-4083. |
[12] | INGALE A K, J D U. Real-time 3D reconstruction techniques applied in dynamic scenes: a systematic literature review[J]. Computer Science Review, 2021, 39: 100338. |
[13] | PENG S D, YAN Y Z, SHUAI Q, et al. Representing volumetric videos as dynamic MLP maps[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 4252-4262. |
[14] | WANG L, HU Q, HE Q H, et al. Neural residual radiance fields for streamably free-viewpoint videos[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 76-87. |
[15] | CHOY C, GWAK J, SAVARESE S. 4D spatio-temporal ConvNets: Minkowski convolutional neural networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 3070-3079. |
[16] | CURLESS B, LEVOY M. A volumetric method for building complex models from range images[C]// The 23rd annual conference on Computer graphics and interactive techniques. New York: ACM, 1996: 303-312. |
[17] | YU A, LI R L, TANCIK M, et al. PlenOctrees for real-time rendering of neural radiance fields[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 5732-5741 |
[18] | KIM D, LEE M, MUSETH K. NeuralVDB: high-resolution sparse volume representation using hierarchical neural networks[EB/OL]. [2023-07-19]. https://arxiv.org/abs/2208.04448.pdf. |
[19] | IZADI S, KIM D, HILLIGES O, et al. KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera[C]// The 24th annual ACM symposium on User interface software and technology. New York: ACM, 2011: 559-568. |
[20] | SUN C, SUN M, CHEN H T. Direct voxel grid optimization: super-fast convergence for radiance fields reconstruction[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5449-5459. |
[21] | DOU M S, DAVIDSON P, FANELLO S R, et al. Motion2fusion: real-time volumetric performance capture[J]. ACM Transactions on Graphics, 2017, 36(6): Article No. 246. |
[22] | DOU M S, KHAMIS S, DEGTYAREV Y, et al. Fusion4D: real-time performance capture of challenging scenes[J]. ACM Transactions on Graphics, 2016, 35(4): 114:1-114:13. |
[23] | BOULCH A, MARLET R. POCO: point convolution for surface reconstruction[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 6292-6304. |
[24] | RÜCKERT D, FRANKE L, STAMMINGER M. ADOP: approximate differentiable one-pixel point rendering[J]. ACM Transactions on Graphics, 2022, 41(4): 99:1-99:14. |
[25] | WILLIAMS F, GOJCIC Z, KHAMIS S, et al. Neural fields as learnable kernels for 3D reconstruction[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 18479-18489. |
[26] | RAKHIMOV R, ARDELEAN A T, LEMPITSKY V, et al. NPBG: accelerating neural point-based graphics[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 15948-15958. |
[27] | WITTNER C, SCHAUERTE B, STIEFELHAGEN R. What’s the point? frame-wise pointing gesture recognition with latent-dynamic conditional random fields[EB/OL]. [2023-07- 19]. https://arxiv.org/abs/1510.05879.pdf. |
[28] | LOMBARDI S, SIMON T, SCHWARTZ G, et al. Mixture of volumetric primitives for efficient neural rendering[J]. ACM Transactions on Graphics, 2021, 40(4): 59:1-59:13. |
[29] | KERBL B, KOPANAS G, LEIMKUEHLER T, et al. 3D Gaussian splatting for real-time radiance field rendering[J]. ACM Transactions on Graphics, 2023, 42(4): 139:1-139:14. |
[30] | COLLET A, CHUANG M, SWEENEY P, et al. High-quality streamable free-viewpoint video[J]. ACM Transactions on Graphics, 2015, 34(4): 69:1-69:13. |
[31] | PARK J J, FLORENCE P, STRAUB J, et al. DeepSDF: learning continuous signed distance functions for shape representation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 165-174. |
[32] | MESCHEDER L, OECHSLE M, NIEMEYER M, et al. Occupancy networks: learning 3D reconstruction in function space[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 4455-4465. |
[33] | HORNIK K, STINCHCOMBE M, WHITE H. Multilayer feedforward networks are universal approximators[J]. Neural Networks, 1989, 2(5): 359-366. |
[34] | TANCIK M, SRINIVASAN P P, MILDENHALL B, et al. Fourier features let networks learn high frequency functions in low dimensional domains[C]// The 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 7537-7547. |
[35] | YU Z H, PENG S Y, NIEMEYER M, et al. MonoSDF: exploring monocular geometric cues for neural implicit surface reconstruction[EB/OL]. [2023-07-19]. https://arxiv.org/abs/2206.00665.pdf. |
[36] | MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[C]// European Conference on Computer Vision. Cham: Springer, 2020: 405-421. |
[37] | BARRON J T, MILDENHALL B, TANCIK M, et al. Mip-NeRF: a multiscale representation for anti-aliasing neural radiance fields[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 5835-5844. |
[38] | HUANG X, TANG L L, LIU Y, et al. NIST: learning neural implicit surfaces and textures for multi-view reconstruction[C]// Advances in Smart Vehicular Technology, Transportation, Communication and Applications. Singapore: Springer, 2023: 385-395. |
[39] | LONG X X, LIN C, WANG P, et al. SparseNeuS: fast generalizable neural surface reconstruction from sparse views[C]// European Conference on Computer Vision. Cham: Springer, 2022: 210-227. |
[40] | GU J T, TREVITHICK A, LIN K E, et al. NerfDiff: single-image view synthesis with NeRF-guided distillation from 3D-aware diffusion[C]// The 40th International Conference on Machine Learning. New York: ACM, 2023: 11808-11826. |
[41] | YANG G D, KUNDU A, GUIBAS L J, et al. Learning a diffusion prior for NeRFs[EB/OL]. [2023-07-19]. https://arxiv.org/abs/2304.14473.pdf. |
[42] | NIEMEYER M, MESCHEDER L, OECHSLE M, et al. Occupancy flow: 4D reconstruction by learning particle dynamics[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 5378-5388. |
[43] | PUMAROLA A, CORONA E, PONS-MOLL G, et al. D-NeRF: neural radiance fields for dynamic scenes[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 10313-10322. |
[44] | LI Z Q, NIKLAUS S, SNAVELY N, et al. Neural scene flow fields for space-time view synthesis of dynamic scenes[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 6498-6508. |
[45] | MÜLLER T, EVANS A, SCHIED C, et al. Instant neural graphics primitives with a multiresolution hash encoding[J]. ACM Transactions on Graphics, 2022, 41(4): 1-15. |
[46] | PALAFOX P, BOŽIČ A, THIES J, et al. NPMs: neural parametric models for 3D deformable shapes[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 12675-12685. |
[47] | CAO A, JOHNSON J. HexPlane: a fast representation for dynamic scenes[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 130-141 |
[48] | CHEN A P, XU Z X, GEIGER A, et al. TensoRF: tensorial radiance fields[C]// European Conference on Computer Vision. Cham: Springer, 2022: 333-350. |
[49] | LI T Y, SLAVCHEVA M, ZOLLHOEFER M, et al. Neural 3D video synthesis from multi-view video[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5511-5521. |
[50] | ATTAL B, HUANG J B, RICHARDT C, et al. HyperReel: high-fidelity 6-DoF video with ray-conditioned sampling[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 16610-16620. |
[51] | SONG L C, CHEN A P, LI Z, et al. NeRFPlayer: a streamable dynamic scene representation with decomposed neural radiance fields[J]. IEEE Transactions on Visualization and Computer Graphics, 2023, 29: 2732-2742. |
[52] | LI L, SHEN Z, WANG Z, et al. Streaming radiance fields for 3D video synthesis[EB/OL]. [2023-07-19]. https://arxiv.org/abs/2210.14831. |
[53] | LIU J W, CAO Y P, MAO W J, et al. DeVRF: fast deformable voxel radiance fields for dynamic scenes[EB/OL]. [2023-07-19]. https://arxiv.org/abs/2205.15723.pdf. |
[54] | PAN X R, LAI Z H, SONG S J, et al. ActiveNeRF: learning where to see with uncertainty estimation[C]// European Conference on Computer Vision. Cham: Springer, 2022: 230-246. |
[55] | NIEMEYER M, BARRON J T, MILDENHALL B, et al. RegNeRF: regularizing neural radiance fields for view synthesis from sparse inputs[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5470-5480. |
[56] | TRUONG P, RAKOTOSAONA M J, MANHARDT F, et al. SPARF: neural radiance fields from sparse and noisy poses[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 4190-4200. |
[57] | VERBIN D, HEDMAN P, MILDENHALL B, et al. Ref-NeRF: structured view-dependent appearance for neural radiance fields[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 5481-5490. |
[58] | LI Z Q, WANG Q Q, COLE F, et al. DynIBaR: neural dynamic image-based rendering[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 4273-4284. |
[59] | MEULEMAN A, LIU Y L, GAO C, et al. Progressively optimized local radiance fields for robust view synthesis[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE Press, 2023: 16539-16548. |
[60] | LIU H T D, WILLIAMS F, JACOBSON A, et al. Learning smooth neural functions via Lipschitz regularization[C]// Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings. New York: ACM, 2022: 31:1-31:13. |
[61] | DUCHI J C, HAZAN E, SINGER Y. Adaptive subgradient methods for online learning and stochastic optimization[J]. Journal of Machine Learning Research, 2011, 12: 2121-2159. |
[62] | KINGMA D, BA J. Adam: a method for stochastic optimization[EB/OL]. [2023-07-19]. https://arxiv.org/pdf/1412.6980v8.pdf. |
[63] | YU T, ZHENG Z R, GUO K W, et al. Function4D: real-time human volumetric capture from very sparse consumer RGBD sensors[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 5742-5752. |
[64] | YU T, ZHAO J H, ZHENG Z R, et al. DoubleFusion: real-time capture of human performances with inner body shapes from a single depth sensor[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(10): 2523-2539. |
[65] | HUANG J H, BIRDAL T, GOJCIC Z, et al. Multiway non-rigid point cloud registration via learned functional map synchronization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 2038-2053. |
[66] | COSTEIRA J P, KANADE T. A multibody factorization method for independently moving objects[J]. International Journal of Computer Vision, 1998, 29(3): 159-179. |
[67] | XU B B, LI W B, TZOUMANIKAS D, et al. MID-fusion: octree-based object-level multi-instance dynamic SLAM[C]// 2019 International Conference on Robotics and Automation. New York: IEEE Press, 2019: 5231-5237. |
[68] | HUANG J H, YANG S, ZHAO Z S, et al. ClusterSLAM: a SLAM backend for simultaneous rigid body clustering and motion estimation[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 5874-5883. |
[69] | BESCOS B, CAMPOS C, TARDÓS J D, et al. DynaSLAM II: tightly-coupled multi-object tracking and SLAM[J]. IEEE Robotics and Automation Letters, 2021, 6(3): 5191-5198. |
[70] | BALLESTER I, FONTÁN A, CIVERA J, et al. DOT: dynamic object tracking for visual SLAM[C]// 2021 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2021: 11705-11711. |
[71] | HUANG J H, YANG S, MU T J, et al. ClusterVO: clustering moving instances and estimating visual odometry for self and surroundings[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 2165-2174. |
[72] | LONG R, RAUCH C, ZHANG T W, et al. RigidFusion: robot localisation and mapping in environments with large dynamic rigid objects[J]. IEEE Robotics and Automation Letters, 2021, 6(2): 3703-3710. |
[73] | REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2023-07-19]. https://arxiv.org/abs/1804.02767.pdf. |
[74] | DENG C Y, LEI J H, SHEN B K, et al. Banana: Banach fixed-point network for pointcloud segmentation with inter-part equivariance[EB/OL]. [2023-07-19]. https://arxiv.org/abs/2305.16314.pdf. |
[75] | LEI J H, DENG C Y, SCHMECKPEPER K, et al. EFEM: equivariant neural field expectation maximization for 3D object segmentation without scene supervision[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 4902-4912. |
[76] | LI P L, QIN T, SHEN S J. Stereo vision-based semantic 3D object and ego-motion tracking for autonomous driving[M]// Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 664-679. |
[77] | REDDY N D, VO M, NARASIMHAN S G. CarFusion: combining point tracking and part detection for dynamic 3D reconstruction of vehicles[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 1906-1915. |
[78] | RODDICK T, BIGGS B, REINO D O, et al. On the road to large-scale 3D monocular scene reconstruction using deep implicit functions[C]// 2021 IEEE/CVF International Conference on Computer Vision Workshops. New York: IEEE Press, 2021: 2875-2884. |
[79] | HUANG S Y, GOJCIC Z, HUANG J H, et al. Dynamic 3D scene analysis by Point cloud accumulation[M]// Lecture Notes in Computer Science. Cham: Springer Nature Switzerland, 2022: 674-690. |
[80] | HU Y H, YANG J Z, CHEN L, et al. Planning-oriented autonomous driving[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 17853-17862. |
[81] | LI C, ZHAO Z H, GUO X H. ArticulatedFusion: real-time reconstruction of motion, geometry and segmentation using a single depth camera[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 324-340. |
[82] | NUNES U M, DEMIRIS Y. Online unsupervised learning of the 3D kinematic structure of arbitrary rigid bodies[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 3808-3816. |
[83] | HUANG J H, WANG H, BIRDAL T, et al. MultiBodySync: multi-body segmentation and motion estimation via 3D scan synchronization[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7104-7114. |
[84] | XIANG F B, QIN Y Z, MO K C, et al. SAPIEN: a SimulAted part-based interactive ENvironment[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 11094-11104. |
[85] | MU J T, QIU W C, KORTYLEWSKI A, et al. A-SDF: learning disentangled signed distance functions for articulated shape representation[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 12981-12991. |
[86] | LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model[J]. ACM Transactions on Graphics, 2015, 34(6): 248:1-248:16. |
[87] | OSMAN A A A, BOLKART T, BLACK M J. STAR: sparse trained articulated human body regressor[C]// European Conference on Computer Vision. Cham: Springer, 2020: 598-613. |
[88] | ROMERO J, TZIONAS D, BLACK M J. Embodied hands: modeling and capturing hands and bodies together[EB/OL]. [2023-07-19]. https://arxiv.org/abs/2201.02610. |
[89] | XIU Y L, YANG J L, TZIONAS D, et al. ICON: implicit clothed humans obtained from normals[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 13286-13296. |
[90] | XIU Y L, YANG J L, CAO X, et al. ECON: explicit Clothed humans Optimized via Normal integration[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 512-523. |
[91] | REMPE D, BIRDAL T, HERTZMANN A, et al. HuMoR: 3D human motion model for robust pose estimation[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 11468-11479. |
[92] | SONG H X, HUANG J H, CAO Y P, et al. HDR-Net-Fusion: real-time 3D dynamic scene reconstruction with a hierarchical deep reinforcement network[J]. Computational Visual Media, 2021, 7(4): 419-435. |
[93] | XU H Y, ALLDIECK T, SMINCHISESCU C. H-NeRF: neural radiance fields for rendering and temporal reconstruction of humans in motion[EB/OL]. [2023-07-19]. https://arxiv.org/abs/2110.13746.pdf. |
[94] | ZHANG J Y, PEPOSE S, JOO H, et al. Perceiving 3D human-object spatial arrangements from a single image in the wild[M]// Computer Vision - ECCV 2020. Cham: Springer International Publishing, 2020: 34-51. |
[95] | XU X, JOO H, MORI G, et al. D3D-HOI: dynamic 3D human-object interactions from videos[EB/OL]. [2023-07-19]. https://arxiv.org/abs/2108.08420.pdf. |
[96] | XIE X H, BHATNAGAR B L, PONS-MOLL G. CHORE: contact, human and object reconstruction from a single RGB image[M]// Lecture Notes in Computer Science. Cham: Springer Nature Switzerland, 2022: 125-145. |
[97] | BROXTON M, FLYNN J, OVERBECK R, et al. Immersive light field video with a layered mesh representation[J]. ACM Transactions on Graphics, 2020, 39(4): 86:1-86:15. |
[98] | PARK K, SINHA U, BARRON J T, et al. Nerfies: deformable neural radiance fields[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 5845-5854. |
[99] | YOON J S, KIM K, GALLO O, et al. Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 5335-5344. |
[100] | BOGO F, ROMERO J, PONS-MOLL G, et al. Dynamic FAUST: registering human bodies in motion[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5573-5582. |
[101] | RANJAN A, BOLKART T, SANYAL S, et al. Generating 3D faces using convolutional mesh autoencoders[M]// Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 725-741. |
[102] | BRONSTEIN A M, BRONSTEIN M M, KIMMEL R. Numerical geometry of non-rigid shapes[DB/OL]. [2023-07-24]. https://link.springer.com/book/10.1007/978-0-387-73301-2. |
[103] | SUN P, KRETZSCHMAR H, DOTIWALLA X, et al. Scalability in perception for autonomous driving: waymo open dataset[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 2443-2451. |
[104] | GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]// 2012 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2012: 3354-3361. |
[105] | CAESAR H, BANKITI V, LANG A H, et al. nuScenes: a multimodal dataset for autonomous driving[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 11618-11628. |