Journal of Graphics ›› 2025, Vol. 46 ›› Issue (2): 415-424. DOI: 10.11996/JG.j.2095-302X.2025020415
QIU Jiaxin, SONG Qianyun, XU Dan
Received: 2024-08-17
Accepted: 2025-01-21
Published: 2025-04-30
Online: 2025-04-24
First author:
QIU Jiaxin (1998-), master's student. Her main research interest is 3D reconstruction. E-mail: 12022215169@mail.ynu.edu.cn
Corresponding author:
XU Dan (1968-), female, professor, Ph.D. Her main research interests cover computer vision, image analysis and understanding, and cultural computing. E-mail: danxu@ynu.edu.cn
Abstract:
Chinese ethnic dance is an art form handed down from generation to generation, born out of the everyday life of ordinary people. With social development, however, some traditional dances have not been passed on adequately and now face the risk of being lost. The dances of different ethnic groups each have distinctive characteristics and complex movement variation. To better preserve ethnic dance, a 3D reconstruction method for ethnic dance based on an improved neural radiance field was proposed. First, an improved pose estimation algorithm denoised and optimized the estimated poses; the deformation field was then decomposed into rigid motion and non-rigid motion produced by deep neural networks, and linear blend skinning mapped poses from observation space to canonical space, yielding a pose-independent deformation field. Next, a neural radiance field was used to reconstruct the human body in 3D; during reconstruction, an attention mechanism strengthened the learning of edge colors while the human motion obtained from pose estimation was further optimized, finally producing newly rendered views of the dancer from different viewpoints for every frame. Experimental results demonstrated that the method reconstructed dancers and their dance poses well and, compared with HumanNeRF, improved reconstruction accuracy. Compared with traditional 2D dance preservation techniques, this method restores dancers' movements more faithfully, thereby serving the goal of preserving ethnic dance.
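To make the canonical-space mapping concrete, below is a minimal PyTorch sketch of the linear blend skinning step described above: per-bone rigid transforms are blended by skinning weights to carry a point from observation space into the pose-independent canonical space. The function and variable names are ours, and the bone transforms are assumed to come from an SMPL-style body model [7]; this is an illustration under those assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's exact code) of mapping sampled points from
# observation space to canonical space with linear blend skinning (LBS).
import torch

def lbs_observation_to_canonical(x_obs, skin_weights, bone_rots, bone_transls):
    """
    x_obs:        (N, 3)    points sampled along camera rays in observation space
    skin_weights: (N, B)    per-point skinning weights over B bones (rows sum to 1)
    bone_rots:    (B, 3, 3) per-bone rotations of the posed-to-canonical map
    bone_transls: (B, 3)    per-bone translations of the same map
    returns:      (N, 3)    the corresponding points in canonical (pose-free) space
    """
    # Apply every bone's rigid transform to every point: (N, B, 3)
    x_per_bone = torch.einsum('bij,nj->nbi', bone_rots, x_obs) + bone_transls
    # Blend the per-bone candidates with the skinning weights: (N, 3)
    return (skin_weights.unsqueeze(-1) * x_per_bone).sum(dim=1)
```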
Cite this article: QIU Jiaxin, SONG Qianyun, XU Dan. A neural radiance field-based approach to ethnic dance reconstruction[J]. Journal of Graphics, 2025, 46(2): 415-424.
Fig. 3 Comparison of optimized poses ((a) Original pose; (b) VIBE pose estimation; (c) Pose estimation optimized by NeuMan; (d) Pose estimation optimized by our method)
Fig. 4 Comparison of rendering results ((a) Existing and novel views rendered by our method; (b) Existing and novel views rendered by HumanNeRF)
Table 1 Comparison of metrics with HumanNeRF

| Method | Ethnic group | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|
| HumanNeRF | Sani | 28.53 | 0.9758 | 21.05 |
| | Bai | 27.36 | 0.9740 | 24.02 |
| | Yi | 28.58 | 0.9749 | 23.56 |
| Ours | Sani | 29.05 | 0.9773 | 17.89 |
| | Bai | 27.86 | 0.9768 | 20.20 |
| | Yi | 28.83 | 0.9756 | 21.17 |
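For reference, here is a short sketch of how the PSNR, SSIM, and LPIPS scores in Tables 1-3 can be computed. The library choices (scikit-image and the `lpips` package, with the VGG variant of Zhang et al. [37]) are our assumption rather than the authors' exact setup. Note that the LPIPS columns report values far above the metric's raw [0, 1] range, which suggests the customary ×10³ scaling.

```python
# Hedged sketch of the three evaluation metrics used in Tables 1-3.
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_net = lpips.LPIPS(net='vgg')  # perceptual metric of Zhang et al. [37]

def evaluate(rendered, ground_truth):
    """Both images are float numpy arrays in [0, 1] with shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(ground_truth, rendered, data_range=1.0)
    ssim = structural_similarity(ground_truth, rendered,
                                 channel_axis=-1, data_range=1.0)
    # LPIPS expects NCHW tensors scaled to [-1, 1]
    to_t = lambda im: torch.from_numpy(im).permute(2, 0, 1)[None].float() * 2 - 1
    lp = lpips_net(to_t(rendered), to_t(ground_truth)).item()
    return psnr, ssim, lp
```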
Table 2 Comparison of average metrics on the ethnic dance dataset

| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| HumanNeRF | 28.15 | 0.9749 | 22.88 |
| Ours (without attention module) | 28.42 | 0.9760 | 21.78 |
| Ours (full model) | 28.58 | 0.9766 | 19.75 |
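The ablation row "Ours (without attention module)" isolates the attention mechanism used to strengthen edge-color learning. As a hedged illustration, the following is a minimal squeeze-and-excitation style channel attention block in the spirit of Hu et al. [34]; the paper's actual attention design may differ.

```python
# Minimal SE-style channel attention sketch (after Hu et al. [34]).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),                      # ReLU activation [35]
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                 # x: (N, C, H, W) feature map
        squeeze = x.mean(dim=(2, 3))      # global average pool -> (N, C)
        scale = self.fc(squeeze)          # per-channel gates in (0, 1)
        return x * scale[:, :, None, None]  # re-weight the channels
```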
Fig. 5 Ablation study ((a) Ground-truth photograph of the subject; (b) Rendering result with the full model; (c) Rendering result without the attention module)
Fig. 6 Comparison of rendering results on the ZJU-MoCap dataset ((a) Neural Body; (b) HumanNeRF; (c) Our method)
Table 3 Comparison of metrics on the ZJU-MoCap dataset

| Subject | Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|
| Subject 387 | Neural Body | 31.36 | 0.9760 | 43.35 |
| | HumanNeRF | 33.17 | 0.9847 | 21.24 |
| | Ours | 33.21 | 0.9859 | 19.09 |
| Subject 393 | Neural Body | 32.43 | 0.9613 | 53.12 |
| | HumanNeRF | 33.75 | 0.9861 | 21.69 |
| | Ours | 33.72 | 0.9864 | 21.55 |
| Subject 313 | Neural Body | 27.37 | 0.9600 | 41.92 |
| | HumanNeRF | 29.00 | 0.9813 | 19.23 |
| | Ours | 29.07 | 0.9841 | 18.20 |
| Subject 377 | Neural Body | 32.11 | 0.9735 | 40.40 |
| | HumanNeRF | 33.95 | 0.9807 | 22.44 |
| | Ours | 33.97 | 0.9841 | 19.74 |
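The rendered views scored above come from standard NeRF volume rendering [30,36]: colors and densities sampled along each camera ray are alpha-composited into a pixel. A minimal sketch, with variable names of our own choosing:

```python
# Standard NeRF alpha compositing along a single ray ([30],[36]).
import torch

def composite_ray(rgb, sigma, deltas):
    """
    rgb:    (S, 3) colors predicted at S samples along one ray
    sigma:  (S,)   volume densities at those samples
    deltas: (S,)   distances between consecutive samples
    returns (3,)   the composited pixel color
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)           # opacity per sample
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=0)  # accumulated transmittance
    trans = torch.cat([torch.ones(1), trans[:-1]])     # shift so T_1 = 1
    weights = alpha * trans                            # contribution weights
    return (weights[:, None] * rgb).sum(dim=0)
```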
References
[1] PARK K, SINHA U, HEDMAN P, et al. HyperNeRF: a higher-dimensional representation for topologically varying neural radiance fields[J]. ACM Transactions on Graphics, 2021, 40(6): 238.
[2] PENG S D, ZHANG Y Q, XU Y H, et al. Neural body: implicit neural representations with structured latent codes for novel view synthesis of dynamic humans[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 9050-9059.
[3] WENG C Y, CURLESS B, SRINIVASAN P P, et al. HumanNeRF: free-viewpoint rendering of moving people from monocular video[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 16189-16199.
[4] JIANG W, YI K M, SAMEI G, et al. NeuMan: neural human radiance field from a single video[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 402-418.
[5] LIU L J, HABERMANN M, RUDNEV V, et al. Neural actor: neural free-view synthesis of human actors with pose control[J]. ACM Transactions on Graphics, 2021, 40(6): 219.
[6] PENG S D, DONG J T, WANG Q Q, et al. Animatable neural radiance fields for modeling dynamic human bodies[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 14294-14303.
[7] LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model[J]. Seminal Graphics Papers: Pushing the Boundaries, 2023, 2: 88.
[8] ZHANG J K, LIU X H, YE X Y, et al. Editable free-viewpoint video using a layered neural representation[J]. ACM Transactions on Graphics, 2021, 40(4): 149.
[9] KANADE T, RANDER P, NARAYANAN P J. Virtualized reality: constructing virtual worlds from real scenes[J]. IEEE Multimedia, 1997, 4(1): 34-47.
[10] CARRANZA J, THEOBALT C, MAGNOR M A, et al. Free-viewpoint video of human actors[J]. ACM Transactions on Graphics, 2003, 22(3): 569-577.
[11] SU S Y, YU F, ZOLLHÖFER M, et al. A-NeRF: articulated neural radiance fields for learning human shape, appearance, and pose[C]// The 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 939.
[12] HU S K, HU T, LIU Z W. GauHuman: articulated Gaussian splatting from monocular human videos[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 20418-20431.
[13] KERBL B, KOPANAS G, LEIMKÜHLER T, et al. 3D Gaussian splatting for real-time radiance field rendering[J]. ACM Transactions on Graphics, 2023, 42(4): 139.
[14] ANGUELOV D, SRINIVASAN P, KOLLER D, et al. SCAPE: shape completion and animation of people[J]. ACM Transactions on Graphics, 2005, 24(3): 408-416.
[15] PAVLAKOS G, CHOUTAS V, GHORBANI N, et al. Expressive body capture: 3D hands, face, and body from a single image[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 10967-10977.
[16] BOGO F, KANAZAWA A, LASSNER C, et al. Keep it SMPL: automatic estimation of 3D human pose and shape from a single image[C]// The 14th European Conference on Computer Vision. Cham: Springer, 2016: 561-578.
[17] LASSNER C, ROMERO J, KIEFEL M, et al. Unite the people: closing the loop between 3D and 2D human representations[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 4704-4713.
[18] GÜLER R A, KOKKINOS I. HoloPose: holistic 3D human reconstruction in-the-wild[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 10876-10886.
[19] KANAZAWA A, BLACK M J, JACOBS D W, et al. End-to-end recovery of human shape and pose[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7122-7131.
[20] OMRAN M, LASSNER C, PONS-MOLL G, et al. Neural body fitting: unifying deep learning and model based human pose and shape estimation[C]// 2018 International Conference on 3D Vision (3DV). New York: IEEE Press, 2018: 484-494.
[21] PAVLAKOS G, ZHU L Y, ZHOU X W, et al. Learning to estimate 3D human pose and shape from a single color image[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 459-468.
[22] PISHCHULIN L, INSAFUTDINOV E, TANG S Y, et al. DeepCut: joint subset partition and labeling for multi person pose estimation[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4929-4937.
[23] KOLOTOUROS N, PAVLAKOS G, BLACK M, et al. Learning to reconstruct 3D human pose and shape via model-fitting in the loop[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 2252-2261.
[24] KOCABAS M, ATHANASIOU N, BLACK M J. VIBE: video inference for human body pose and shape estimation[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 5252-5262.
[25] MAHMOOD N, GHORBANI N, TROJE N F, et al. AMASS: archive of motion capture as surface shapes[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 5441-5450.
[26] DONG J T, SHUAI Q, ZHANG Y Q, et al. Motion capture from internet videos[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 210-227.
[27] ZENG A L, JU X, YANG L, et al. DeciWatch: a simple baseline for 10× efficient 2D and 3D pose estimation[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 607-624.
[28] SONG Q Y, ZHANG H, LIU Y N, et al. Hybrid attention adaptive sampling network for human pose estimation in videos[J]. Computer Animation & Virtual Worlds, 2024, 35(4): e2244.
[29] ZHANG Y X, WANG Y, CAMPS O, et al. Key frame proposal network for efficient pose estimation in videos[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 609-625.
[30] MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2021, 65(1): 99-106.
[31] CHEN X, ZHENG Y F, BLACK M J, et al. SNARF: differentiable forward skinning for animating non-rigid neural implicit shapes[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 11574-11584.
[32] TANCIK M, SRINIVASAN P P, MILDENHALL B, et al. Fourier features let networks learn high frequency functions in low dimensional domains[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 632.
[33] PARK K, SINHA U, BARRON J T, et al. Nerfies: deformable neural radiance fields[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 5845-5854.
[34] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141.
[35] NAIR V, HINTON G E. Rectified linear units improve restricted Boltzmann machines[C]// The 27th International Conference on Machine Learning (ICML-10). Madison: Omnipress, 2010: 807-814.
[36] MAX N. Optical models for direct volume rendering[J]. IEEE Transactions on Visualization and Computer Graphics, 1995, 1(2): 99-108.
[37] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 586-595.
[38] IONESCU C, PAPAVA D, OLARU V, et al. Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1325-1339.
[39] SCHÖNBERGER J L, PRICE T, SATTLER T, et al. A vote-and-verify strategy for fast spatial verification in image retrieval[C]// The 13th Asian Conference on Computer Vision. Cham: Springer, 2016: 321-337.