Journal of Graphics ›› 2025, Vol. 46 ›› Issue (3): 625-634. DOI: 10.11996/JG.j.2095-302X.2025030625
3D human mesh reconstruction based on dual-stream network fusion
YU Bing1,2, CHENG Guang1,2, HUANG Dongjin1,2, DING Youdong1,2
Received: 2024-08-23
Accepted: 2024-12-24
Published: 2025-06-30
Online: 2025-06-13
First author: YU Bing (1989-), lecturer, Ph.D. His main research interests include image processing and deep learning. E-mail: yubing@shu.edu.cn
Abstract:
3D human mesh reconstruction has important applications in computer vision, animation production, and virtual reality. However, most existing methods focus on reconstructing the 3D human body from a single image; accurately and smoothly reconstructing 3D human motion from video remains a difficult problem. To address this, a dual-stream network fusion architecture is proposed that uses the 3D human pose as an intermediary to reconstruct 3D human meshes from video data. First, a 3D pose estimation stream estimates 3D joint locations for the input video, yielding accurate joint information. Second, a temporal feature aggregation stream extracts temporal image features from the video, capturing human motion position information and temporal pose features. Finally, a fusion decoder regresses the 3D joints and temporal image features against the mesh structure provided by the SMPL template to predict the 3D mesh vertex coordinates. Experiments show that the proposed method achieves higher prediction accuracy than MPS-Net, lowering the mean per joint position error (MPJPE) by 9.3% on the 3DPW dataset and by 9.2% on the MPI-INF-3DHP dataset, while producing visually more plausible reconstructions with greater accuracy and smoothness.
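To make the pipeline described above concrete, the sketch below wires the two streams together in PyTorch. It is a minimal illustration, not the authors' implementation: the class name DualStreamMeshSketch, all layer choices, and all dimensions are assumptions, and the real method operates on video frames with learned backbones rather than the precomputed per-frame features used here.

```python
# Minimal sketch of the dual-stream fusion idea, assuming PyTorch.
# All modules and dimensions are placeholders, not the paper's architecture.
import torch
import torch.nn as nn

class DualStreamMeshSketch(nn.Module):
    def __init__(self, num_joints=17, feat_dim=2048, num_vertices=6890):
        super().__init__()
        # Stream 1 (placeholder): per-frame features -> 3D joint coordinates.
        self.pose_head = nn.Linear(feat_dim, num_joints * 3)
        # Stream 2 (placeholder): temporal aggregation of per-frame features.
        self.temporal = nn.GRU(feat_dim, 512, batch_first=True, bidirectional=True)
        # Fusion decoder (placeholder): joints + temporal features -> vertex
        # offsets from a fixed template mesh (the real template comes from SMPL).
        self.decoder = nn.Sequential(
            nn.Linear(num_joints * 3 + 1024, 1024), nn.ReLU(),
            nn.Linear(1024, num_vertices * 3),
        )
        self.register_buffer("template", torch.zeros(num_vertices, 3))

    def forward(self, frame_feats):                 # (B, T, feat_dim)
        joints3d = self.pose_head(frame_feats)      # (B, T, J*3)   pose stream
        temp_feats, _ = self.temporal(frame_feats)  # (B, T, 1024)  temporal stream
        fused = torch.cat([joints3d, temp_feats], dim=-1)
        offsets = self.decoder(fused)               # (B, T, V*3)
        B, T, _ = offsets.shape
        return self.template + offsets.view(B, T, -1, 3)  # per-frame vertices

# Example: a 16-frame clip of 2048-D features -> (1, 16, 6890, 3) vertex predictions.
verts = DualStreamMeshSketch()(torch.randn(1, 16, 2048))
```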
YU Bing, CHENG Guang, HUANG Dongjin, DING Youdong. 3D human mesh reconstruction based on dual-stream network fusion[J]. Journal of Graphics, 2025, 46(3): 625-634.
| Method | MPJPE↓ | P-MPJPE↓ | MPVPE↓ | ACCEL↓ |
|---|---|---|---|---|
| HMMR | 116.5 | 72.6 | 139.3 | 15.2 |
| MEVA | 86.9 | 54.7 | - | 11.6 |
| VIBE | 91.9 | 57.6 | 99.1 | 25.4 |
| TCMR | 86.5 | 52.7 | 102.9 | 7.1 |
| MPS-Net | 84.3 | 52.1 | 99.7 | 7.4 |
| GLOT | 80.7 | 50.6 | 96.3 | 6.6 |
| Ours | 76.5 | 46.7 | 90.8 | 6.2 |

Table 1 Comparison of experimental results on 3DPW
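For readers checking the columns: MPJPE, P-MPJPE, and MPVPE are in mm and ACCEL in mm/s², as conventionally reported in this literature, and lower is better; the drop from MPS-Net's 84.3 mm to 76.5 mm MPJPE is the 9.3% relative reduction quoted in the abstract. The NumPy sketch below gives the standard definitions of these metrics as assumed here; it is not the paper's evaluation code.

```python
# Standard definitions (assumed) of the metrics in Tables 1-6; inputs are
# predicted and ground-truth joint sequences of shape (T, J, 3) in mm.
import numpy as np

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: mean Euclidean joint distance."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def p_mpjpe(pred, gt):
    """MPJPE after per-frame Procrustes alignment (scale, rotation, translation)."""
    errs = []
    for p, g in zip(pred, gt):                    # one (J, 3) frame at a time
        p0, g0 = p - p.mean(0), g - g.mean(0)
        U, S, Vt = np.linalg.svd(p0.T @ g0)       # orthogonal Procrustes
        if np.linalg.det(U @ Vt) < 0:             # avoid reflections
            Vt[-1] *= -1
            S[-1] *= -1
        R = U @ Vt
        scale = S.sum() / (p0 ** 2).sum()
        errs.append(np.linalg.norm(scale * p0 @ R + g.mean(0) - g, axis=-1).mean())
    return float(np.mean(errs))

def accel_error(pred, gt):
    """ACCEL: mean difference of second-order finite differences across frames."""
    acc = lambda x: x[2:] - 2 * x[1:-1] + x[:-2]
    return np.linalg.norm(acc(pred) - acc(gt), axis=-1).mean()

# MPVPE is mpjpe() evaluated on the 6890 SMPL mesh vertices instead of the joints.
```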
| Method | MPJPE↓ | P-MPJPE↓ | ACCEL↓ |
|---|---|---|---|
| MEVA | 96.4 | 65.4 | 11.1 |
| VIBE | 103.9 | 68.9 | 27.3 |
| TCMR | 97.6 | 63.5 | 8.5 |
| MPS-Net | 96.7 | 62.8 | 9.6 |
| GLOT | 93.9 | 61.5 | 7.9 |
| Ours | 87.7 | 54.5 | 7.1 |

Table 2 Comparison of experimental results on MPI-INF-3DHP
| Method | MPJPE↓ | P-MPJPE↓ | ACCEL↓ |
|---|---|---|---|
| MEVA | 76.0 | 53.2 | 15.3 |
| VIBE | 65.9 | 41.5 | 18.3 |
| TCMR | 62.3 | 41.1 | 5.3 |
| MPS-Net | 69.4 | 47.4 | 3.6 |
| GLOT | 67.0 | 46.3 | 3.6 |
| Ours | 57.9 | 38.9 | 3.3 |

Table 3 Comparison of experimental results on Human3.6M
| Method | MPJPE↓ | P-MPJPE↓ | ACCEL↓ |
|---|---|---|---|
| PQ-GCN | 89.2 | 58.3 | - |
| Pose2Mesh | 88.9 | 58.3 | 22.6 |
| GTRS | 88.5 | 58.9 | 25.0 |
| HybrIK | 81.0 | 76.0 | 7.1 |
| NIKI | 85.5 | 53.5 | - |
| ReFit | 71.0 | 43.9 | - |
| Ours | 74.6 | 47.7 | 6.9 |

Table 4 Comparison of experimental results with image-based methods on 3DPW
| Method | MPJPE↓ | P-MPJPE↓ | MPVPE↓ |
|---|---|---|---|
| F (baseline) | 84.3 | 52.1 | 99.7 |
| F+ST | 79.1 | 49.6 | 93.6 |
| F+Dec | 81.3 | 50.2 | 96.5 |
| Ours | 76.5 | 46.7 | 90.8 |

Table 5 Ablation experiments with different modules added on 3DPW
| Method | MPJPE↓ | P-MPJPE↓ | MPVPE↓ | ACCEL↓ |
|---|---|---|---|---|
| Crop | 80.2 | 50.2 | 96.4 | 7.3 |
| Crop+Bbox | 78.1 | 47.9 | 94.1 | 7.4 |
| Ours | 76.5 | 46.7 | 90.8 | 6.2 |

Table 6 Ablation experiments on 3D pose estimation on 3DPW
Fig. 5 Reconstruction results of different methods on the 3DPW dataset ((a) Input; (b) MPS-Net; (c) GLOT; (d) Ours)
Fig. 6 Reconstruction results of different methods on challenging videos ((a) Input; (b) MPS-Net; (c) GLOT; (d) Ours)
Fig. 7 Reconstruction results of the proposed method on occluded videos ((a) Original image; (b) Reconstructed image)
[1] WANG J B, TAN S J, ZHEN X T, et al. Deep 3D human pose estimation: a review[J]. Computer Vision and Image Understanding, 2021, 210: 103225.
[2] DUAN H D, ZHAO Y, CHEN K, et al. Revisiting skeleton-based action recognition[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 2969-2978.
[3] LV H, YANG H Y. A 3D human pose estimation approach based on spatio-temporal motion interaction modeling[J]. Journal of Graphics, 2024, 45(1): 159-168 (in Chinese).
[4] LUO Z Y, GOLESTANEH S A, KITANI K M. 3D human motion estimation via motion compression and refinement[C]// The 15th Asian Conference on Computer Vision. Cham: Springer, 2021: 324-340.
[5] SUN Y, YE Y, LIU W, et al. Human mesh recovery from monocular images via a skeleton-disentangled representation[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 5349-5358.
[6] KANAZAWA A, ZHANG J Y, FELSEN P, et al. Learning 3D human dynamics from video[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 5614-5623.
[7] WANG Y P, ZENG Y, LI S H, et al. A Transformer-based 3D human pose estimation method[J]. Journal of Graphics, 2023, 44(1): 139-145 (in Chinese).
[8] LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model[J]. ACM Transactions on Graphics, 2015, 34(6): 248.
[9] ANGUELOV D, SRINIVASAN P, KOLLER D, et al. SCAPE: shape completion and animation of people[C]// ACM SIGGRAPH 2005 Papers. New York: ACM, 2005: 408-416.
[10] OSMAN A A A, BOLKART T, BLACK M J. STAR: sparse trained articulated human body regressor[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 598-613.
[11] KANAZAWA A, BLACK M J, JACOBS D W, et al. End-to-end recovery of human shape and pose[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7122-7131.
[12] ZHANG J L, TU Z G, YANG J Y, et al. MixSTE: seq2seq mixed spatio-temporal encoder for 3D human pose estimation in video[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 13232-13242.
[13] HUANG Y W, LIN Z Q, ZHANG J, et al. Lightweight human pose estimation algorithm combined with coordinate Transformer[J]. Journal of Graphics, 2024, 45(3): 516-527 (in Chinese).
[14] LI Z, CHEN L L, LIU C L, et al. 3D human avatar digitization from a single image[C]// The 17th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry. New York: ACM, 2019: 12.
[15] DEY R, SALEM F M. Gate-variants of gated recurrent unit (GRU) neural networks[C]// The 60th IEEE International Midwest Symposium on Circuits and Systems. New York: IEEE Press, 2017: 1597-1600.
[16] WANG J, HU Y Z. An improved enhancement algorithm based on CNN applicable for weak contrast images[J]. IEEE Access, 2020, 8: 8459-8476.
[17] LI W H, LIU H, TANG H, et al. MHFormer: multi-hypothesis transformer for 3D human pose estimation[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 13147-13156.
[18] WAN Z N, LI Z J, TIAN M Q, et al. Encoder-decoder with multi-level attention for 3D human shape and pose estimation[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 13033-13042.
[19] KISSOS I, FRITZ L, GOLDMAN M, et al. Beyond weak perspective for monocular 3D human pose estimation[C]// Computer Vision-ECCV 2020 Workshops. Cham: Springer, 2020: 541-554.
[20] SHEN X L, YANG Z X, WANG X H, et al. Global-to-local modeling for video-based 3D human pose and shape estimation[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 8887-8896.
[21] RONG Y, LIU Z W, LI C, et al. Delving deep into hybrid annotations for 3D human recovery in the wild[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 5340-5348.
[22] LI Z W, XU B, HUANG H, et al. Deep two-stream video inference for human body pose and shape estimation[C]// 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2022: 430-439.
[23] ZHANG Z X, LU X Q, CAO G J, et al. ViT-YOLO: transformer-based YOLO for object detection[C]// 2021 IEEE/CVF International Conference on Computer Vision Workshops. New York: IEEE Press, 2021: 2799-2808.
[24] LI Z H, LIU J Z, ZHANG Z S, et al. CLIFF: carrying location information in full frames into human pose and shape estimation[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 590-606.
[25] ZHENG C, MENDIETA M, WANG P, et al. A lightweight graph transformer network for human mesh reconstruction from 2D human pose[C]// The 30th ACM International Conference on Multimedia. New York: ACM, 2022: 5496-5507.
[26] CHENG G, HUANG Y, YU B. Recurrent transformer for 3D human pose estimation[C]// The 4th International Conference on Big Data & Artificial Intelligence & Software Engineering. New York: IEEE Press, 2023: 207-210.
[27] KOLOTOUROS N, PAVLAKOS G, BLACK M J, et al. Learning to reconstruct 3D human pose and shape via model-fitting in the loop[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 2252-2261.
[28] CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[EB/OL]. [2024-12-24]. https://arxiv.org/abs/1406.1078.
[29] VON MARCARD T, HENSCHEL R, BLACK M J, et al. Recovering accurate 3D human pose in the wild using IMUs and a moving camera[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 601-617.
[30] MEHTA D, RHODIN H, CASAS D, et al. Monocular 3D human pose estimation in the wild using improved CNN supervision[C]// 2017 International Conference on 3D Vision. New York: IEEE Press, 2017: 506-516.
[31] IONESCU C, PAPAVA D, OLARU V, et al. Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1325-1339.
[32] WEI W L, LIN J C, LIU T L, et al. Capturing humans in motion: temporal-attentive 3D human pose and shape estimation from monocular video[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 13211-13220.
[33] KOCABAS M, ATHANASIOU N, BLACK M J. VIBE: video inference for human body pose and shape estimation[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 5253-5263.
[34] CHOI H, MOON G, LEE K M. Pose2Mesh: graph convolutional network for 3D human pose and mesh recovery from a 2D human pose[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 769-787.
[35] LI J F, XU C, CHEN Z C, et al. HybrIK: a hybrid analytical-neural inverse kinematics solution for 3D human pose and shape estimation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 3383-3393.
[36] LI J F, BIAN S Y, LIU Q, et al. NIKI: neural inverse kinematics with invertible neural networks for 3D human pose and shape estimation[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 12933-12942.
[37] WANG Y F, DANIILIDIS K. ReFit: recurrent fitting network for 3D human recovery[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 14644-14654.