Journal of Graphics ›› 2025, Vol. 46 ›› Issue (3): 625-634.DOI: 10.11996/JG.j.2095-302X.2025030625
• Computer Graphics and Virtual Reality •
YU Bing1,2, CHENG Guang1,2, HUANG Dongjin1,2, DING Youdong1,2
Received: 2024-08-23
Accepted: 2024-12-24
Online: 2025-06-30
Published: 2025-06-13
About author: YU Bing (1989-), lecturer, Ph.D. His main research interests include image processing and deep learning. E-mail: yubing@shu.edu.cn
YU Bing, CHENG Guang, HUANG Dongjin, DING Youdong. 3D human mesh reconstruction based on dual-stream network fusion[J]. Journal of Graphics, 2025, 46(3): 625-634.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2025030625
Method | MPJPE↓ | P-MPJPE↓ | MPVPE↓ | ACCEL↓ |
---|---|---|---|---|
HMMR | 116.5 | 72.6 | 139.3 | 15.2 |
MEVA | 86.9 | 54.7 | - | 11.6 |
VIBE | 91.9 | 57.6 | 99.1 | 25.4 |
TCMR | 86.5 | 52.7 | 102.9 | 7.1 |
MPS-Net | 84.3 | 52.1 | 99.7 | 7.4 |
GLOT | 80.7 | 50.6 | 96.3 | 6.6 |
Ours | 76.5 | 46.7 | 90.8 | 6.2 |
Table 1 Comparison of experimental results on 3DPW
Method | MPJPE↓ | P-MPJPE↓ | ACCEL↓ |
---|---|---|---|
MEVA | 96.4 | 65.4 | 11.1 |
VIBE | 103.9 | 68.9 | 27.3 |
TCMR | 97.6 | 63.5 | 8.5 |
MPS-Net | 96.7 | 62.8 | 9.6 |
GLOT | 93.9 | 61.5 | 7.9 |
Ours | 87.7 | 54.5 | 7.1 |
Table 2 Comparison of experimental results on MPI-INF-3DHP
Method | MPJPE↓ | P-MPJPE↓ | ACCEL↓ |
---|---|---|---|
MEVA | 76.0 | 53.2 | 15.3 |
VIBE | 65.9 | 41.5 | 18.3 |
TCMR | 62.3 | 41.1 | 5.3 |
MPS-Net | 69.4 | 47.4 | 3.6 |
GLOT | 67.0 | 46.3 | 3.6 |
Ours | 57.9 | 38.9 | 3.3 |
Table 3 Comparison of experimental results on Human3.6M
Method | MPJPE↓ | P-MPJPE↓ | ACCEL↓ |
---|---|---|---|
PQ-GCN | 89.2 | 58.3 | - |
Pose2Mesh | 88.9 | 58.3 | 22.6 |
GTRS | 88.5 | 58.9 | 25.0 |
HybrIK | 81.0 | 76.0 | 7.1 |
NIKI | 85.5 | 53.5 | - |
ReFit | 71.0 | 43.9 | - |
Ours | 74.6 | 47.7 | 6.9 |
Table 4 Comparison of experimental results with image-based methods on 3DPW
Method | MPJPE↓ | P-MPJPE↓ | MPVPE↓ |
---|---|---|---|
F (baseline) | 84.3 | 52.1 | 99.7 |
F+ST | 79.1 | 49.6 | 93.6 |
F+Dec | 81.3 | 50.2 | 96.5 |
Ours | 76.5 | 46.7 | 90.8 |
Table 5 Ablation experiments with different modules added on 3DPW
Method | MPJPE↓ | P-MPJPE↓ | MPVPE↓ | ACCEL↓ |
---|---|---|---|---|
Crop | 80.2 | 50.2 | 96.4 | 7.3 |
Crop+Bbox | 78.1 | 47.9 | 94.1 | 7.4 |
Ours | 76.5 | 46.7 | 90.8 | 6.2 |
Table 6 Ablation experiments on 3D pose estimation on 3DPW
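The tables above compare methods using the standard video mesh-recovery metrics: MPJPE (mean per-joint position error), P-MPJPE (MPJPE after Procrustes alignment), MPVPE (mean per-vertex position error), and ACCEL (acceleration error). For readers unfamiliar with these numbers, the following is a minimal NumPy sketch of how such metrics are commonly computed over `(J, 3)` joint arrays and `(T, J, 3)` sequences; it is an illustrative reference, not the authors' evaluation code.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: mean Euclidean distance
    between predicted and ground-truth joints, each shape (J, 3)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def p_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: rigidly align pred to gt with a
    similarity transform (rotation, scale, translation) first."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g           # center both point sets
    U, s, Vt = np.linalg.svd(p.T @ g)       # SVD of 3x3 cross-covariance
    R = U @ Vt                              # rotation minimizing ||p R - g||
    if np.linalg.det(R) < 0:                # avoid improper reflections
        Vt[-1] *= -1
        s[-1] *= -1
        R = U @ Vt
    scale = s.sum() / (p ** 2).sum()        # optimal isotropic scale
    aligned = scale * p @ R + mu_g
    return mpjpe(aligned, gt)

def accel_error(pred_seq, gt_seq):
    """Acceleration error over sequences of shape (T, J, 3): mean
    distance between second finite differences of the trajectories."""
    acc_p = pred_seq[2:] - 2 * pred_seq[1:-1] + pred_seq[:-2]
    acc_g = gt_seq[2:] - 2 * gt_seq[1:-1] + gt_seq[:-2]
    return np.linalg.norm(acc_p - acc_g, axis=-1).mean()
```

Because P-MPJPE removes the global similarity transform before measuring error, it is always at most MPJPE, which is why the P-MPJPE columns in Tables 1-4 are consistently lower than the MPJPE columns.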
[1] | WANG J B, TAN S J, ZHEN X T, et al. Deep 3D human pose estimation: a review[J]. Computer Vision and Image Understanding, 2021, 210: 103225. |
[2] | DUAN H D, ZHAO Y, CHEN K, et al. Revisiting skeleton-based action recognition[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 2969-2978. |
[3] | LV H, YANG H Y. A 3D human pose estimation approach based on spatio-temporal motion interaction modeling[J]. Journal of Graphics, 2024, 45(1): 159-168 (in Chinese). |
[4] | LUO Z Y, GOLESTANEH S A, KITANI K M. 3D human motion estimation via motion compression and refinement[C]// The 15th Asian Conference on Computer Vision. Cham: Springer, 2021: 324-340. |
[5] | SUN Y, YE Y, LIU W, et al. Human mesh recovery from monocular images via a skeleton-disentangled representation[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 5349-5358. |
[6] | KANAZAWA A, ZHANG J Y, FELSEN P, et al. Learning 3D human dynamics from video[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 5614-5623. |
[7] | WANG Y P, ZENG Y, LI S H, et al. A Transformer-based 3D human pose estimation method[J]. Journal of Graphics, 2023, 44(1): 139-145 (in Chinese). |
[8] | LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model[J]. ACM Transactions on Graphics, 2015, 34(6): 248. |
[9] | ANGUELOV D, SRINIVASAN P, KOLLER D, et al. SCAPE: shape completion and animation of people[C]// ACM SIGGRAPH 2005 Papers. New York: ACM, 2005: 408-416. |
[10] | OSMAN A A A, BOLKART T, BLACK M J. STAR: sparse trained articulated human body regressor[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 598-613. |
[11] | KANAZAWA A, BLACK M J, JACOBS D W, et al. End-to-end recovery of human shape and pose[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7122-7131. |
[12] | ZHANG J L, TU Z G, YANG J Y, et al. MixSTE: seq2seq mixed spatio-temporal encoder for 3D human pose estimation in video[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 13232-13242. |
[13] | HUANG Y W, LIN Z Q, ZHANG J, et al. Lightweight human pose estimation algorithm combined with coordinate Transformer[J]. Journal of Graphics, 2024, 45(3): 516-527 (in Chinese). |
[14] | LI Z, CHEN L L, LIU C L, et al. 3D human avatar digitization from a single image[C]// The 17th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry. New York: ACM, 2019: 12. |
[15] | DEY R, SALEM F M. Gate-variants of gated recurrent unit (GRU) neural networks[C]// The 60th IEEE International Midwest Symposium on Circuits and Systems. New York: IEEE Press, 2017: 1597-1600. |
[16] | WANG J, HU Y Z. An improved enhancement algorithm based on CNN applicable for weak contrast images[J]. IEEE Access, 2020, 8: 8459-8476. |
[17] | LI W H, LIU H, TANG H, et al. MHFormer: multi-hypothesis transformer for 3D human pose estimation[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 13147-13156. |
[18] | WAN Z N, LI Z J, TIAN M Q, et al. Encoder-decoder with multi-level attention for 3D human shape and pose estimation[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 13033-13042. |
[19] | KISSOS I, FRITZ L, GOLDMAN M, et al. Beyond weak perspective for monocular 3D human pose estimation[C]// Computer Vision-ECCV 2020 Workshops. Cham: Springer, 2020: 541-554. |
[20] | SHEN X L, YANG Z X, WANG X H, et al. Global-to-local modeling for video-based 3D human pose and shape estimation[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 8887-8896. |
[21] | RONG Y, LIU Z W, LI C, et al. Delving deep into hybrid annotations for 3D human recovery in the wild[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 5340-5348. |
[22] | LI Z W, XU B, HUANG H, et al. Deep two-stream video inference for human body pose and shape estimation[C]// 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2022: 430-439. |
[23] | ZHANG Z X, LU X Q, CAO G J, et al. ViT-YOLO: transformer-based YOLO for object detection[C]// 2021 IEEE/CVF International Conference on Computer Vision Workshops. New York: IEEE Press, 2021: 2799-2808. |
[24] | LI Z H, LIU J Z, ZHANG Z S, et al. CLIFF: carrying location information in full frames into human pose and shape estimation[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 590-606. |
[25] | ZHENG C, MENDIETA M, WANG P, et al. A lightweight graph transformer network for human mesh reconstruction from 2D human pose[C]// The 30th ACM International Conference on Multimedia. New York: ACM, 2022: 5496-5507. |
[26] | CHENG G, HUANG Y, YU B. Recurrent transformer for 3D human pose estimation[C]// The 4th International Conference on Big Data & Artificial Intelligence & Software Engineering. New York: IEEE Press, 2023: 207-210. |
[27] | KOLOTOUROS N, PAVLAKOS G, BLACK M J, et al. Learning to reconstruct 3D human pose and shape via model-fitting in the loop[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 2252-2261. |
[28] | CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[EB/OL]. [2024-12-24]. https://arxiv.org/abs/1406.1078. |
[29] | VON MARCARD T, HENSCHEL R, BLACK M J, et al. Recovering accurate 3D human pose in the wild using IMUs and a moving camera[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 601-617. |
[30] | MEHTA D, RHODIN H, CASAS D, et al. Monocular 3D human pose estimation in the wild using improved CNN supervision[C]// 2017 International Conference on 3D Vision. New York: IEEE Press, 2017: 506-516. |
[31] | IONESCU C, PAPAVA D, OLARU V, et al. Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1325-1339. |
[32] | WEI W L, LIN J C, LIU T L, et al. Capturing humans in motion: temporal-attentive 3D human pose and shape estimation from monocular video[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 13211-13220. |
[33] | KOCABAS M, ATHANASIOU N, BLACK M J. VIBE: video inference for human body pose and shape estimation[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 5253-5263. |
[34] | CHOI H, MOON G, LEE K M. Pose2Mesh: graph convolutional network for 3D human pose and mesh recovery from a 2D human pose[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 769-787. |
[35] | LI J F, XU C, CHEN Z C, et al. HybrIK: a hybrid analytical-neural inverse kinematics solution for 3D human pose and shape estimation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 3383-3393. |
[36] | LI J F, BIAN S Y, LIU Q, et al. NIKI: neural inverse kinematics with invertible neural networks for 3D human pose and shape estimation[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 12933-12942. |
[37] | WANG Y F, DANIILIDIS K. ReFit: recurrent fitting network for 3D human recovery[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 14644-14654. |