Journal of Graphics ›› 2023, Vol. 44 ›› Issue (1): 139-145.DOI: 10.11996/JG.j.2095-302X.2023010139
• Computer Graphics and Virtual Reality •

A Transformer-based 3D human pose estimation method
WANG Yu-ping1(), ZENG Yi1, LI Sheng-hui2, ZHANG Lei3
Received: 2022-04-07
Revised: 2022-07-19
Online: 2023-10-31
Published: 2023-02-16
About author: WANG Yu-ping (1979-), professor, master's degree. Her main research interests include machine vision, virtual reality, and machine learning. E-mail: wangyupingpaper@163.com
WANG Yu-ping, ZENG Yi, LI Sheng-hui, ZHANG Lei. A Transformer-based 3D human pose estimation method[J]. Journal of Graphics, 2023, 44(1): 139-145.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023010139
Device | Parameter
---|---
Operating system | Ubuntu 20.04
Deep learning framework | PyTorch 1.10
CUDA version | 11.5
Development software | PyCharm
CPU | i7-12700KF
GPU | 3090 (×1)

Table 1 Experimental environment
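Results of this kind depend on the exact framework versions in Table 1, so a small pre-flight check is common practice. The helper below is a hypothetical sketch (the name `version_matches` is not from the paper) that compares dotted version prefixes rather than raw string prefixes:

```python
def version_matches(installed: str, expected: str) -> bool:
    """True when `installed` (e.g. '1.10.2') starts with the dotted
    prefix `expected` (e.g. '1.10'). Splitting on '.' avoids the
    string-prefix pitfall where '1.1' would wrongly match '1.10'."""
    parts = installed.split("+")[0].split(".")  # drop local tags like '+cu115'
    want = expected.split(".")
    return parts[:len(want)] == want

# Checks against the versions listed in Table 1:
assert version_matches("1.10.2", "1.10")      # PyTorch 1.10.x
assert version_matches("11.5", "11.5")        # CUDA 11.5
assert not version_matches("1.11.0", "1.10")  # a different PyTorch line
```

In practice the installed strings would come from `torch.__version__` and `torch.version.cuda`.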
Models | PA-MPJPE | MPJPE | PVE | Accel
---|---|---|---|---
HMR | 76.7 | 130.0 | - | 37.4
SPIN | 59.2 | 96.9 | 116.4 | 29.8
VIBE (direct) | 58.7 | 100.0 | 118.5 | 28.7
VIBE | 55.2 | 93.8 | 110.4 | 28.2
TR-VIBE (direct) | 58.8 | 100.7 | 126.6 | 32.2
TR-VIBE | 53.5 | 86.3 | 101.8 | 25.5

Table 2 Comparison of 3DPW experimental results (PA-MPJPE, MPJPE and PVE in mm; Accel in mm/s²)
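Table 2 reports both raw and Procrustes-aligned joint error. As a point of reference, these two metrics can be sketched in a few lines of numpy; this is an illustrative implementation of the standard definitions, not the paper's evaluation code:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance
    between predicted and ground-truth joints, both of shape (J, 3)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment: the prediction is first fitted
    to the ground truth with the best similarity transform (scale,
    rotation, translation), so only pose-shape errors remain."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g            # center both point sets
    U, S, Vt = np.linalg.svd(p.T @ g)        # SVD of the cross-covariance
    if np.linalg.det(Vt.T @ U.T) < 0:        # guard against reflections
        Vt[-1] *= -1
        S[-1] *= -1
    R = Vt.T @ U.T                           # optimal rotation
    scale = S.sum() / (p ** 2).sum()         # optimal isotropic scale
    aligned = scale * p @ R.T + mu_g
    return mpjpe(aligned, gt)
```

A prediction that is merely rotated, scaled, and shifted has PA-MPJPE near zero even when its raw MPJPE is large, which is why the tables report both columns.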
Models | PA-MPJPE | MPJPE | PVE | Accel
---|---|---|---|---
HMR | 89.8 | 124.2 | - | -
SPIN | 67.5 | 105.2 | - | -
VIBE (direct) | 66.8 | 103.2 | 916.8 | 33.2
VIBE | 64.3 | 100.8 | 915.0 | 32.2
TR-VIBE (direct) | 66.7 | 102.7 | 915.3 | 34.8
TR-VIBE | 64.9 | 99.0 | 907.9 | 30.1

Table 3 Comparison of MPI-INF-3DHP experimental results (PA-MPJPE, MPJPE and PVE in mm; Accel in mm/s²)
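The Accel column measures temporal smoothness rather than per-frame accuracy. A commonly used form (e.g. in public video-pose evaluation code) approximates joint acceleration with second-order finite differences over the sequence; the sketch below follows that convention and is not taken from the paper:

```python
import numpy as np

def accel_error(pred, gt):
    """Acceleration error for joint sequences of shape (T, J, 3).
    Acceleration is approximated by the second-order finite difference
    j[t+1] - 2*j[t] + j[t-1]; the Euclidean distance between predicted
    and ground-truth accelerations is averaged over the T-2 interior
    frames and all joints."""
    a_pred = pred[2:] - 2.0 * pred[1:-1] + pred[:-2]
    a_gt = gt[2:] - 2.0 * gt[1:-1] + gt[:-2]
    return np.linalg.norm(a_pred - a_gt, axis=-1).mean()
```

A constant offset or constant-velocity drift between the two sequences leaves this error at zero, which is why Accel complements MPJPE instead of replacing it.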
Models | PA-MPJPE | MPJPE | PVE | Accel
---|---|---|---|---
VIBE | 55.2 | 93.8 | 110.4 | 28.2
VIBE-α | 55.1 | 94.2 | 110.2 | 28.5
VIBE-β | 55.0 | 87.7 | 104.2 | 25.8
TR-VIBE | 53.5 | 86.3 | 101.8 | 25.5
TR-VIBE-α | 55.9 | 92.5 | 110.1 | 28.6
TR-VIBE-β | 55.0 | 94.2 | 110.4 | 29.8

Table 4 Ablation experiments for LSTM and Transformer on 3DPW (PA-MPJPE, MPJPE and PVE in mm; Accel in mm/s²)
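The ablation in Table 4 contrasts recurrent and Transformer temporal encoders. The core difference is that self-attention lets every frame weigh every other frame directly in one step, instead of propagating a hidden state frame by frame as an LSTM does. A minimal numpy sketch of single-head scaled dot-product self-attention [13] is shown below; the projection matrices `Wq`, `Wk`, `Wv` are illustrative placeholders, not the paper's learned weights:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a frame
    sequence x of shape (T, D). Each output frame is a weighted mix
    of all T frames, so temporal context is aggregated globally."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # (T, T) frame affinities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # (T, d_v)
```

A full Transformer block would add multiple heads, positional encodings, residual connections, and a feed-forward sublayer on top of this operation.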
[1] | PAVLAKOS G, ZHOU X W, DERPANIS K G, et al. Coarse-to-fine volumetric prediction for single-image 3D human pose[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 1263-1272. |
[2] | LI S J, CHAN A B. 3D human pose estimation from monocular images with deep convolutional neural network[M]//Computer Vision - ACCV 2014. Cham: Springer International Publishing, 2015: 332-347. |
[3] | KOCABAS M, ATHANASIOU N, BLACK M J. VIBE: video inference for human body pose and shape estimation[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 5252-5262. |
[4] | PAVLLO D, FEICHTENHOFER C, GRANGIER D, et al. 3D human pose estimation in video with temporal convolutions and semi-supervised training[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 7745-7754. |
[5] | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. [2021-12-02]. https://arxiv.org/abs/1810.04805. |
[6] | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2021-12-05]. https://arxiv.org/abs/2010.11929. |
[7] | LI K, WANG S J, ZHANG X, et al. Pose recognition with cascade transformers[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 1944-1953. |
[8] | ZHENG C, ZHU S J, MENDIETA M, et al. 3D human pose estimation with spatial and temporal transformers[C]//2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 11636-11645. |
[9] | SHI X J, CHEN Z R, WANG H, et al. Convolutional LSTM Network: a machine learning approach for precipitation nowcasting[C]// The 28th International Conference on Neural Information Processing Systems - Volume 1. New York: ACM, 2015: 802-810. |
[10] | LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model[J]. ACM Transactions on Graphics, 2015, 34(6): 248. |
[11] | KANAZAWA A, BLACK M J, JACOBS D W, et al. End-to-end recovery of human shape and pose[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7122-7131. |
[12] | JIANG Y, CHANG S, WANG Z. TransGAN: two pure transformers can make one strong GAN, and that can scale up[J]. Advances in Neural Information Processing Systems, 2021, 34: 14745-14758. |
[13] | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010. |
[14] | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2021-12-02]. https://arxiv.org/abs/2010.11929. |
[15] | SHAW P, USZKOREIT J, VASWANI A. Self-attention with relative position representations[EB/OL]. [2021-12-01]. https://arxiv.org/abs/1803.02155. |
[16] | RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[EB/OL]. [2021-12-02]. https://arxiv.org/abs/1910.10683. |
[17] | HUANG C Z A, VASWANI A, USZKOREIT J, et al. Music transformer[EB/OL]. [2021-12-05]. https://arxiv.org/abs/1809.04281. |
[18] | LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 9992-10002. |
[19] | HU H, ZHANG Z, XIE Z D, et al. Local relation networks for image recognition[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 3463-3472. |
[20] | HE K M, ZHANG X Y, REN S Q, et al. Identity mappings in deep residual networks[M]//Computer Vision - ECCV 2016. Cham: Springer International Publishing, 2016: 630-645. |
[21] | KOLOTOUROS N, PAVLAKOS G, BLACK M, et al. Learning to reconstruct 3D human pose and shape via model-fitting in the loop[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 2252-2261. |
[22] | KANAZAWA A, ZHANG J Y, FELSEN P, et al. Learning 3D human dynamics from video[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 5607-5616. |
[23] | KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL]. [2022-01-03]. https://arxiv.org/abs/1412.6980. |
[24] | MEHTA D, RHODIN H, CASAS D, et al. Monocular 3D human pose estimation in the wild using improved CNN supervision[C]// 2017 International Conference on 3D Vision. New York: IEEE Press, 2017: 506-516. |
[25] | MAHMOOD N, GHORBANI N, TROJE N F, et al. AMASS: archive of motion capture as surface shapes[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 5441-5450. |
[26] | VON MARCARD T, HENSCHEL R, BLACK M J, et al. Recovering accurate 3D human pose in the wild using IMUs and a moving camera[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 614-631. |
[27] | KOLOTOUROS N, PAVLAKOS G, BLACK M, et al. Learning to reconstruct 3D human pose and shape via model-fitting in the loop[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 2252-2261. |