[1] PAVLAKOS G, ZHOU X W, DERPANIS K G, et al. Coarse-to-fine volumetric prediction for single-image 3D human pose[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 1263-1272.
[2] LI S J, CHAN A B. 3D human pose estimation from monocular images with deep convolutional neural network[M]//Computer Vision - ACCV 2014. Cham: Springer International Publishing, 2015: 332-347.
[3] KOCABAS M, ATHANASIOU N, BLACK M J. VIBE: video inference for human body pose and shape estimation[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 5252-5262.
[4] PAVLLO D, FEICHTENHOFER C, GRANGIER D, et al. 3D human pose estimation in video with temporal convolutions and semi-supervised training[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 7745-7754.
[5] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. [2021-12-02]. https://arxiv.org/abs/1810.04805.
[6] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2021-12-05]. https://arxiv.org/abs/2010.11929.
[7] LI K, WANG S J, ZHANG X, et al. Pose recognition with cascade transformers[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 1944-1953.
[8] ZHENG C, ZHU S J, MENDIETA M, et al. 3D human pose estimation with spatial and temporal transformers[C]//2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 11636-11645.
[9] SHI X J, CHEN Z R, WANG H, et al. Convolutional LSTM network: a machine learning approach for precipitation nowcasting[C]//The 28th International Conference on Neural Information Processing Systems - Volume 1. New York: ACM, 2015: 802-810.
[10] LOPER M, MAHMOOD N, ROMERO J, et al. SMPL: a skinned multi-person linear model[J]. ACM Transactions on Graphics, 2015, 34(6): 248.
[11] KANAZAWA A, BLACK M J, JACOBS D W, et al. End-to-end recovery of human shape and pose[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7122-7131.
[12] JIANG Y, CHANG S, WANG Z. TransGAN: two pure transformers can make one strong GAN, and that can scale up[J]. Advances in Neural Information Processing Systems, 2021, 34: 14745-14758.
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[14] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2021-12-02]. https://arxiv.org/abs/2010.11929.
[15] SHAW P, USZKOREIT J, VASWANI A. Self-attention with relative position representations[EB/OL]. [2021-12-01]. https://arxiv.org/abs/1803.02155.
[16] RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[EB/OL]. [2021-12-02]. https://arxiv.org/abs/1910.10683.
[17] HUANG C Z A, VASWANI A, USZKOREIT J, et al. Music transformer[EB/OL]. [2021-12-05]. https://arxiv.org/abs/1809.04281.
[18] LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 9992-10002.
[19] HU H, ZHANG Z, XIE Z D, et al. Local relation networks for image recognition[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 3463-3472.
[20] HE K M, ZHANG X Y, REN S Q, et al. Identity mappings in deep residual networks[M]//Computer Vision - ECCV 2016. Cham: Springer International Publishing, 2016: 630-645.
[21] KOLOTOUROS N, PAVLAKOS G, BLACK M J, et al. Learning to reconstruct 3D human pose and shape via model-fitting in the loop[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 2252-2261.
[22] KANAZAWA A, ZHANG J Y, FELSEN P, et al. Learning 3D human dynamics from video[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 5607-5616.
[23] KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL]. [2022-01-03]. https://arxiv.org/abs/1412.6980.
[24] MEHTA D, RHODIN H, CASAS D, et al. Monocular 3D human pose estimation in the wild using improved CNN supervision[C]//2017 International Conference on 3D Vision. New York: IEEE Press, 2017: 506-516.
[25] MAHMOOD N, GHORBANI N, TROJE N F, et al. AMASS: archive of motion capture as surface shapes[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 5441-5450.
[26] VON MARCARD T, HENSCHEL R, BLACK M J, et al. Recovering accurate 3D human pose in the wild using IMUs and a moving camera[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 614-631.
[27] KOLOTOUROS N, PAVLAKOS G, BLACK M J, et al. Learning to reconstruct 3D human pose and shape via model-fitting in the loop[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 2252-2261.