[1] FEICHTENHOFER C, FAN H Q, MALIK J, et al. SlowFast networks for video recognition[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 6201-6210.
[2] BI C Y, LIU Y. A survey of video human action recognition based on deep learning[J]. Journal of Graphics, 2023, 44(4): 625-639 (in Chinese).
[3] XU J W, YU Z B, NI B B, et al. Deep kinematics analysis for monocular 3D human pose estimation[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 896-905.
[4] DUAN H D, ZHAO Y, CHEN K, et al. Revisiting skeleton-based action recognition[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 2959-2968.
[5] LI S Y, WANG X T, CHEN X L, et al. Human action recognition based on skeleton dynamic temporal filter[J]. Journal of Graphics, 2024, 45(4): 760-769 (in Chinese).
[6] LU P, JIANG T, LI Y N, et al. RTMO: towards high-performance one-stage real-time multi-person pose estimation[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 1491-1500.
[7] LI S, LI W Q, COOK C, et al. Independently recurrent neural network: building a longer and deeper RNN[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 5457-5466.
[8] BANERJEE A, SINGH P K, SARKAR R. Fuzzy integral-based CNN classifier fusion for 3D skeleton action recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(6): 2206-2216.
[9] LIU Z Y, ZHANG H W, CHEN Z H, et al. Disentangling and unifying graph convolutions for skeleton-based action recognition[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 140-149.
[10] ZHANG Y H, WU B, LI W, et al. STST: spatial-temporal specialized transformer for skeleton-based action recognition[C]// The 29th ACM International Conference on Multimedia. New York: ACM, 2021: 3229-3237.
[11] HE K M, CHEN X L, XIE S N, et al. Masked autoencoders are scalable vision learners[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 15979-15988.
[12] WU W H, HUA Y L, ZHENG C, et al. SkeletonMAE: spatial-temporal masked autoencoders for self-supervised skeleton action recognition[C]// 2023 IEEE International Conference on Multimedia and Expo Workshops. New York: IEEE Press, 2023: 224-229.
[13] KINGMA D P, WELLING M. Auto-encoding variational Bayes[EB/OL]. (2022-12-10)[2024-06-27]. https://arxiv.org/abs/1312.6114.
[14] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. (2019-05-24)[2024-06-27]. https://arxiv.org/abs/1810.04805.
[15] TONG Z, SONG Y B, WANG J, et al. VideoMAE: masked autoencoders are data-efficient learners for self-supervised video pre-training[C]// The 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 732.
[16] QING Z W, ZHANG S W, HUANG Z Y, et al. MAR: masked autoencoders for efficient action recognition[J]. IEEE Transactions on Multimedia, 2024, 26: 218-233.
[17] HIGGINS I, MATTHEY L, PAL A, et al. β-VAE: learning basic visual concepts with a constrained variational framework[EB/OL]. [2024-06-27]. https://openreview.net/pdf?id=Sy2fzU9gl.
[18] SHAHROUDY A, LIU J, NG T T, et al. NTU RGB+D: a large scale dataset for 3D human activity analysis[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 1010-1019.
[19] LIU J, SHAHROUDY A, PEREZ M, et al. NTU RGB+D 120: a large-scale benchmark for 3D human activity understanding[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(10): 2684-2701.
[20] PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library[C]// The 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 721.
[21] ZHANG H Y, HOU Y H, ZHANG W J, et al. Contrastive positive mining for unsupervised 3D action representation learning[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 36-51.
[22] HUA Y L, WU W H, ZHENG C, et al. Part aware contrastive learning for self-supervised action recognition[EB/OL]. (2023-05-11)[2024-06-27]. https://arxiv.org/abs/2305.00666.
[23] LI L G, WANG M S, NI B B, et al. 3D human action representation learning via cross-view consistency pursuit[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 4739-4748.
[24] GUO T Y, LIU H, CHEN Z, et al. Contrastive learning from extremely augmented skeleton sequences for self-supervised action recognition[C]// The 36th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2022: 762-770.
[25] CHEN Y X, ZHAO L, YUAN J B, et al. Hierarchically self-supervised transformer for human skeleton representation learning[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 185-202.