Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 760-769.DOI: 10.11996/JG.j.2095-302X.2024040760
• Image Processing and Computer Vision •
LI Songyang, WANG Xueting, CHEN Xianglong, CHEN Enqing
Received: 2023-10-08
Accepted: 2024-02-20
Online: 2024-08-31
Published: 2024-09-03
Contact: CHEN Enqing
About author: LI Songyang (1998-), master student. His main research interests cover computer vision and pattern recognition. E-mail: lisongyang1998@gs.zzu.edu.cn
LI Songyang, WANG Xueting, CHEN Xianglong, CHEN Enqing. Human action recognition based on skeleton dynamic temporal filter[J]. Journal of Graphics, 2024, 45(4): 760-769.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024040760
Layer | Input channels | Output channels | Parameters
---|---|---|---
1 | 3 | 64 | 1792
2, 3, 4 | 64 | 64 | 36928
5 | 64 | 128 | 73856
6, 7 | 128 | 128 | 147584
8 | 128 | 256 | 295168
9, 10 | 256 | 256 | 590080
Total | 3 | 256 | 1145408

Table 1 Convolution parameter quantities in AGCN
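The per-layer figures in Table 1 are consistent with a temporal convolution of kernel length 9 plus a per-channel bias (parameters = C_in × C_out × 9 + C_out); note that the table's total counts each distinct layer configuration once, not once per layer. A minimal sketch reproducing the column under that assumed formula:

```python
def temporal_conv_params(c_in, c_out, k=9):
    # weights (c_in * c_out * k) plus one bias per output channel
    return c_in * c_out * k + c_out

# Distinct (input, output) channel configurations from Table 1
rows = [(3, 64), (64, 64), (64, 128), (128, 128), (128, 256), (256, 256)]
for c_in, c_out in rows:
    print(c_in, c_out, temporal_conv_params(c_in, c_out))
print("total:", sum(temporal_conv_params(ci, co) for ci, co in rows))  # total: 1145408
```

For example, the first row gives 3 × 64 × 9 + 64 = 1792, matching the table.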
Layer | Input channels | Output channels | Parameters
---|---|---|---
1 | 3 | 64 | 1792
2, 3, 4 | 300 | 302 | 90902
5 | 64 | 128 | 73856
6, 7 | 150 | 152 | 22952
8 | 128 | 256 | 295168
9, 10 | 75 | 76 | 5776
Total | 3 | 256 | 490446

Table 2 Convolution parameter quantities in AGCN-SDTF
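The SDTF rows in Table 2 (layers 2-4, 6-7, 9-10) match a pointwise filter (kernel length 1) whose channel counts follow the temporal length rather than the feature width, e.g. 300 × 302 + 302 = 90902. A sketch of where the parameter saving over Table 1 comes from, assuming that same weights-plus-bias formula:

```python
def conv_params(c_in, c_out, k):
    # weights (c_in * c_out * k) plus one bias per output channel
    return c_in * c_out * k + c_out

# Table 2: unchanged AGCN temporal convs (k = 9) ...
agcn_rows = [(3, 64), (64, 128), (128, 256)]
# ... and SDTF stages over the temporal dimension (k = 1), halving each stage
sdtf_rows = [(300, 302), (150, 152), (75, 76)]

total = sum(conv_params(ci, co, 9) for ci, co in agcn_rows) \
      + sum(conv_params(ci, co, 1) for ci, co in sdtf_rows)
print(total)  # 490446, matching the table's total
```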
Activation function | Accuracy/%
---|---
None | 85.4
Sigmoid | 86.4
Tanh | 84.8
Softmax | 86.1
LeakyReLU | 84.5
ReLU | 86.9

Table 3 Influence of activation function selection on model performance
Component | Accuracy/%
---|---
Conv* | 86.1
BN* | 84.6

Table 4 The influence of convolutional layer and BN layer on SDTF performance
Model | Accuracy/%
---|---
AGCN | 86.5
Full-SDTF | 80.1
Meta-SDTF | 85.4
AGCN-SDTF | 86.9

Table 5 The impact of using different layers of SDTF on model performance
Model | Parameters/M | FLOPs/GFLOPs
---|---|---
AGCN | 3.45 | 18.65
Full-SDTF | 1.89 | 10.35
Meta-SDTF | 2.46 | 13.27
AGCN-SDTF | 2.37 | 12.61

Table 6 The impact of using different layers of SDTF on model complexity
Model | Data type | CS accuracy/% | CV accuracy/%
---|---|---|---
AGCN | Joint | 86.5 | 93.7
AGCN | Bone | 87.1 | 93.2
AGCN-TCN* | Joint | 85.8 | 93.4
AGCN-TCN* | Bone | 86.5 | 93.3

Table 7 Performance of AGCN on NTU-RGBD Dataset
Model | Data type | CS accuracy/% | CV accuracy/%
---|---|---|---
AGCN-SDTF | Joint | 86.9 | 93.7
AGCN-SDTF | Bone | 87.3 | 93.6

Table 8 Performance of AGCN-SDTF on NTU-RGBD Dataset
Model | CS accuracy/% | CV accuracy/% | Parameters/M | FLOPs/GFLOPs
---|---|---|---|---
ST-GCN[13] | 81.5 | 88.3 | 3.08 | 16.32
AS-GCN[26] | 86.8 | 94.2 | - | -
NAS-GCN[27] | 87.6 | 94.5 | 6.50 | 36.60
ST-TR-AGCN[16] | 89.3 | 96.1 | 12.11 | 64.41
2s-AGCN[14] | 88.5 | 95.1 | 6.90 | 37.30
2s-AGCN-SDTF | 89.1 | 95.1 | 4.74 | 25.22

Table 9 Comparison of accuracy between different models in NTU-RGBD dataset
Model | Top-1 accuracy/% | Top-5 accuracy/%
---|---|---
ST-GCN[13] | 30.7 | 52.8
AS-GCN[26] | 34.8 | 56.5
NAS-GCN[27] | 35.5 | 57.9
2s-AGCN[14] | 36.1 | 58.7
2s-AGCN-SDTF | 35.8 | 59.0

Table 10 Comparison of accuracy between different models in Kinetics-Skeleton dataset
Model | CS accuracy/%
---|---
ST-GCN[13] | 81.5
ST-GCN-SDTF | 81.7
CTR-GCN[21] | 89.8
CTR-GCN-SDTF | 90.0

Table 11 Performance of ST-GCN-SDTF and CTR-GCN-SDTF
[1] | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. |
[2] | KARPATHY A, TODERICI G, SHETTY S, et al. Large-scale video classification with convolutional neural networks[C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2014: 1725-1732. |
[3] | SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]// The 27th International Conference on Neural Information Processing Systems. New York: IEEE Press, 2014: 568-576. |
[4] | JIANG S N, CHEN E Q, ZHENG M Y, et al. Human action recognition based on ResNeXt[J]. Journal of Graphics, 2020, 41(2): 277-282 (in Chinese). |
[5] | NG J Y H, HAUSKNECHT M, VIJAYANARASIMHAN S, et al. Beyond short snippets: deep networks for video classification[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 4694-4702. |
[6] | YANG S Q, YANG J T, LI Z, et al. Human action recognition based on LSTM neural network[J]. Journal of Graphics, 2021, 42(2): 174-181 (in Chinese). |
[7] | WANG L M, XIONG Y J, WANG Z, et al. Temporal segment networks: towards good practices for deep action recognition[C]// European Conference on Computer Vision. Cham: Springer, 2016: 20-36. |
[8] | TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]// 2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 4489-4497. |
[9] | CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? a new model and the kinetics dataset[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 6299-6308. |
[10] | TRAN D, WANG H, TORRESANI L, et al. A closer look at spatiotemporal convolutions for action recognition[C]// 2018 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 6450-6459. |
[11] | ZHANG Z Y. Microsoft kinect sensor and its effect[J]. IEEE Multimedia, 2012, 19(2): 4-10. |
[12] | FANG H S, XIE S Q, TAI Y W, et al. RMPE: regional multi-person pose estimation[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2334-2343. |
[13] | YAN S J, XIONG Y J, LIN D H. Spatial temporal graph convolutional networks for skeleton-based action recognition[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 7444-7452. |
[14] | SHI L, ZHANG Y F, CHENG J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]// 2019 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 12026-12035. |
[15] | SHI L, ZHANG Y F, CHENG J, et al. Skeleton-based action recognition with directed graph neural networks[C]// 2019 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 7912-7921. |
[16] | PLIZZARI C, CANNICI M, MATTEUCCI M. Skeleton-based action recognition via spatial and temporal transformer networks[J]. Computer Vision and Image Understanding, 2021, 208: 103219. |
[17] | AN F, DAI J, HAN Z, et al. Self-supervised optical flow estimation with attention module[J]. Journal of Graphics, 2022, 43(5): 841-848 (in Chinese). |
[18] | LEE J, LEE M, LEE D, et al. Hierarchically decomposed graph convolutional networks for skeleton-based action recognition[C]// 2023 IEEE International Conference on Computer Vision. New York: IEEE Press, 2023: 10444-10453. |
[19] | DONG J, SUN S, LIU Z, et al. Hierarchical contrast for unsupervised skeleton-based action representation learning[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(1): 525-533. |
[20] | SUN S K, LIU D Z, DONG J F, et al. Unified multi-modal unsupervised representation learning for skeleton-based action understanding[C]// The 31st ACM International Conference on Multimedia. New York: ACM, 2023: 2973-2984. |
[21] | CHEN Y X, ZHANG Z Q, YUAN C F, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition[C]// 2021 IEEE International Conference on Computer Vision. New York: IEEE Press, 2021: 13359-13368. |
[22] | OBINATA Y, YAMAMOTO T. Temporal extension module for skeleton-based action recognition[C]// The 25th International Conference on Pattern Recognition. New York: IEEE Press, 2021: 534-540. |
[23] | LONG F, QIU Z, PAN Y, et al. Dynamic temporal filtering in video models[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 475-492. |
[24] | SHAHROUDY A, LIU J, NG T T, et al. NTU RGB+D: a large scale dataset for 3D human activity analysis[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 1010-1019. |
[25] | KAY W, CARREIRA J, SIMONYAN K, et al. The kinetics human action video dataset[EB/OL]. [2023-08-19]. https://arxiv.org/abs/1705.06950v1. |
[26] | LI M S, CHEN S H, CHEN X, et al. Actional-structural graph convolutional networks for skeleton-based action recognition[C]// 2019 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 3595-3603. |
[27] | PENG W, HONG X P, CHEN H Y, et al. Learning graph convolutional network for skeleton-based human action recognition by neural searching[C]// The AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2020: 2669-2676. |