Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 760-769.DOI: 10.11996/JG.j.2095-302X.2024040760
• Image Processing and Computer Vision •
LI Songyang, WANG Xueting, CHEN Xianglong, CHEN Enqing
Received: 2023-10-08
Accepted: 2024-02-20
Online: 2024-08-31
Published: 2024-09-03
Contact: CHEN Enqing
About author: LI Songyang (1998-), master student. His main research interests cover computer vision and pattern recognition. E-mail: lisongyang1998@gs.zzu.edu.cn
LI Songyang, WANG Xueting, CHEN Xianglong, CHEN Enqing. Human action recognition based on skeleton dynamic temporal filter[J]. Journal of Graphics, 2024, 45(4): 760-769.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024040760
| Layer | Input channels | Output channels | Parameters |
|---|---|---|---|
| 1 | 3 | 64 | 1792 |
| 2, 3, 4 | 64 | 64 | 36928 |
| 5 | 64 | 128 | 73856 |
| 6, 7 | 128 | 128 | 147584 |
| 8 | 128 | 256 | 295168 |
| 9, 10 | 256 | 256 | 590080 |
| Total | 3 | 256 | 1145408 |
Table 1 Convolution parameter quantities in AGCN
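Assuming the AGCN layers use standard 3×3 convolutions with bias, every per-layer count in Table 1 follows the usual formula k·k·C_in·C_out + C_out. A quick sanity check (the helper name is ours, not from the paper):

```python
def conv_params(c_in, c_out, k=3):
    # k x k convolution with bias: k*k*c_in*c_out weights + c_out biases
    return k * k * c_in * c_out + c_out

# (c_in, c_out, count listed in Table 1)
table1 = [(3, 64, 1792), (64, 64, 36928), (64, 128, 73856),
          (128, 128, 147584), (128, 256, 295168), (256, 256, 590080)]
for c_in, c_out, listed in table1:
    assert conv_params(c_in, c_out) == listed
```

Note that the Total row appears to sum each distinct layer configuration once rather than counting repeated layers (1792 + 36928 + 73856 + 147584 + 295168 + 590080 = 1145408).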
| Layer | Input channels | Output channels | Parameters |
|---|---|---|---|
| 1 | 3 | 64 | 1792 |
| 2, 3, 4 | 300 | 302 | 90902 |
| 5 | 64 | 128 | 73856 |
| 6, 7 | 150 | 152 | 22952 |
| 8 | 128 | 256 | 295168 |
| 9, 10 | 75 | 76 | 5776 |
| Total | 3 | 256 | 490446 |
Table 2 Convolution parameter quantities in AGCN-SDTF
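The much smaller counts for the SDTF-replaced layers in Table 2 are consistent with a dense filter applied along the temporal axis, i.e. T_in·T_out + T_out parameters, with the temporal length halving at each stage (300 → 150 → 75). This is arithmetic inferred from the listed numbers, not a formula stated by the authors:

```python
def temporal_filter_params(t_in, t_out):
    # dense filter over the time axis with bias: t_in*t_out weights + t_out biases
    return t_in * t_out + t_out

# (t_in, t_out, count listed in Table 2 for the SDTF layers)
table2 = [(300, 302, 90902), (150, 152, 22952), (75, 76, 5776)]
for t_in, t_out, listed in table2:
    assert temporal_filter_params(t_in, t_out) == listed
```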
| Activation function | Accuracy/% |
|---|---|
| None | 85.4 |
| Sigmoid | 86.4 |
| Tanh | 84.8 |
| Softmax | 86.1 |
| LeakyReLU | 84.5 |
| ReLU | 86.9 |
Table 3 Influence of activation function selection on model performance
| Component | Accuracy/% |
|---|---|
| Conv* | 86.1 |
| BN* | 84.6 |
Table 4 The influence of convolutional layer and BN layer on SDTF performance
| Model | Accuracy/% |
|---|---|
| AGCN | 86.5 |
| Full-SDTF | 80.1 |
| Meta-SDTF | 85.4 |
| AGCN-SDTF | 86.9 |
Table 5 The impact of using different layers of SDTF on model performance
| Model | Parameters/M | FLOPs/GFLOPs |
|---|---|---|
| AGCN | 3.45 | 18.65 |
| Full-SDTF | 1.89 | 10.35 |
| Meta-SDTF | 2.46 | 13.27 |
| AGCN-SDTF | 2.37 | 12.61 |
Table 6 The impact of using different layers of SDTF on model complexity
| Model | Data type | CS accuracy/% | CV accuracy/% |
|---|---|---|---|
| AGCN | Joint | 86.5 | 93.7 |
| AGCN | Bone | 87.1 | 93.2 |
| AGCN-TCN* | Joint | 85.8 | 93.4 |
| AGCN-TCN* | Bone | 86.5 | 93.3 |
Table 7 Performance of AGCN on NTU-RGBD Dataset
| Model | Data type | CS accuracy/% | CV accuracy/% |
|---|---|---|---|
| AGCN-SDTF | Joint | 86.9 | 93.7 |
| AGCN-SDTF | Bone | 87.3 | 93.6 |
Table 8 Performance of AGCN-SDTF on NTU-RGBD Dataset
| Model | CS accuracy/% | CV accuracy/% | Parameters/M | FLOPs/GFLOPs |
|---|---|---|---|---|
| ST-GCN[13] | 81.5 | 88.3 | 3.08 | 16.32 |
| AS-GCN[26] | 86.8 | 94.2 | - | - |
| NAS-GCN[27] | 87.6 | 94.5 | 6.50 | 36.60 |
| ST-TR-AGCN[16] | 89.3 | 96.1 | 12.11 | 64.41 |
| 2s-AGCN[14] | 88.5 | 95.1 | 6.90 | 37.30 |
| 2s-AGCN-SDTF | 89.1 | 95.1 | 4.74 | 25.22 |
Table 9 Comparison of accuracy between different models in NTU-RGBD dataset
| Model | Top-1 accuracy/% | Top-5 accuracy/% |
|---|---|---|
| ST-GCN[13] | 30.7 | 52.8 |
| AS-GCN[26] | 34.8 | 56.5 |
| NAS-GCN[27] | 35.5 | 57.9 |
| 2s-AGCN[14] | 36.1 | 58.7 |
| 2s-AGCN-SDTF | 35.8 | 59.0 |
Table 10 Comparison of accuracy between different models in Kinetics-Skeleton dataset
| Model | CS accuracy/% |
|---|---|
| ST-GCN[13] | 81.5 |
| ST-GCN-SDTF | 81.7 |
| CTR-GCN[21] | 89.8 |
| CTR-GCN-SDTF | 90.0 |
Table 11 Performance of ST-GCN-SDTF and CTR-GCN-SDTF
| [1] | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. |
| [2] | KARPATHY A, TODERICI G, SHETTY S, et al. Large-scale video classification with convolutional neural networks[C]// 2014 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2014: 1725-1732. |
| [3] | SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]// The 27th International Conference on Neural Information Processing Systems. New York: IEEE Press, 2014: 568-576. |
| [4] | JIANG S N, CHEN E Q, ZHENG M Y, et al. Human action recognition based on ResNeXt[J]. Journal of Graphics, 2020, 41(2): 277-282 (in Chinese). |
| [5] | NG J Y H, HAUSKNECHT M, VIJAYANARASIMHAN S, et al. Beyond short snippets: deep networks for video classification[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 4694-4702. |
| [6] | YANG S Q, YANG J T, LI Z, et al. Human action recognition based on LSTM neural network[J]. Journal of Graphics, 2021, 42(2): 174-181 (in Chinese). |
| [7] | WANG L M, XIONG Y J, WANG Z, et al. Temporal segment networks: towards good practices for deep action recognition[C]// European Conference on Computer Vision. Cham: Springer, 2016: 20-36. |
| [8] | TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]// 2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 4489-4497. |
| [9] | CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? a new model and the kinetics dataset[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 6299-6308. |
| [10] | TRAN D, WANG H, TORRESANI L, et al. A closer look at spatiotemporal convolutions for action recognition[C]// 2018 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 6450-6459. |
| [11] | ZHANG Z Y. Microsoft kinect sensor and its effect[J]. IEEE Multimedia, 2012, 19(2): 4-10. |
| [12] | FANG H S, XIE S Q, TAI Y W, et al. RMPE: regional multi-person pose estimation[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2334-2343. |
| [13] | YAN S J, XIONG Y J, LIN D H. Spatial temporal graph convolutional networks for skeleton-based action recognition[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 7444-7452. |
| [14] | SHI L, ZHANG Y F, CHENG J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]// 2019 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 12026-12035. |
| [15] | SHI L, ZHANG Y F, CHENG J, et al. Skeleton-based action recognition with directed graph neural networks[C]// 2019 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 7912-7921. |
| [16] | PLIZZARI C, CANNICI M, MATTEUCCI M. Skeleton-based action recognition via spatial and temporal transformer networks[J]. Computer Vision and Image Understanding, 2021, 208: 103219. |
| [17] | AN F, DAI J, HAN Z, et al. Self-supervised optical flow estimation with attention module[J]. Journal of Graphics, 2022, 43(5): 841-848 (in Chinese). |
| [18] | LEE J, LEE M, LEE D, et al. Hierarchically decomposed graph convolutional networks for skeleton-based action recognition[C]// 2023 IEEE International Conference on Computer Vision. New York: IEEE Press, 2023: 10444-10453. |
| [19] | DONG J, SUN S, LIU Z, et al. Hierarchical contrast for unsupervised skeleton-based action representation learning[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(1): 525-533. |
| [20] | SUN S K, LIU D Z, DONG J F, et al. Unified multi-modal unsupervised representation learning for skeleton-based action understanding[C]// The 31st ACM International Conference on Multimedia. New York: ACM, 2023: 2973-2984. |
| [21] | CHEN Y X, ZHANG Z Q, YUAN C F, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition[C]// 2021 IEEE International Conference on Computer Vision. New York: IEEE Press, 2021: 13359-13368. |
| [22] | OBINATA Y, YAMAMOTO T. Temporal extension module for skeleton-based action recognition[C]// The 25th International Conference on Pattern Recognition. New York: IEEE Press, 2021: 534-540. |
| [23] | LONG F, QIU Z, PAN Y, et al. Dynamic temporal filtering in video models[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 475-492. |
| [24] | SHAHROUDY A, LIU J, NG T T, et al. NTU RGB+D: a large scale dataset for 3D human activity analysis[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 1010-1019. |
| [25] | KAY W, CARREIRA J, SIMONYAN K, et al. The kinetics human action video dataset[EB/OL]. [2023-08-19]. https://arxiv.org/abs/1705.06950v1. |
| [26] | LI M S, CHEN S H, CHEN X, et al. Actional-structural graph convolutional networks for skeleton-based action recognition[C]// 2019 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 3595-3603. |
| [27] | PENG W, HONG X P, CHEN H Y, et al. Learning graph convolutional network for skeleton-based human action recognition by neural searching[C]// The AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2020: 2669-2676. |