Journal of Graphics ›› 2024, Vol. 45 ›› Issue (1): 26-34.DOI: 10.11996/JG.j.2095-302X.2024010026
• Image Processing and Computer Vision •
GUO Zongyang1, LIU Lidong1, JIANG Donghua2, LIU Zixiang1, ZHU Shukang1, CHEN Jinghua1
Received: 2023-09-06
Accepted: 2023-11-12
Online: 2024-02-29
Published: 2024-02-29
Contact: LIU Lidong (1982-), professor, Ph.D. His main research interests cover graphic image processing, computer vision, etc.
About author: GUO Zongyang (2000-), master student. His main research interests cover digital image processing and human action recognition, etc. E-mail: gzy000119@chd.edu.cn
GUO Zongyang, LIU Lidong, JIANG Donghua, LIU Zixiang, ZHU Shukang, CHEN Jinghua. Human action recognition algorithm based on semantics guided neural networks[J]. Journal of Graphics, 2024, 45(1): 26-34.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024010026
Fig. 5 Schematic diagram of the deformable convolution kernel ((a) The kernel of an ordinary convolution; (b)~(d) Kernels of the deformable convolution)
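The idea behind Fig. 5 can be made concrete in code: a deformable convolution adds a learned (dy, dx) offset to each of the nine regular sampling points of a 3×3 kernel and reads the feature map at the resulting fractional positions by bilinear interpolation. The following is a minimal NumPy sketch of that sampling step, not the paper's implementation; in a real deformable convolution module the offsets are predicted by a companion convolution layer rather than supplied by hand, and all function names here are illustrative.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample a 2-D feature map at fractional coordinates (y, x)."""
    H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    # Clip the four neighbouring integer coordinates to the map border.
    ys = np.clip([y0, y0 + 1], 0, H - 1)
    xs = np.clip([x0, x0 + 1], 0, W - 1)
    wy1, wx1 = y - y0, x - x0          # interpolation weights
    wy0, wx0 = 1.0 - wy1, 1.0 - wx1
    return (wy0 * wx0 * feat[ys[0], xs[0]] + wy0 * wx1 * feat[ys[0], xs[1]] +
            wy1 * wx0 * feat[ys[1], xs[0]] + wy1 * wx1 * feat[ys[1], xs[1]])

def deform_conv_at(feat, weight, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    offsets has shape (3, 3, 2): a learned (dy, dx) shift for each of the
    nine regular sampling points of the kernel grid.  With all offsets
    zero this reduces to an ordinary 3x3 convolution (Fig. 5(a)).
    """
    out = 0.0
    for i, dy in enumerate((-1, 0, 1)):
        for j, dx in enumerate((-1, 0, 1)):
            oy, ox = offsets[i, j]
            out += weight[i, j] * bilinear_sample(feat, cy + dy + oy, cx + dx + ox)
    return out
```

With zero offsets the result matches an ordinary convolution over the regular grid; non-zero offsets let each kernel point drift to the positions sketched in Fig. 5(b)~(d).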
| Network | Parameters/M | CS/% | CV/% |
|---|---|---|---|
| SGN | 0.6590 | 89.0 | 94.5 |
| SGN+T | 0.7225 | 91.2 | 95.6 |
| SGN+ECA | 0.6591 | 92.5 | 96.3 |
| SGN+DCM | 1.8240 | 90.1 | 95.5 |
| SGN+ALL | 1.8776 | 93.0 | 96.5 |
Table 1 Parameter count and recognition accuracy on NTU RGB+D before and after introducing each module
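A notable pattern in Table 1 is that SGN+ECA improves accuracy by 3.5 percentage points (CS) while adding only about 0.0001 M parameters. That is the point of the ECA module (ref. [14]): it gates channels with a single 1-D convolution over globally pooled channel statistics instead of the fully connected layers used in heavier attention blocks. Below is a minimal NumPy sketch of such a channel-attention pass on a (C, T, V) skeleton feature tensor; it follows the ECA-Net recipe in spirit, but the function names and the hand-supplied kernel are illustrative assumptions, not the paper's code.

```python
import numpy as np

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1-D kernel size from ECA-Net: the nearest odd value of
    |log2(C)/gamma + b/gamma|, so the receptive field grows with C."""
    k = int(abs(np.log2(channels) / gamma + b / gamma))
    return k if k % 2 else k + 1

def eca_attention(x, kernel):
    """Efficient channel attention on a (C, T, V) feature map.

    x: features with C channels (T frames, V joints);
    kernel: 1-D convolution weights of odd length.
    Cost is only len(kernel) parameters, matching the near-zero
    parameter growth of SGN+ECA in Table 1.
    """
    C = x.shape[0]
    gap = x.reshape(C, -1).mean(axis=1)        # global average pooling per channel
    pad = len(kernel) // 2
    padded = np.pad(gap, pad, mode="edge")     # 1-D conv across neighbouring channels
    conv = np.array([padded[i:i + len(kernel)] @ kernel for i in range(C)])
    weights = 1.0 / (1.0 + np.exp(-conv))      # sigmoid gate in [0, 1]
    return x * weights[:, None, None]          # re-weight each channel
```

Because the gate touches each channel with one shared small kernel, the module's parameter count is independent of C, which is why the Parameters/M column barely moves between SGN and SGN+ECA.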
| Network | Parameters/M | C-Sub/% | C-Set/% |
|---|---|---|---|
| SGN | 0.6590 | 79.2 | 81.5 |
| SGN+T | 0.7225 | 84.2 | 85.6 |
| SGN+ECA | 0.6591 | 87.1 | 88.3 |
| SGN+DCM | 1.8240 | 82.1 | 85.5 |
| SGN+ALL | 1.8776 | 88.5 | 89.8 |
Table 2 Parameter count and recognition accuracy on NTU RGB+D 120 before and after introducing each module
| Algorithm | CS | CV |
|---|---|---|
| VA-LSTM | 79.4 | 87.6 |
| ST-GCN | 81.5 | 88.3 |
| SR-TSL | 84.8 | 89.8 |
| HCN | 86.5 | 91.1 |
| AS-GCN | 86.8 | 94.2 |
| 2s-AGCN | 88.5 | 95.1 |
| VA-CNN | 88.7 | 94.3 |
| SGN | 89.0 | 94.5 |
| AGC-LSTM | 89.2 | 95.0 |
| DGNN | 89.9 | 96.1 |
| Shift-GCN | 90.7 | 96.5 |
| PA-ResGCN-B19 | 90.9 | 96.0 |
| Dynamic GCN | 91.5 | 96.0 |
| MS-G3D | 91.5 | 96.2 |
| EfficientGCN-B4 | 91.7 | 95.7 |
| CTR-GCN | 92.4 | 96.8 |
| PSUMNet | 92.9 | 96.7 |
| Ours | 93.0 | 96.5 |
Table 3 Recognition accuracy of related algorithms on the NTU RGB+D dataset/%
| Algorithm | C-Sub | C-Set |
|---|---|---|
| GCA-LSTM | 58.3 | 59.2 |
| Clips+CNN+MTLN | 58.4 | 57.9 |
| Two-Stream GCA-LSTM | 61.2 | 63.3 |
| RotClips+MTCNN | 62.2 | 61.8 |
| TSRJI | 67.9 | 59.7 |
| SGN | 79.2 | 81.5 |
| MV-IGNET | 83.9 | 85.6 |
| 4s Shift-GCN | 85.9 | 87.6 |
| MS-G3D | 86.9 | 88.4 |
| PA-ResGCN-B19 | 87.3 | 88.3 |
| EfficientGCN-B4 | 88.3 | 89.1 |
| CTR-GCN | 88.9 | 90.6 |
| PoseC3D | 86.9 | 90.3 |
| Ours | 88.5 | 89.8 |
Table 4 Recognition accuracy of related algorithms on the NTU RGB+D 120 dataset/%
[1] JIANG S N, CHEN E Q, ZHENG M Y, et al. Human action recognition based on ResNeXt[J]. Journal of Graphics, 2020, 41(2): 277-282 (in Chinese).
[2] AN F, DAI J, HAN Z, et al. Self-supervised optical flow estimation with attention module[J]. Journal of Graphics, 2022, 43(5): 841-848 (in Chinese).
[3] YANG S Q, YANG J T, LI Z, et al. Human action recognition based on LSTM neural network[J]. Journal of Graphics, 2021, 42(2): 174-181 (in Chinese).
[4] YAN S J, XIONG Y J, LIN D H. Spatial temporal graph convolutional networks for skeleton-based action recognition[EB/OL]. [2023-08-22]. https://arxiv.org/abs/1801.07455.
[5] LI M S, CHEN S H, CHEN X, et al. Actional-structural graph convolutional networks for skeleton-based action recognition[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 3590-3598.
[6] SHI L, ZHANG Y F, CHENG J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 12018-12027.
[7] THAKKAR K, NARAYANAN P J. Part-based graph convolutional network for action recognition[EB/OL]. [2023-08-22]. https://arxiv.org/abs/1809.04983.
[8] SI C Y, CHEN W T, WANG W, et al. An attention enhanced graph convolutional LSTM network for skeleton-based action recognition[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 1227-1236.
[9] WEN Y H, GAO L, FU H B, et al. Graph CNNs with motif and variable temporal block for skeleton-based action recognition[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 8989-8996.
[10] YE F F, TANG H M. Skeleton-based action recognition with JRR-GCN[J]. Electronics Letters, 2019, 55(17): 933-935.
[11] CHENG K, ZHANG Y F, HE X Y, et al. Skeleton-based action recognition with shift graph convolutional network[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 180-189.
[12] ZHANG P F, LAN C L, ZENG W J, et al. Semantics-guided neural networks for efficient skeleton-based human action recognition[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 1109-1118.
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[14] WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 11531-11539.
[15] WANG M S, NI B B, YANG X K. Learning multi-view interactional skeleton graph for action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(6): 6940-6954.
[16] ZHANG P F, LAN C L, XING J L, et al. View adaptive recurrent neural networks for high performance human action recognition from skeleton data[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2136-2145.
[17] LI C, ZHONG Q Y, XIE D, et al. Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation[C]// The 27th International Joint Conference on Artificial Intelligence. New York: ACM, 2018: 786-792.
[18] SI C Y, JING Y, WANG W, et al. Skeleton-based action recognition with spatial reasoning and temporal stack learning[C]// European Conference on Computer Vision. Cham: Springer, 2018: 106-121.
[19] ZHANG P F, LAN C L, XING J L, et al. View adaptive neural networks for high performance skeleton-based human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1963-1978.
[20] SHI L, ZHANG Y F, CHENG J, et al. Skeleton-based action recognition with directed graph neural networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 7904-7913.
[21] SONG Y F, ZHANG Z, SHAN C F, et al. Stronger, faster and more explainable: a graph convolutional baseline for skeleton-based action recognition[C]// The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 1625-1633.
[22] YE F F, PU S L, ZHONG Q Y, et al. Dynamic GCN: context-enriched topology learning for skeleton-based action recognition[C]// The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 55-63.
[23] LIU Z Y, ZHANG H W, CHEN Z H, et al. Disentangling and unifying graph convolutions for skeleton-based action recognition[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 140-149.
[24] SONG Y F, ZHANG Z, SHAN C F, et al. Constructing stronger and faster baselines for skeleton-based action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 1474-1488.
[25] CHEN Y X, ZHANG Z Q, YUAN C F, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 13339-13348.
[26] TRIVEDI N, SARVADEVABHATLA R K. PSUMNet: unified modality part streams are all you need for efficient pose-based action recognition[EB/OL]. [2023-08-22]. https://arxiv.org/abs/2208.05775.
[27] LIU J, WANG G, HU P, et al. Global context-aware attention LSTM networks for 3D action recognition[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 3671-3680.
[28] KE Q H, BENNAMOUN M, AN S J, et al. A new representation of skeleton sequences for 3D action recognition[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 4570-4579.
[29] LIU J, WANG G, DUAN L Y, et al. Skeleton-based human action recognition with global context-aware attention LSTM networks[J]. IEEE Transactions on Image Processing, 2018, 27(4): 1586-1599.
[30] KE Q H, BENNAMOUN M, AN S J, et al. Learning clip representations for skeleton-based 3D action recognition[J]. IEEE Transactions on Image Processing, 2018, 27(6): 2842-2855.
[31] CAETANO C, BRÉMOND F, SCHWARTZ W R. Skeleton image representation for 3D action recognition based on tree structure and reference joints[C]// 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images. New York: IEEE Press, 2019: 16-23.
[32] DUAN H D, ZHAO Y, CHEN K, et al. Revisiting skeleton-based action recognition[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 2969-2978.