
Journal of Graphics ›› 2024, Vol. 45 ›› Issue (1): 26-34.DOI: 10.11996/JG.j.2095-302X.2024010026

• Image Processing and Computer Vision •

Human action recognition algorithm based on semantics guided neural networks

GUO Zongyang1, LIU Lidong1, JIANG Donghua2, LIU Zixiang1, ZHU Shukang1, CHEN Jinghua1

  1. School of Information Engineering, Chang’an University, Xi’an Shaanxi 710064, China
    2. School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou Guangdong 510006, China
  • Received: 2023-09-06 Accepted: 2023-11-12 Online: 2024-02-29 Published: 2024-02-29
  • Contact: LIU Lidong (1982-), professor, Ph.D. His main research interests include graphic image processing and computer vision. E-mail: liulidong@chd.edu.cn
  • About author:

    GUO Zongyang (2000-), master’s student. His main research interests include digital image processing and human action recognition.
    E-mail: gzy000119@chd.edu.cn

  • Supported by:
    National Natural Science Foundation of China (52172379)

Abstract:

In recent years, modeling the three-dimensional coordinates of skeletal joints with deep feedforward neural networks has become a trend. However, skeleton-based action recognition still faces challenges such as low recognition accuracy, large parameter counts, and poor real-time performance. In response, an improved network model built upon the semantics-guided network (SGN) was proposed. Firstly, a non-local feature extraction module was integrated into the original network to enhance its training and prediction performance while reducing its computational complexity and inference time. Secondly, an attention mechanism was introduced to learn the channel weights of each convolutional layer and suppress redundant information between channels, further improving the computational efficiency and recognition accuracy of the model. Additionally, a deformable convolution module was employed to dynamically learn the weights of different graph convolutional network (GCN) layer channels and effectively aggregate joint features across channels for the final classification, thereby improving the utilization of feature information. Finally, human action recognition experiments were conducted on the public datasets NTU RGB+D and NTU RGB+D 120. The results demonstrated that the proposed network was an order of magnitude smaller than most comparable networks while significantly outperforming the original SGN and several other state-of-the-art algorithms in recognition accuracy.
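The channel-weighting step described in the abstract can be illustrated with a minimal squeeze-and-excitation style sketch. This is not the authors' implementation; the function name, weight shapes, and the (channels, frames, joints) feature layout are assumptions chosen only to show how learned per-channel weights can suppress redundant channels in a skeleton feature map.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Hypothetical channel attention over a skeleton feature map.

    x  : (C, T, V) array -- C channels, T frames, V skeleton joints.
    w1 : (C//r, C) and w2 : (C, C//r) -- learned bottleneck weights
         (randomly initialized here for illustration).
    """
    # Squeeze: global average pool over frames and joints -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid gating in (0, 1)
    h = np.maximum(w1 @ z, 0.0)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))
    # Re-weight each channel; near-zero gates suppress redundant channels
    return x * s[:, None, None]

# Toy example: 8 channels, 4 frames, 5 joints, reduction ratio r = 2
rng = np.random.default_rng(0)
C, T, V, r = 8, 4, 5, 2
x = rng.standard_normal((C, T, V))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y = channel_attention(x, w1, w2)
```

Because the sigmoid gate lies in (0, 1), each output channel is a damped copy of its input; in the actual model the gating weights would be trained jointly with the GCN layers rather than drawn at random.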

Key words: human action recognition, graph convolutional network, semantics guided neural network, non-local feature extraction, attention mechanism, deformable convolution

CLC Number: