Human action recognition algorithm based on semantics guided neural networks

doi:10.11996/JG.j.2095-302X.2024010026

Journal of Graphics, 2024, Vol. 45, Issue (1): 26-34. DOI: 10.11996/JG.j.2095-302X.2024010026

• Image Processing and Computer Vision •

GUO Zongyang¹, LIU Lidong¹, JIANG Donghua², LIU Zixiang¹, ZHU Shukang¹, CHEN Jinghua¹

1. School of Information Engineering, Chang’an University, Xi’an, Shaanxi 710064, China
2. School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, Guangdong 510006, China

Received: 2023-09-06 Accepted: 2023-11-12 Published: 2024-02-29 Online: 2024-02-29
Corresponding author: LIU Lidong (1982-), male, professor, Ph.D. His main research interests cover image processing and computer vision, etc. E-mail: liulidong@chd.edu.cn
First author: GUO Zongyang (2000-), male, master's student. His main research interests cover digital image processing and human action recognition, etc. E-mail: gzy000119@chd.edu.cn
Supported by: National Natural Science Foundation of China (52172379)

Abstract

In recent years, modeling the three-dimensional coordinates of skeletal joints with deep feedforward neural networks has become a trend. However, low recognition accuracy, large parameter counts, and poor real-time performance remain pressing problems in skeleton-based action recognition. In response, an improved network model built upon the semantics-guided neural network (SGN) was proposed. Firstly, a non-local feature extraction module was integrated into the original network to strengthen the guidance that high-level semantics provide during model training and prediction, thereby decreasing its computational complexity and inference time in natural language processing tasks. Secondly, an attention mechanism was introduced to learn the channel weights of each graph convolutional network (GCN) layer and to reduce redundant information between channels, further improving the computational efficiency and recognition accuracy of the model. Additionally, a deformable convolution module was employed to dynamically learn the weights of the channels of different GCN layers and to effectively aggregate the joint features across channels for the final classification, thereby improving the utilization of feature information. Finally, human action recognition experiments were conducted on the public datasets NTU RGB+D and NTU RGB+D 120. The experimental results demonstrated that the proposed network was an order of magnitude smaller than most networks and that it significantly outperformed the original network and several other state-of-the-art algorithms in recognition accuracy.

Professor LIU Lidong of Chang’an University and his student GUO Zongyang proposed a human action recognition algorithm based on the semantics-guided neural network, with the following improvements to the joint-level and frame-level modules of SGN: a non-local feature extraction module was introduced to reduce the model's computation and inference time; the ECA attention mechanism was introduced to learn the channel weights of each GCN layer, further improving computational efficiency and recognition accuracy; finally, a deformable convolution module dynamically learns the weights of the channels of different GCN layers, improving the utilization of feature information.

Key words: human action recognition, graph convolutional network, semantics guided neural network, non-local feature extraction, attention mechanism, deformable convolution
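The channel-attention step named in the abstract can be illustrated with a minimal sketch in the spirit of ECA [14]: pool each channel to one descriptor, mix neighbouring descriptors with a small 1-D convolution, and gate the channels with a sigmoid. This is not the authors' implementation. The function names, the nested-list layout, and the explicitly passed `kernel` are assumptions made for illustration; in a trained network the kernel weights would be learned.

```python
import math


def eca_weights(features, kernel):
    """Return one attention weight per channel.

    features: list of C channels, each a flat list of activations (assumed layout).
    kernel:   odd-length list of 1-D convolution weights shared by all channels
              (hypothetical values; learned in a real network).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in features]
    # Zero-pad so the output has exactly one value per channel.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    # Local cross-channel interaction: 1-D convolution along the channel axis.
    mixed = [
        sum(w * padded[c + j] for j, w in enumerate(kernel))
        for c in range(len(pooled))
    ]
    # Sigmoid gate maps each mixed descriptor into (0, 1).
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]


def eca(features, kernel):
    """Rescale every channel by its attention weight."""
    weights = eca_weights(features, kernel)
    return [[w * v for v in ch] for w, ch in zip(weights, features)]
```

Because the only trainable part is the short kernel, a block of this kind adds very few parameters, which is consistent with the near-identical parameter counts reported for SGN and SGN+ECA.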

CLC number: TP391

GUO Zongyang, LIU Lidong, JIANG Donghua, LIU Zixiang, ZHU Shukang, CHEN Jinghua. Human action recognition algorithm based on semantics guided neural networks[J]. Journal of Graphics, 2024, 45(1): 26-34.

Figures/Tables (12)

Fig. 1 Enhanced semantics-guided neural network architecture

Fig. 2 Semantics-guided neural network architecture

Fig. 3 Structural framework of the non-local feature extraction module
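As a rough illustration of what a non-local module computes, the sketch below lets every joint aggregate features from all other joints through scaled dot-product attention [13]. It is a simplified stand-in, not the module drawn in Fig. 3: the learned query, key, and value projections are omitted (treated as identity), a residual connection is assumed, and the names `softmax` and `non_local` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def non_local(x):
    """Non-local aggregation over N joint feature vectors of length d.

    Assumption: identity projections instead of learned W_q, W_k, W_v.
    Each output is the input vector plus an attention-weighted sum of all inputs.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this joint to every joint, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        attn = softmax(scores)
        # Context vector: weighted sum of all joint features.
        ctx = [sum(w * v[j] for w, v in zip(attn, x)) for j in range(d)]
        # Residual connection keeps the original feature.
        out.append([qi + ci for qi, ci in zip(q, ctx)])
    return out
```

Unlike a graph convolution restricted to skeleton edges, every joint here attends to every other joint, which is what "non-local" refers to.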

Fig. 4 Structural framework of the ECA module

Fig. 5 Schematic diagram of the deformable convolution kernel ((a) Kernel of an ordinary convolution; (b)~(d) Kernels of deformable convolutions)
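The difference between panels (a) and (b)~(d) can be sketched for a single output value: an ordinary 3x3 kernel reads a fixed grid, while a deformable kernel shifts each tap by an offset and reads the fractional position with bilinear interpolation. This is a minimal sketch under stated assumptions, not the paper's module: the offsets are passed in explicitly (in a deformable convolution they are predicted by an extra layer), the input is a single-channel 2-D grid, and the function names are hypothetical.

```python
import math


def bilinear(img, y, x):
    """Sample img (list of rows) at fractional (y, x); out-of-range reads count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight falls off linearly with distance to the grid point.
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += weight * img[yy][xx]
    return val


def deform_conv_at(img, kernel, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    offsets[i][j] = (dy, dx) shifts tap (i, j); all-zero offsets reproduce
    an ordinary convolution.
    """
    total = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            total += kernel[i][j] * bilinear(img, cy + i - 1 + dy, cx + j - 1 + dx)
    return total
```

With all offsets set to zero the function reduces to the fixed sampling grid of panel (a).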

Fig. 6 Visualization results of the original network simulation experiments ((a) Clap; (b) Jump; (c) Sit)

Fig. 7 Visualization results of the improved network simulation experiments ((a) Clap; (b) Jump; (c) Sit)

Table 1 Parameter count and recognition accuracy on NTU RGB+D before and after introducing each module

Network	Params/M	CS/%	CV/%
SGN	0.6590	89.0	94.5
SGN+T	0.7225	91.2	95.6
SGN+ECA	0.6591	92.5	96.3
SGN+DCM	1.8240	90.1	95.5
SGN+ALL	1.8776	93.0	96.5
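The trade-off in Table 1 is easier to read as a comparison against the SGN baseline. The short script below copies the table values and derives, for each variant, the parameter ratio and the accuracy gain in percentage points. The helper name `compare_to_baseline` and the dictionary layout are illustrative choices, not something taken from the paper.

```python
# Values copied from Table 1 (NTU RGB+D): (parameters in millions, CS %, CV %).
TABLE1 = {
    "SGN": (0.6590, 89.0, 94.5),
    "SGN+T": (0.7225, 91.2, 95.6),
    "SGN+ECA": (0.6591, 92.5, 96.3),
    "SGN+DCM": (1.8240, 90.1, 95.5),
    "SGN+ALL": (1.8776, 93.0, 96.5),
}


def compare_to_baseline(table, baseline="SGN"):
    """Return parameter ratio and accuracy gains of each variant over the baseline."""
    base_params, base_cs, base_cv = table[baseline]
    rows = {}
    for name, (params, cs, cv) in table.items():
        if name == baseline:
            continue
        rows[name] = {
            "param_ratio": params / base_params,   # 1.0 means no extra parameters
            "cs_gain": round(cs - base_cs, 1),     # percentage points
            "cv_gain": round(cv - base_cv, 1),     # percentage points
        }
    return rows


if __name__ == "__main__":
    for name, row in compare_to_baseline(TABLE1).items():
        print(f"{name:8s} x{row['param_ratio']:.3f} params, "
              f"CS +{row['cs_gain']}, CV +{row['cv_gain']}")
```

Read this way, the table shows that SGN+ECA gains 3.5 points on CS with an almost unchanged parameter count, while SGN+ALL gains 4.0 points on CS at roughly 2.85 times the baseline parameters.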

Table 2 Parameter count and recognition accuracy on NTU RGB+D 120 before and after introducing each module

Network	Params/M	C-Sub/%	C-Set/%
SGN	0.6590	79.2	81.5
SGN+T	0.7225	84.2	85.6
SGN+ECA	0.6591	87.1	88.3
SGN+DCM	1.8240	82.1	85.5
SGN+ALL	1.8776	88.5	89.8

Fig. 8 Comparison of recognition accuracy and parameter counts of different methods on NTU RGB+D

Table 3 Performance of related algorithms on the NTU RGB+D dataset/%

Algorithm	CS	CV
VA-LSTM [16]	79.4	87.6
ST-GCN [4]	81.5	88.3
SR-TSL [18]	84.8	89.8
HCN [17]	86.5	91.1
AS-GCN [5]	86.8	94.2
2s-AGCN [6]	88.5	95.1
VA-CNN [19]	88.7	94.3
SGN [12]	89.0	94.5
AGC-LSTM [8]	89.2	95.0
DGNN [20]	89.9	96.1
Shift-GCN [11]	90.7	96.5
PA-ResGCN-B19 [21]	90.9	96.0
Dynamic GCN [22]	91.5	96.0
MS-G3D [23]	91.5	96.2
EfficientGCN-B4 [24]	91.7	95.7
CTR-GCN [25]	92.4	96.8
PSUMNet [26]	92.9	96.7
Proposed method	93.0	96.5

Table 4 Performance of related algorithms on the NTU RGB+D 120 dataset/%

Algorithm	C-Sub	C-Set
GCA-LSTM [27]	58.3	59.2
Clips+CNN+MTLN [28]	58.4	57.9
Two-Stream GCA-LSTM [29]	61.2	63.3
RotClips+MTCNN [30]	62.2	61.8
TSRJI [31]	67.9	59.7
SGN [12]	79.2	81.5
MV-IGNET [15]	83.9	85.6
4s Shift-GCN [11]	85.9	87.6
MS-G3D [23]	86.9	88.4
PA-ResGCN-B19 [21]	87.3	88.3
EfficientGCN-B4 [24]	88.3	89.1
CTR-GCN [25]	88.9	90.6
PoseC3D [32]	86.9	90.3
Proposed method	88.5	89.8

References (32)

[1]	JIANG S N, CHEN E Q, ZHENG M Y, et al. Human action recognition based on ResNeXt[J]. Journal of Graphics, 2020, 41(2): 277-282 (in Chinese).
[2]	AN F, DAI J, HAN Z, et al. Self-supervised optical flow estimation with attention module[J]. Journal of Graphics, 2022, 43(5): 841-848 (in Chinese).
[3]	YANG S Q, YANG J T, LI Z, et al. Human action recognition based on LSTM neural network[J]. Journal of Graphics, 2021, 42(2): 174-181 (in Chinese).
[4]	YAN S J, XIONG Y J, LIN D H. Spatial temporal graph convolutional networks for skeleton-based action recognition[EB/OL]. [2023-08-22]. https://arxiv.org/abs/1801.07455.
[5]	LI M S, CHEN S H, CHEN X, et al. Actional-structural graph convolutional networks for skeleton-based action recognition[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 3590-3598.
[6]	SHI L, ZHANG Y F, CHENG J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 12018-12027.
[7]	THAKKAR K, NARAYANAN P J. Part-based graph convolutional network for action recognition[EB/OL]. [2023-08-22]. https://arxiv.org/abs/1809.04983.
[8]	SI C Y, CHEN W T, WANG W, et al. An attention enhanced graph convolutional LSTM network for skeleton-based action recognition[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 1227-1236.
[9]	WEN Y H, GAO L, FU H B, et al. Graph CNNs with motif and variable temporal block for skeleton-based action recognition[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 8989-8996.
[10]	YE F F, TANG H M. Skeleton-based action recognition with JRR-GCN[J]. Electronics Letters, 2019, 55(17): 933-935.
[11]	CHENG K, ZHANG Y F, HE X Y, et al. Skeleton-based action recognition with shift graph convolutional network[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 180-189.
[12]	ZHANG P F, LAN C L, ZENG W J, et al. Semantics-guided neural networks for efficient skeleton-based human action recognition[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 1109-1118.
[13]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[14]	WANG Q L, WU B G, ZHU P F, et al. ECA-net: efficient channel attention for deep convolutional neural networks[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 11531-11539.
[15]	WANG M S, NI B B, YANG X K. Learning multi-view interactional skeleton graph for action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(6): 6940-6954.
[16]	ZHANG P F, LAN C L, XING J L, et al. View adaptive recurrent neural networks for high performance human action recognition from skeleton data[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2136-2145.
[17]	LI C, ZHONG Q Y, XIE D, et al. Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation[C]// The 27th International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2018: 786-792.
[18]	SI C Y, JING Y, WANG W, et al. Skeleton-based action recognition with spatial reasoning and temporal stack learning[C]// European Conference on Computer Vision. Cham: Springer, 2018: 106-121.
[19]	ZHANG P F, LAN C L, XING J L, et al. View adaptive neural networks for high performance skeleton-based human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1963-1978.
[20]	SHI L, ZHANG Y F, CHENG J, et al. Skeleton-based action recognition with directed graph neural networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 7904-7913.
[21]	SONG Y F, ZHANG Z, SHAN C F, et al. Stronger, faster and more explainable: a graph convolutional baseline for skeleton-based action recognition[C]// The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 1625-1633.
[22]	YE F F, PU S L, ZHONG Q Y, et al. Dynamic GCN: context-enriched topology learning for skeleton-based action recognition[C]// The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 55-63.
[23]	LIU Z Y, ZHANG H W, CHEN Z H, et al. Disentangling and unifying graph convolutions for skeleton-based action recognition[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 140-149.
[24]	SONG Y F, ZHANG Z, SHAN C F, et al. Constructing stronger and faster baselines for skeleton-based action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 1474-1488.
[25]	CHEN Y X, ZHANG Z Q, YUAN C F, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 13339-13348.
[26]	TRIVEDI N, SARVADEVABHATLA R K. PSUMNet: unified modality part streams are all you need for efficient pose-based action recognition[EB/OL]. [2023-08-22]. https://arxiv.org/abs/2208.05775.
[27]	LIU J, WANG G, HU P, et al. Global context-aware attention LSTM networks for 3D action recognition[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 3671-3680.
[28]	KE Q H, BENNAMOUN M, AN S J, et al. A new representation of skeleton sequences for 3D action recognition[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 4570-4579.
[29]	LIU J, WANG G, DUAN L Y, et al. Skeleton-based human action recognition with global context-aware attention LSTM networks[J]. IEEE Transactions on Image Processing, 2018, 27(4): 1586-1599.
[30]	KE Q H, BENNAMOUN M, AN S J, et al. Learning clip representations for skeleton-based 3D action recognition[J]. IEEE Transactions on Image Processing, 2018, 27(6): 2842-2855.
[31]	CAETANO C, BRÉMOND F, SCHWARTZ W R. Skeleton image representation for 3D action recognition based on tree structure and reference joints[C]// 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images. New York: IEEE Press, 2019: 16-23.
[32]	DUAN H D, ZHAO Y, CHEN K, et al. Revisiting skeleton-based action recognition[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 2969-2978.
