Journal of Graphics ›› 2024, Vol. 45 ›› Issue (1): 26-34. DOI: 10.11996/JG.j.2095-302X.2024010026
GUO Zongyang1, LIU Lidong1, JIANG Donghua2, LIU Zixiang1, ZHU Shukang1, CHEN Jinghua1

Received: 2023-09-06
Accepted: 2023-11-12
Published: 2024-02-29
Online: 2024-02-29

Corresponding author: LIU Lidong (1982-), male, professor, Ph.D. His main research interests cover image processing and computer vision, etc. E-mail: liulidong@chd.edu.cn
First author: GUO Zongyang (2000-), male, master student. His main research interests cover digital image processing and human action recognition, etc. E-mail: gzy000119@chd.edu.cn
Abstract:
In recent years, modeling the 3D coordinates of skeleton joints with deep feed-forward neural networks has become a trend. However, low recognition accuracy, huge parameter counts, and poor real-time performance remain pressing problems in skeleton-based action recognition. To address them, an improved network model based on the semantics-guided neural network (SGN) is proposed. First, a non-local feature extraction module is introduced into the original network to strengthen the high-level semantic guidance of training and prediction, while reducing the computational complexity and inference time this mechanism incurs in natural language processing tasks. Second, an attention mechanism is introduced to learn the channel weights of each graph convolutional network layer and reduce redundant information across channels, further improving the model's computational efficiency and recognition accuracy. In addition, a deformable convolution module dynamically learns the channel weights of different graph convolutional network (GCN) layers and effectively aggregates the joint features in different channels for the final classification, thereby improving the utilization of feature information. Finally, human action recognition experiments are conducted on the public NTU RGB+D and NTU RGB+D 120 datasets. The results show that the proposed network is an order of magnitude smaller than most networks while clearly outperforming the original network and several state-of-the-art algorithms in recognition accuracy.
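The non-local feature extraction module named in the first contribution builds on the scaled dot-product self-attention of reference [13]: every feature token attends to every other, so dependencies are not limited to the skeleton's physical edges. The following is a minimal single-head NumPy sketch; the token count, feature dimension, and identity projection matrices are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over N tokens.

    x          : (N, d) token features (e.g. one token per joint per frame)
    wq, wk, wv : (d, d) query/key/value projection matrices
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])       # pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    return attn @ v                              # weighted mix of values

# Toy usage: 25 joints, 8-dimensional features, identity projections
tokens = np.random.rand(25, 8)
eye = np.eye(8)
out = self_attention(tokens, eye, eye, eye)
print(out.shape)  # (25, 8)
```

Because the attention weights form a full N×N matrix, each output token mixes information from all joints and frames, which is what "non-local" refers to here.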
GUO Zongyang, LIU Lidong, JIANG Donghua, LIU Zixiang, ZHU Shukang, CHEN Jinghua. Human action recognition algorithm based on semantics guided neural networks[J]. Journal of Graphics, 2024, 45(1): 26-34.
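The channel attention used in the second contribution follows the ECA design of reference [14]: global average pooling produces one descriptor per channel, and a small 1D convolution across neighbouring channels yields sigmoid gating weights. Below is a minimal NumPy sketch; the (C, T, V) feature layout and the kernel size k = 3 are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def eca_channel_attention(x, kernel, k=3):
    """ECA-style channel attention over a (C, T, V) skeleton feature map.

    x      : features with C channels, T frames, V joints
    kernel : 1D convolution weights of length k, shared across channels
    Returns the input rescaled by per-channel sigmoid weights.
    """
    c = x.shape[0]
    # Global average pooling -> one descriptor per channel
    desc = x.reshape(c, -1).mean(axis=1)
    # 1D convolution across neighbouring channels (zero "same" padding)
    pad = k // 2
    padded = np.pad(desc, pad)
    conv = np.array([np.dot(padded[i:i + k], kernel) for i in range(c)])
    weights = 1.0 / (1.0 + np.exp(-conv))  # sigmoid gate per channel
    return x * weights[:, None, None]

# Toy usage: 4 channels, 2 frames, 3 joints, averaging kernel
feat = np.random.rand(4, 2, 3)
out = eca_channel_attention(feat, kernel=np.ones(3) / 3)
print(out.shape)  # (4, 2, 3)
```

The appeal of this design for a lightweight model is that the only learned parameters are the k kernel weights, which matches the near-zero parameter increase of the SGN+ECA row in Table 1.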
Fig. 5 Schematic diagram of deformable convolution kernels ((a) kernel of an ordinary convolution; (b)~(d) kernels of deformable convolutions)
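The contrast Fig. 5 draws between a fixed sampling grid (a) and learned offsets (b)~(d) can be sketched as follows: each kernel tap samples the input at its grid position plus a fractional (dy, dx) shift, evaluated by bilinear interpolation. This is a generic illustration of deformable convolution, not the paper's exact module; the offsets here are hand-set rather than learned.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Bilinearly interpolate img at the fractional position (y, x)."""
    h, w = img.shape
    y = min(max(y, 0.0), h - 1.0)  # clamp to the image border
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x1]
            + wy * (1 - wx) * img[y1, x0] + wy * wx * img[y1, x1])

def deformable_conv_at(img, weights, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    weights : (3, 3) kernel
    offsets : (3, 3, 2) per-tap (dy, dx) shifts; zeros give an ordinary conv
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i, j]
            out += weights[i, j] * bilinear_sample(
                img, cy + i - 1 + dy, cx + j - 1 + dx)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
w = np.full((3, 3), 1 / 9)   # averaging kernel
zero = np.zeros((3, 3, 2))   # zero offsets, as in panel (a)
print(deformable_conv_at(img, w, zero, 2, 2))  # ~12.0: mean of the centre 3x3
```

With zero offsets the result reduces to an ordinary convolution, which is exactly the relationship between panel (a) and panels (b)~(d).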
Table 1 Parameter count and recognition accuracy on NTU RGB+D before and after introducing the modules

| Network | Params/M | CS/% | CV/% |
|---|---|---|---|
| SGN | 0.6590 | 89.0 | 94.5 |
| SGN+T | 0.7225 | 91.2 | 95.6 |
| SGN+ECA | 0.6591 | 92.5 | 96.3 |
| SGN+DCM | 1.8240 | 90.1 | 95.5 |
| SGN+ALL | 1.8776 | 93.0 | 96.5 |
Table 2 Parameter count and recognition accuracy on NTU RGB+D 120 before and after introducing the modules

| Network | Params/M | C-Sub/% | C-Set/% |
|---|---|---|---|
| SGN | 0.6590 | 79.2 | 81.5 |
| SGN+T | 0.7225 | 84.2 | 85.6 |
| SGN+ECA | 0.6591 | 87.1 | 88.3 |
| SGN+DCM | 1.8240 | 82.1 | 85.5 |
| SGN+ALL | 1.8776 | 88.5 | 89.8 |
Table 3 Performance of related algorithms on the NTU RGB+D dataset (%)

| Algorithm | CS | CV |
|---|---|---|
| VA-LSTM | 79.4 | 87.6 |
| ST-GCN | 81.5 | 88.3 |
| SR-TSL | 84.8 | 89.8 |
| HCN | 86.5 | 91.1 |
| AS-GCN | 86.8 | 94.2 |
| 2s-AGCN | 88.5 | 95.1 |
| VA-CNN | 88.7 | 94.3 |
| SGN | 89.0 | 94.5 |
| AGC-LSTM | 89.2 | 95.0 |
| DGNN | 89.9 | 96.1 |
| Shift-GCN | 90.7 | 96.5 |
| PA-ResGCN-B19 | 90.9 | 96.0 |
| Dynamic GCN | 91.5 | 96.0 |
| MS-G3D | 91.5 | 96.2 |
| EfficientGCN-B4 | 91.7 | 95.7 |
| CTR-GCN | 92.4 | 96.8 |
| PSUMNet | 92.9 | 96.7 |
| Ours | 93.0 | 96.5 |
Table 4 Performance of related algorithms on the NTU RGB+D 120 dataset (%)

| Algorithm | C-Sub | C-Set |
|---|---|---|
| GCA-LSTM | 58.3 | 59.2 |
| Clips+CNN+MTLN | 58.4 | 57.9 |
| Two-Stream GCA-LSTM | 61.2 | 63.3 |
| RotClips+MTCNN | 62.2 | 61.8 |
| TSRJI | 67.9 | 59.7 |
| SGN | 79.2 | 81.5 |
| MV-IGNET | 83.9 | 85.6 |
| 4s Shift-GCN | 85.9 | 87.6 |
| MS-G3D | 86.9 | 88.4 |
| PA-ResGCN-B19 | 87.3 | 88.3 |
| EfficientGCN-B4 | 88.3 | 89.1 |
| CTR-GCN | 88.9 | 90.6 |
| PoseC3D | 86.9 | 90.3 |
| Ours | 88.5 | 89.8 |
[1] JIANG S N, CHEN E Q, ZHENG M Y, et al. Human action recognition based on ResNeXt[J]. Journal of Graphics, 2020, 41(2): 277-282 (in Chinese).
[2] AN F, DAI J, HAN Z, et al. Self-supervised optical flow estimation with attention module[J]. Journal of Graphics, 2022, 43(5): 841-848 (in Chinese).
[3] YANG S Q, YANG J T, LI Z, et al. Human action recognition based on LSTM neural network[J]. Journal of Graphics, 2021, 42(2): 174-181 (in Chinese).
[4] YAN S J, XIONG Y J, LIN D H. Spatial temporal graph convolutional networks for skeleton-based action recognition[EB/OL]. [2023-08-22]. https://arxiv.org/abs/1801.07455.
[5] LI M S, CHEN S H, CHEN X, et al. Actional-structural graph convolutional networks for skeleton-based action recognition[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 3590-3598.
[6] SHI L, ZHANG Y F, CHENG J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 12018-12027.
[7] THAKKAR K, NARAYANAN P J. Part-based graph convolutional network for action recognition[EB/OL]. [2023-08-22]. https://arxiv.org/abs/1809.04983.
[8] SI C Y, CHEN W T, WANG W, et al. An attention enhanced graph convolutional LSTM network for skeleton-based action recognition[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 1227-1236.
[9] WEN Y H, GAO L, FU H B, et al. Graph CNNs with motif and variable temporal block for skeleton-based action recognition[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 8989-8996.
[10] YE F F, TANG H M. Skeleton-based action recognition with JRR-GCN[J]. Electronics Letters, 2019, 55(17): 933-935.
[11] CHENG K, ZHANG Y F, HE X Y, et al. Skeleton-based action recognition with shift graph convolutional network[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 180-189.
[12] ZHANG P F, LAN C L, ZENG W J, et al. Semantics-guided neural networks for efficient skeleton-based human action recognition[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 1109-1118.
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[14] WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 11531-11539.
[15] WANG M S, NI B B, YANG X K. Learning multi-view interactional skeleton graph for action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(6): 6940-6954.
[16] ZHANG P F, LAN C L, XING J L, et al. View adaptive recurrent neural networks for high performance human action recognition from skeleton data[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2136-2145.
[17] LI C, ZHONG Q Y, XIE D, et al. Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation[C]// The 27th International Joint Conference on Artificial Intelligence. New York: ACM, 2018: 786-792.
[18] SI C Y, JING Y, WANG W, et al. Skeleton-based action recognition with spatial reasoning and temporal stack learning[C]// European Conference on Computer Vision. Cham: Springer, 2018: 106-121.
[19] ZHANG P F, LAN C L, XING J L, et al. View adaptive neural networks for high performance skeleton-based human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1963-1978.
[20] SHI L, ZHANG Y F, CHENG J, et al. Skeleton-based action recognition with directed graph neural networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 7904-7913.
[21] SONG Y F, ZHANG Z, SHAN C F, et al. Stronger, faster and more explainable: a graph convolutional baseline for skeleton-based action recognition[C]// The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 1625-1633.
[22] YE F F, PU S L, ZHONG Q Y, et al. Dynamic GCN: context-enriched topology learning for skeleton-based action recognition[C]// The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 55-63.
[23] LIU Z Y, ZHANG H W, CHEN Z H, et al. Disentangling and unifying graph convolutions for skeleton-based action recognition[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 140-149.
[24] SONG Y F, ZHANG Z, SHAN C F, et al. Constructing stronger and faster baselines for skeleton-based action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 1474-1488.
[25] CHEN Y X, ZHANG Z Q, YUAN C F, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 13339-13348.
[26] TRIVEDI N, SARVADEVABHATLA R K. PSUMNet: unified modality part streams are all you need for efficient pose-based action recognition[EB/OL]. [2023-08-22]. https://arxiv.org/abs/2208.05775.
[27] LIU J, WANG G, HU P, et al. Global context-aware attention LSTM networks for 3D action recognition[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 3671-3680.
[28] KE Q H, BENNAMOUN M, AN S J, et al. A new representation of skeleton sequences for 3D action recognition[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 4570-4579.
[29] LIU J, WANG G, DUAN L Y, et al. Skeleton-based human action recognition with global context-aware attention LSTM networks[J]. IEEE Transactions on Image Processing, 2018, 27(4): 1586-1599.
[30] KE Q H, BENNAMOUN M, AN S J, et al. Learning clip representations for skeleton-based 3D action recognition[J]. IEEE Transactions on Image Processing, 2018, 27(6): 2842-2855.
[31] CAETANO C, BRÉMOND F, SCHWARTZ W R. Skeleton image representation for 3D action recognition based on tree structure and reference joints[C]// 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images. New York: IEEE Press, 2019: 16-23.
[32] DUAN H D, ZHAO Y, CHEN K, et al. Revisiting skeleton-based action recognition[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 2969-2978.