
Journal of Graphics (图学学报) ›› 2024, Vol. 45 ›› Issue (1): 26-34. DOI: 10.11996/JG.j.2095-302X.2024010026

• Image Processing and Computer Vision •

Human action recognition algorithm based on semantics guided neural networks

GUO Zongyang1, LIU Lidong1, JIANG Donghua2, LIU Zixiang1, ZHU Shukang1, CHEN Jinghua1

  1. School of Information Engineering, Chang’an University, Xi’an, Shaanxi 710064, China
  2. School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou, Guangdong 510006, China
  • Received: 2023-09-06 Accepted: 2023-11-12 Published: 2024-02-29 Online: 2024-02-29
  • Corresponding author: LIU Lidong (1982-), male, professor, Ph.D. His main research interests cover image processing and computer vision, etc. E-mail: liulidong@chd.edu.cn
  • First author: GUO Zongyang (2000-), male, master student. His main research interests cover image processing and human action recognition, etc. E-mail: gzy000119@chd.edu.cn
  • Supported by:
    National Natural Science Foundation of China (52172379)

Abstract:

In recent years, modeling the three-dimensional coordinates of skeletal joints with deep feedforward neural networks has become a trend. However, low recognition accuracy, large parameter counts, and poor real-time performance remain pressing problems in skeleton-based action recognition. In response, an improved network model built upon the semantics-guided neural network (SGN) was proposed. Firstly, a non-local feature extraction module was integrated into the original network to improve its performance when high-level semantics guide model training and prediction, while decreasing its computational complexity and inference time in natural language processing tasks. Secondly, an attention mechanism was introduced to learn the channel weights of each graph convolutional network layer and reduce redundant information between channels, further improving the computational efficiency and recognition accuracy of the model. Additionally, a deformable convolution module was employed to dynamically learn the channel weights of different graph convolutional network (GCN) layers and to aggregate the joint features across channels for the final classification, thereby improving the utilization of feature information. Finally, human action recognition experiments were conducted on the public datasets NTU RGB+D and NTU RGB+D 120. The experimental results showed that the proposed network was an order of magnitude smaller than most networks and significantly outperformed the original network and several other state-of-the-art algorithms in recognition accuracy.

Professor LIU Lidong of Chang’an University and his student GUO Zongyang propose a human action recognition algorithm based on semantics guided neural networks. The improvements target the joint-level and frame-level modules of the semantics guided neural network: a non-local feature extraction module is introduced to reduce the model's computation and inference time; the ECA attention mechanism is introduced to learn the channel weights of each graph convolutional network layer, further improving computational efficiency and recognition accuracy; finally, a deformable convolution module dynamically learns the channel weights of different graph convolutional network layers to improve the utilization of feature information.
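The page names ECA as the channel attention mechanism but gives no implementation details. As a rough illustration only, the general idea of ECA-style channel attention (pool each channel to one value, mix neighbouring channels with a small 1D convolution, squash with a sigmoid, rescale the channels) might be sketched as follows. The function names, the list-based tensor layout, and the fixed averaging kernel are assumptions made for this sketch; in a trained network the kernel would be learned, and the authors' module may differ.

```python
import math


def eca_channel_weights(features, kernel_size=3):
    """Return one attention weight in (0, 1) per channel.

    features: list of C channels, each a non-empty list of floats
              (e.g. joint features flattened over joints and frames).
    kernel_size: odd size of the 1D convolution across channels.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(channel) / len(channel) for channel in features]
    # Hypothetical fixed kernel; a real ECA layer learns these values.
    kernel = [1.0 / kernel_size] * kernel_size
    pad = kernel_size // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    weights = []
    for c in range(len(pooled)):
        # Local cross-channel interaction: 1D convolution over neighbours.
        z = sum(k * padded[c + j] for j, k in enumerate(kernel))
        weights.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid gate
    return weights


def apply_eca(features, kernel_size=3):
    """Rescale every channel by its attention weight."""
    weights = eca_channel_weights(features, kernel_size)
    return [[w * x for x in channel] for w, channel in zip(weights, features)]
```

Because the only parameters are the `kernel_size` values of the 1D convolution, a gate of this kind adds very few parameters, which fits the abstract's emphasis on a small model.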

Key words: human action recognition, graph convolutional network, semantics guided neural network, non-local feature extraction, attention mechanism, deformable convolution
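The abstract does not describe how the non-local feature extraction module is built. The sketch below illustrates the generic non-local idea under stated assumptions: every joint's feature vector is updated with a softmax-weighted sum over all joints, plus a residual connection. The learned embedding projections found in typical non-local blocks are omitted, and the function name `non_local` and the list-based layout are hypothetical.

```python
import math


def non_local(joints):
    """Simplified non-local operation over a set of joint features.

    joints: list of N feature vectors (lists of floats of equal length).
    Returns N vectors: each input plus an attention-weighted sum of all inputs.
    """
    output = []
    for xi in joints:
        # Pairwise similarity of joint i to every joint j (dot product).
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in joints]
        # Numerically stable softmax over all joints.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        attention = [e / total for e in exps]
        # Aggregate features from all joints, not just skeleton neighbours.
        aggregated = [
            sum(a * xj[d] for a, xj in zip(attention, joints))
            for d in range(len(xi))
        ]
        # Residual connection keeps the original joint feature.
        output.append([x + y for x, y in zip(xi, aggregated)])
    return output
```

Unlike a graph convolution restricted to a fixed adjacency, every joint here can attend to every other joint, which is the sense in which the operation is "non-local".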

CLC Number: