
Journal of Graphics ›› 2024, Vol. 45 ›› Issue (1): 26-34.DOI: 10.11996/JG.j.2095-302X.2024010026

• Image Processing and Computer Vision •

Human action recognition algorithm based on semantics guided neural networks

GUO Zongyang1, LIU Lidong1, JIANG Donghua2, LIU Zixiang1, ZHU Shukang1, CHEN Jinghua1

  1. School of Information Engineering, Chang’an University, Xi’an Shaanxi 710064, China
    2. School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou Guangdong 510006, China
  • Received: 2023-09-06 Accepted: 2023-11-12 Online: 2024-02-29 Published: 2024-02-29
  • Contact: LIU Lidong (1982-), professor, Ph.D. His main research interests include graphic image processing and computer vision. E-mail: liulidong@chd.edu.cn
  • About author:

    GUO Zongyang (2000-), master’s student. His main research interests include digital image processing and human action recognition.
    E-mail: gzy000119@chd.edu.cn

  • Supported by:
    National Natural Science Foundation of China (52172379)

Abstract:

In recent years, modeling the three-dimensional coordinates of skeletal joints with deep feedforward neural networks has become a trend. However, skeleton-based action recognition still faces challenges such as low recognition accuracy, large parameter counts, and poor real-time performance. In response, an improved network model built upon the semantics-guided network (SGN) was proposed. Firstly, a non-local feature extraction module was integrated into the original network to enhance its training and prediction performance while reducing its computational complexity and inference time. Secondly, an attention mechanism was introduced to learn the channel weights of each convolutional layer and suppress redundant information between channels, further improving the computational efficiency and recognition accuracy of the model. Additionally, a deformable convolution module was employed to dynamically learn the weights of different graph convolutional network (GCN) layer channels and effectively aggregate joint features across channels for the final classification, thereby improving the utilization of feature information. Finally, human action recognition experiments were conducted on the public datasets NTU RGB+D and NTU RGB+D 120. The results demonstrated that the proposed network was an order of magnitude smaller than most comparable networks while significantly outperforming the original SGN and several other state-of-the-art algorithms in recognition accuracy.
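The channel-weighting step described in the abstract can be illustrated with a minimal squeeze-and-excitation style sketch. This is not the authors' implementation; the function name, weight shapes, and the (channels, frames, joints) feature layout are assumptions chosen only to show how learned per-channel weights can suppress redundant channels in a skeleton feature map.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Hypothetical channel attention over a skeleton feature map.

    x  : (C, T, V) array -- C channels, T frames, V skeleton joints.
    w1 : (C//r, C) and w2 : (C, C//r) -- learned bottleneck weights
         (randomly initialized here for illustration).
    """
    # Squeeze: global average pool over frames and joints -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid gating in (0, 1)
    h = np.maximum(w1 @ z, 0.0)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))
    # Re-weight each channel; near-zero gates suppress redundant channels
    return x * s[:, None, None]

# Toy example: 8 channels, 4 frames, 5 joints, reduction ratio r = 2
rng = np.random.default_rng(0)
C, T, V, r = 8, 4, 5, 2
x = rng.standard_normal((C, T, V))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y = channel_attention(x, w1, w2)
```

Because the sigmoid gate lies in (0, 1), each output channel is a damped copy of its input; in the actual model the gating weights would be trained jointly with the GCN layers rather than drawn at random.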

Key words: human action recognition, graph convolutional network, semantics guided neural network, non-local feature extraction, attention mechanism, deformable convolution

CLC Number: