Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 760-769.DOI: 10.11996/JG.j.2095-302X.2024040760

• Image Processing and Computer Vision •

Human action recognition based on skeleton dynamic temporal filter

LI Songyang, WANG Xueting, CHEN Xianglong, CHEN Enqing

  1. School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou Henan 450001, China
  • Received:2023-10-08 Accepted:2024-02-20 Online:2024-08-31 Published:2024-09-03
  • Contact: CHEN Enqing
  • About author: LI Songyang (1998-), master student. His main research interests cover computer vision and pattern recognition. E-mail: lisongyang1998@gs.zzu.edu.cn

  • Supported by:
    National Natural Science Foundation of China (62101503); National Natural Science Foundation of China (U1804152); Scientific and Technological Project of Henan (222102210102); National Supercomputing Center in Zhengzhou Project

Abstract:

Human action recognition is one of the key research areas in computer vision, with a wide range of applications such as human-computer interaction and intelligent surveillance. Existing methods for skeleton-based action recognition often combine graph convolutional networks (GCN) with temporal convolutional networks (TCN). However, the limited size of the convolutional kernel restricts a model's global temporal modeling capability, and applying a shared convolutional kernel to skeletal data leaves different skeleton points insufficiently discriminated. Furthermore, extracting features with a TCN entails repeated calculations, so the TCN's parameter count grows as the network deepens. To address these issues, signal-processing methods were utilized and a skeleton dynamic temporal filtering (SDTF) module was proposed for skeleton action recognition, replacing the TCN for global temporal modeling. On this basis, lightweight improvements were made to AGCN, reducing its complexity. SDTF modeled temporal features via the Fourier transform: the frequency-domain features obtained from the Fourier transform were multiplied element-wise by a dynamic filter, and the filtered output was then passed through the inverse Fourier transform. Extensive experiments on the NTU-RGBD and Kinetics-Skeleton datasets demonstrated that the proposed model significantly reduced network parameters and computational complexity while achieving comparable or even superior recognition performance compared with the original model.
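The frequency-domain pipeline the abstract describes (Fourier transform, element-wise multiplication by a filter, inverse Fourier transform) can be sketched as follows. This is a minimal illustration of the general idea, not the authors' implementation: the function name, tensor layout, and per-channel filter shape are all assumptions.

```python
import numpy as np

def sdtf_sketch(x, filt):
    """Illustrative frequency-domain temporal filtering (hypothetical layout).

    x    : real array of shape (channels, frames, joints) -- skeleton features
    filt : complex array of shape (channels, frames // 2 + 1) -- one filter
           coefficient per channel and per temporal frequency bin
    """
    # Transform the temporal axis to the frequency domain.
    xf = np.fft.rfft(x, axis=1)
    # Element-wise multiplication by the filter, broadcast over joints.
    xf = xf * filt[:, :, None]
    # Transform back to the time domain, preserving the frame count.
    return np.fft.irfft(xf, n=x.shape[1], axis=1)

# Usage: an all-ones filter is the identity, so the input is recovered.
x = np.random.default_rng(0).standard_normal((2, 8, 5))
identity = np.ones((2, 8 // 2 + 1), dtype=complex)
y = sdtf_sketch(x, identity)
```

Because multiplication in the frequency domain corresponds to circular convolution over all frames, such a filter has a global temporal receptive field, unlike a fixed-size TCN kernel.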

Key words: human action recognition, graph convolutional network, dynamic temporal filter, Fourier transform, temporal convolutional networks
