
Journal of Graphics, 2024, Vol. 45, Issue (4): 760-769. DOI: 10.11996/JG.j.2095-302X.2024040760

• Image Processing and Computer Vision •

Human action recognition based on dynamic temporal filtering of skeleton points

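LI Songyang, WANG Xueting, CHEN Xianglong, CHEN Enqing

  1. School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, Henan 450001, China
  • Received: 2023-10-08 Accepted: 2024-02-20 Published: 2024-08-31 Online: 2024-09-03
  • Corresponding author: CHEN Enqing (born 1977), male, professor, Ph.D. Main research interests: computer vision, pattern recognition, and multimedia information processing. E-mail: ieeqchen@zzu.edu.cn
  • First author: LI Songyang (born 1998), male, master's student. Main research interests: computer vision and pattern recognition. E-mail: lisongyang1998@gs.zzu.edu.cn
  • Funding:
    National Natural Science Foundation of China (62101503); National Natural Science Foundation of China (U1804152); Henan Province Science and Technology Research Project (222102210102); National Supercomputing Center in Zhengzhou support project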

Human action recognition based on skeleton dynamic temporal filter

LI Songyang, WANG Xueting, CHEN Xianglong, CHEN Enqing

  1. School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, Henan 450001, China
  • Received: 2023-10-08 Accepted: 2024-02-20 Published: 2024-08-31 Online: 2024-09-03
  • Contact: CHEN Enqing (1977-), professor, Ph.D. His main research interests cover computer vision, pattern recognition and multimedia information processing. E-mail: ieeqchen@zzu.edu.cn
  • First author: LI Songyang (1998-), master's student. His main research interests cover computer vision and pattern recognition. E-mail: lisongyang1998@gs.zzu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62101503); National Natural Science Foundation of China (U1804152); Scientific and Technological Project of Henan (222102210102); National Supercomputing Center in Zhengzhou Project

Abstract (translated from the Chinese):

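Human action recognition is an important research direction in computer vision and is widely applied in fields such as intelligent surveillance and human-computer interaction. Most existing skeleton-based action recognition methods cascade a graph convolutional network (GCN) with a temporal convolutional network (TCN), but the kernel size of the latter limits the model's global temporal modeling capability. In addition, processing skeleton data with convolution alone cannot distinguish between different skeleton points, and the TCN often repeats computations when extracting features, so its parameter count grows as the network deepens. Drawing on signal processing methods, this paper proposes a skeleton dynamic temporal filter (SDTF) module suited to skeleton data, which replaces the TCN for global modeling of temporal features. On this basis, AGCN is made more lightweight, and the resulting AGCN-SDTF action recognition model has lower model complexity. SDTF models temporal features through the Fourier transform: the frequency-domain features obtained by the Fourier transform are multiplied by the frequency-domain output obtained from filtering, and the product is passed through the inverse Fourier transform, thereby extracting global temporal features. Experimental results on the large-scale NTU-RGBD and Kinetics-Skeleton datasets show that the model reduces the required number of parameters and amount of computation while achieving the same recognition performance as the original model.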

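Keywords: human action recognition, graph convolutional network, dynamic temporal filtering, Fourier transform, temporal convolutional network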

Abstract:

Human action recognition is a key research area in computer vision, with wide application in areas such as human-computer interaction and intelligent surveillance. Existing skeleton-based action recognition methods often cascade graph convolutional networks (GCN) with temporal convolutional networks (TCN). However, the limited kernel size of the TCN restricts the model's global temporal modeling capability. Moreover, processing skeleton data with convolution alone leaves the model unable to discriminate between different skeleton points. Furthermore, feature extraction with TCN often entails repeated calculations, so the number of TCN parameters increases as the network deepens. To address these issues, a skeleton dynamic temporal filtering (SDTF) module, drawing on signal processing methods, was proposed to replace TCN for global modeling of temporal features. On this basis, lightweight improvements were made to AGCN, and the resulting AGCN-SDTF model has lower complexity. SDTF models temporal features through the Fourier transform: the frequency-domain features obtained from the Fourier transform are multiplied by the frequency-domain output of the filter, and the product is passed through the inverse Fourier transform to extract global temporal features. Experiments on the NTU-RGBD and Kinetics-Skeleton datasets demonstrated that the proposed model significantly reduced network parameters and computational complexity while achieving recognition performance comparable or even superior to that of the original model.
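
The frequency-domain step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' SDTF implementation: the function names, the tensor layout (channels, frames, joints), and the use of a fixed random filter in place of a dynamically generated one are all assumptions made for illustration. The sketch only shows why multiplying in the frequency domain gives every output frame access to all input frames: it is equivalent to a circular convolution along time whose kernel spans the full sequence.

```python
import numpy as np

def temporal_frequency_filter(x, filt):
    """Filter features along the time axis in the frequency domain.

    x:    real array of shape (C, T, V) = (channels, frames, joints)  [assumed layout]
    filt: complex array of shape (C, T // 2 + 1, V), one frequency
          response per channel and per joint (hypothetical parameterisation)
    """
    T = x.shape[1]
    X = np.fft.rfft(x, axis=1)           # Fourier transform over time
    Y = X * filt                         # element-wise product in the frequency domain
    return np.fft.irfft(Y, n=T, axis=1)  # inverse transform back to the time domain

def circular_conv_time(x, kernel):
    """Reference: circular convolution along time with a length-T kernel."""
    C, T, V = x.shape
    out = np.zeros_like(x)
    for t in range(T):
        for s in range(T):
            out[:, t, :] += kernel[:, s, :] * x[:, (t - s) % T, :]
    return out

rng = np.random.default_rng(0)
C, T, V = 4, 16, 25                       # illustrative sizes only
x = rng.standard_normal((C, T, V))
kernel = rng.standard_normal((C, T, V))   # time-domain kernel covering all T frames
filt = np.fft.rfft(kernel, axis=1)        # its frequency response

y_fft = temporal_frequency_filter(x, filt)
y_ref = circular_conv_time(x, kernel)
print(np.allclose(y_fft, y_ref))
```

Because the filter here has a separate response for each joint, different skeleton points are treated differently, which a kernel shared across joints cannot do. How the paper generates its filter dynamically from the input is not specified in the abstract and is left out of this sketch.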

Key words: human action recognition, graph convolutional network, dynamic temporal filter, Fourier transform, temporal convolutional network

CLC number: