
Journal of Graphics (图学学报)

• Image Processing and Computer Vision •

  • Supported by: National Natural Science Foundation of China (U1804152, 61806180)

Human action recognition based on ResNeXt

  1. (School of Information Engineering, Zhengzhou University, Zhengzhou, Henan 450000, China)
  • Online: 2020-04-30 Published: 2020-05-15


Abstract: Human action recognition is one of the core research directions in computer vision, with applications in many scenarios. Deep convolutional neural networks have achieved great success in static image recognition and have gradually expanded into video content recognition, but their application there still faces great challenges. This paper proposes a deep neural network model based on ResNeXt for human action recognition in videos. The main contributions are as follows: ① The ResNeXt architecture was used in place of the various convolutional network structures used previously, and data in two modalities, RGB and optical flow, were used so that the model could fully exploit both the appearance and the temporal information in videos. ② An end-to-end video temporal segmentation strategy was applied to the ResNeXt model: each video was divided into K segments to model the long-range temporal structure of the video sequence, and the optimal value of K was determined through tests. This enables the model to better distinguish similar actions that share sub-actions, reducing the misjudgments that such shared sub-actions tend to cause. Tests on the widely used action recognition datasets UCF101 and HMDB51 show that the recognition accuracy of the proposed model and method surpasses that of several models and methods reported in the existing literature.

Key words: action recognition, ResNeXt, video temporal segmentation, data augmentation, multimodal
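The two core ideas of the abstract, sampling one snippet from each of K temporal segments and aggregating per-snippet scores (a segmental consensus, as in temporal segment networks), plus late fusion of RGB and optical-flow stream scores, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are ours, per-snippet scores are assumed to come from a trained ResNeXt backbone, and the equal-weight fusion is an assumption (stream weights are typically tuned).

```python
import random

def temporal_segments(num_frames, k):
    """Split frame indices [0, num_frames) into k equal-length segments and
    sample one snippet index from each segment (trailing remainder frames,
    if num_frames is not divisible by k, are simply unused)."""
    seg_len = num_frames // k
    return [random.randrange(i * seg_len, (i + 1) * seg_len) for i in range(k)]

def segmental_consensus(snippet_scores):
    """Average per-snippet class scores into one video-level score vector."""
    k = len(snippet_scores)
    num_classes = len(snippet_scores[0])
    return [sum(s[c] for s in snippet_scores) / k for c in range(num_classes)]

def fuse_streams(rgb_scores, flow_scores, w=0.5):
    """Late fusion of the two modality streams by a weighted average of
    their class scores (w = RGB weight; equal weighting is assumed here)."""
    return [w * r + (1 - w) * f for r, f in zip(rgb_scores, flow_scores)]
```

For example, with a 90-frame video and K = 3, `temporal_segments(90, 3)` draws one frame index from each of the ranges [0, 30), [30, 60), and [60, 90); scoring the snippet at each index with both streams, applying `segmental_consensus` per stream, and then `fuse_streams` yields the video-level prediction.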