Journal of Graphics
Abstract: Human action recognition is one of the core research directions in computer vision and has many applications. Deep convolutional neural networks have achieved great success in static image recognition and have gradually been extended to video content recognition, but they still face great challenges in practice. This paper proposes a deep neural network model based on the ResNeXt network for human action recognition in video. The main innovations of this paper are: ① The new ResNeXt network structure was used to replace the original convolutional neural network structure. Data of two modalities, RGB and optical flow, were collected to make full use of the appearance and temporal order information in the video. ② An end-to-end video temporal segmentation strategy was applied to the proposed ResNeXt network model. The video was divided into K segments to model the long-range temporal structure of the video sequence, and the optimal value of K was obtained through tests. This enables the model to better distinguish similar actions that share sub-actions and to reduce the misjudgments that similar sub-actions tend to cause. Tests on the widely used action recognition data sets UCF101 and HMDB51 showed that the action recognition accuracy of the proposed model and method is better than that of the models and methods in the existing literature.
Key words: action recognition, ResNeXt, video temporal segmentation, data enhancement, multimodal
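The segmentation strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not state how frames are sampled inside each segment, how the K segment predictions are combined, or how the RGB and optical flow streams are fused, so the choices below (one random frame per segment, averaging over segments, a weighted sum of the two streams) are assumptions made for illustration, and the function names `sample_segment_indices`, `average_consensus` and `fuse_streams` are hypothetical.

```python
import random


def sample_segment_indices(num_frames, k, rng=None):
    """Split range(num_frames) into k near-equal segments and
    pick one frame index from each (assumed sampling scheme)."""
    if k <= 0 or num_frames < k:
        raise ValueError("need 1 <= k <= num_frames")
    rng = rng or random.Random(0)
    # Segment i covers the half-open range [bounds[i], bounds[i + 1]).
    bounds = [i * num_frames // k for i in range(k + 1)]
    return [rng.randrange(bounds[i], bounds[i + 1]) for i in range(k)]


def average_consensus(segment_scores):
    """Average per-class scores over the K segments
    (assumed consensus function, not confirmed by the abstract)."""
    k = len(segment_scores)
    return [sum(column) / k for column in zip(*segment_scores)]


def fuse_streams(rgb_scores, flow_scores, flow_weight=1.0):
    """Weighted sum of RGB and optical flow class scores
    (assumed fusion rule; flow_weight is a hypothetical parameter)."""
    return [r + flow_weight * f for r, f in zip(rgb_scores, flow_scores)]


# Usage: a 30-frame video split into K = 3 segments.
indices = sample_segment_indices(30, 3)
# Per-segment class scores would come from the network; these are placeholders.
video_scores = average_consensus([[1.0, 3.0], [3.0, 5.0]])
```

Because each sampled frame comes from a different part of the video, the averaged prediction reflects the whole sequence rather than a single short clip, which is the property the abstract relies on to separate actions that share sub-actions.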
JIANG Sheng-nan, CHEN En-qing, ZHEN Ming-yao, DUAN Jian-kang. Human action recognition based on ResNeXt[J]. Journal of Graphics, DOI: 10.11996/JG.j.2095-302X.2020020277.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2020020277
http://www.txxb.com.cn/EN/Y2020/V41/I2/277