
Journal of Graphics ›› 2023, Vol. 44 ›› Issue (4): 625-639.DOI: 10.11996/JG.j.2095-302X.2023040625

• Review •

A survey of video human action recognition based on deep learning

BI Chun-yan1,2, LIU Yue1,2

  1. Beijing Mixed Reality and New Display Engineering Technology Research Center, Beijing 100081, China
  2. School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
  • Received:2022-10-21 Accepted:2023-04-01 Online:2023-08-31 Published:2023-08-16
  • Contact: LIU Yue (1968-), professor, Ph.D. His main research interests cover augmented reality, computer vision, etc. E-mail:liuyue@bit.edu.cn
  • About author:

    BI Chun-yan (1995-), master student. Her main research interests cover augmented reality, computer vision and video action recognition, etc. E-mail:bichunyan_suda@163.com

  • Supported by:
    National Natural Science Foundation of China (61960206007); Introducing Talents of Discipline to Universities (B18005)


With the rapid advancement of network multimedia technology and the continuous improvement of video capture equipment, an increasing number of videos are shared on network platforms and have gradually become an integral part of daily life. Consequently, video understanding has become one of the hot spots of computer vision research, with action recognition being one of its pivotal tasks. Deep learning methods for 2D image recognition and classification have made significant strides, yet video action recognition remains a formidable challenge. The reason is that videos differ from 2D images by an additional temporal dimension: understanding actions such as walking, running, high jumping, and long jumping in videos requires not only the spatial semantic information that 2D images carry but also temporal information. Effectively exploiting the temporal information of videos is therefore critical for action recognition. This paper first introduces the research background and development of action recognition, then analyzes the current challenges in video action recognition. Methods for temporal modeling and parameter optimization are presented in detail, followed by an examination of the commonly used action recognition datasets and evaluation metrics. Finally, the paper outlines future research directions in this field.

Key words: action recognition, video understanding, deep learning, convolutional neural network, computer vision

CLC Number: