Journal of Graphics, 2023, Vol. 44, Issue (4): 625-639. DOI: 10.11996/JG.j.2095-302X.2023040625

• Review •

A survey of video human action recognition based on deep learning

BI Chun-yan1,2, LIU Yue1,2

  1. Beijing Mixed Reality and New Display Engineering Technology Research Center, Beijing 100081, China
  2. School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2022-10-21 Accepted: 2023-04-01 Online: 2023-08-31 Published: 2023-08-16
  • Contact: LIU Yue (1968-), professor, Ph.D. His main research interests cover augmented reality, computer vision, etc. E-mail: liuyue@bit.edu.cn
  • About author:

    BI Chun-yan (1995-), master's student. Her main research interests cover augmented reality, computer vision, and video action recognition. E-mail: bichunyan_suda@163.com

  • Supported by:
    National Natural Science Foundation of China (61960206007); Program of Introducing Talents of Discipline to Universities (B18005)

Abstract:

With the rapid development of network multimedia technology and the continuous improvement of video capture equipment, an increasing number of videos are shared on online platforms, and video has gradually become an integral part of daily life. Consequently, video understanding has become one of the hot spots of computer vision research. As the primary task of video understanding, action recognition is of great research significance. At present, deep learning-based methods for 2D image recognition and classification have made significant progress, but video action recognition still faces formidable challenges. The reason is that a video differs from a 2D image by an additional temporal dimension: understanding actions such as walking, running, high jumping, and long jumping requires not only the spatial semantic information that 2D images provide but also temporal information. Therefore, how to exploit the temporal information of videos is critical for action recognition. This paper first introduces the research background and development of action recognition and analyzes the current challenges in video action recognition. It then presents methods for temporal modeling and parameter optimization in detail and analyzes the commonly used action recognition datasets and evaluation metrics. Finally, it outlines future research directions.

Key words: action recognition, video understanding, deep learning, convolutional neural network, computer vision
