Journal of Graphics (图学学报)

• Image Processing and Computer Vision •

Visual gesture recognition technology based on long short-term memory and deep neural network

  (1. Beijing Engineering Technology Research Center for IoT Software and Systems, Beijing 100124, China;
   2. Faculty of Information, Beijing University of Technology, Beijing 100124, China)
  • Publication date: 2020-06-30  Release date: 2020-08-18
  • Funding:
    National Natural Science Foundation of China (61602016); Beijing Municipal Science and Technology Program (D171100004017003)

Abstract:

Vision-based dynamic gesture recognition is susceptible to changes in lighting, background, and gesture shape. To address this problem, this paper analyzed the spatial context features of human gestures. First, a dynamic gesture model was established based on the human skeleton and the contour features of body parts, and a deep neural network built on the convolutional pose machine (CPM) and the single shot multibox detector (SSD) was used to extract the skeleton and body-part contour features of human gestures. Next, a long short-term memory (LSTM) network was introduced to extract the temporal features of the skeleton, the left and right hands, and the head contour in dynamic human gestures, and these features were used to classify and recognize gestures. On this basis, a dynamic gesture recognizer based on spatial context and temporal feature fusion (GRSCTFF) was designed, then trained and evaluated on a video sample database of traffic police command gestures. The experimental results show that GRSCTFF recognizes dynamic traffic police command gestures quickly and accurately, with an accuracy of 94.12%, and is strongly robust to changes in lighting, background, and gesture shape.

Key words: gesture recognition, spatial context, long short-term memory, feature extraction
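The abstract describes a two-stage data flow: per-frame spatial features (CPM skeleton keypoints plus SSD-detected hand and head regions), followed by an LSTM over the frame sequence whose final state is classified into a gesture. What follows is a minimal sketch of that flow, not the authors' GRSCTFF implementation, and it rests on several assumptions that do not come from the paper. Each frame's spatial context is fused by plain concatenation of normalized keypoint coordinates and bounding boxes, with the boxes standing in for the contour features. A single LSTM runs over the concatenated vectors rather than separate streams per body part. The weights are random and untrained. The names `frame_features` and `LSTMClassifier`, the 14-joint skeleton, and the 8 output classes are all hypothetical. The code is written in pure Python so it runs without dependencies.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def frame_features(
    keypoints: Sequence[Tuple[float, float]],
    boxes: Sequence[Tuple[float, float, float, float]],
    width: float,
    height: float,
) -> Vector:
    """Fuse one frame's spatial context into a flat, resolution-independent vector.

    keypoints: (x, y) skeleton joints, assumed to come from a pose estimator such as CPM.
    boxes: (x1, y1, x2, y2) regions such as left hand, right hand, and head, assumed to
    come from a detector such as SSD (boxes are a stand-in for contour features).
    """
    feats: Vector = []
    for x, y in keypoints:
        feats += [x / width, y / height]
    for x1, y1, x2, y2 in boxes:
        feats += [x1 / width, y1 / height, x2 / width, y2 / height]
    return feats


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMClassifier:
    """Hypothetical single-layer LSTM over per-frame vectors plus a linear softmax head."""

    def __init__(self, input_size: int, hidden_size: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One weight set per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "ifog"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}
        self.b["f"] = [1.0] * hidden_size  # positive forget bias, a common initialization
        self.W_out = rand_matrix(num_classes, hidden_size, rng)
        self.b_out = [0.0] * num_classes

    def _preact(self, gate: str, x: Vector, h: Vector) -> Vector:
        # W_gate . x_t + U_gate . h_{t-1} + b_gate
        wx, uh = matvec(self.W[gate], x), matvec(self.U[gate], h)
        return [a + b + c for a, b, c in zip(wx, uh, self.b[gate])]

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        i = [sigmoid(z) for z in self._preact("i", x, h)]
        f = [sigmoid(z) for z in self._preact("f", x, h)]
        o = [sigmoid(z) for z in self._preact("o", x, h)]
        g = [math.tanh(z) for z in self._preact("g", x, h)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # c_t = f*c_{t-1} + i*g
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]  # h_t = o*tanh(c_t)
        return h_new, c_new

    def predict_proba(self, sequence: Sequence[Vector]) -> Vector:
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:  # temporal context accumulates in (h, c) frame by frame
            h, c = self.step(x, h, c)
        logits = [z + b for z, b in zip(matvec(self.W_out, h), self.b_out)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Hypothetical clip: 5 frames, 14 joints, 3 boxes (left hand, right hand, head) at 640x480.
rng = random.Random(42)
clip = []
for _ in range(5):
    joints = [(rng.uniform(0, 640), rng.uniform(0, 480)) for _ in range(14)]
    boxes = [(100, 100, 150, 160), (400, 100, 450, 160), (280, 20, 360, 110)]
    clip.append(frame_features(joints, boxes, 640, 480))

model = LSTMClassifier(input_size=len(clip[0]), hidden_size=16, num_classes=8)
probs = model.predict_proba(clip)
print("predicted class index:", max(range(len(probs)), key=probs.__getitem__))
```

Because the weights are untrained, the printed class index carries no meaning. The sketch only shows the feature shapes and the LSTM gate arithmetic. With trained weights, the index of the largest probability would serve as the predicted gesture. A practical system would use a framework layer such as PyTorch's `torch.nn.LSTM` instead of this hand-written cell.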