Visual gesture recognition is susceptible to lighting conditions, background information, and
changes in gesture shape. To address this problem, this paper analyzes the spatial context features
of human gestures. First, a dynamic gesture model is established based on the contour features of
the human skeleton and body parts. A convolutional pose machine (CPM) and a single shot multibox
detector (SSD) are used to build a deep neural network that extracts the contour features of the
gesture skeleton and body parts. Next, a long short-term memory (LSTM) network is introduced to
extract the temporal features of the skeleton, the left and right hands, and the head contour in
dynamic human gestures, which are then used to classify and recognize the gestures. On this
basis, a dynamic gesture recognizer based on spatial context and temporal feature fusion (GRSCTFF)
is designed, and it is trained and evaluated on a video sample database of traffic police command
gestures. The experimental results show that GRSCTFF recognizes dynamic traffic police command
gestures quickly and with an accuracy of 94.12%, and that it is robust to changes in lighting,
background, and gesture shape.
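
The fusion of per-frame spatial features with an LSTM over time can be illustrated with a minimal
sketch. It is not the GRSCTFF implementation: the names fuse_frame and TinyLSTM, the fusion by
simple concatenation, the random untrained weights, and the feature sizes are all assumptions made
for illustration, and the CPM and SSD feature extractors are replaced by hand-written vectors.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_frame(skeleton, left_hand, right_hand, head):
    """Concatenate the per-frame feature vectors into one fused vector (assumed fusion rule)."""
    return list(skeleton) + list(left_hand) + list(right_hand) + list(head)


class TinyLSTM:
    """Hypothetical single-layer LSTM with random, untrained weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (c).
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _gate(self, g, z, act):
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(self.W[g], self.b[g])]

    def run(self, sequence):
        """Process a sequence of fused frame vectors and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            z = list(x) + h  # current input concatenated with previous hidden state
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            o = self._gate("o", z, sigmoid)
            g = self._gate("c", z, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


# Three identical frames with made-up feature values, for illustration only.
frame = fuse_frame([0.1, 0.2], [0.3], [0.4], [0.5])
final_state = TinyLSTM(input_size=len(frame), hidden_size=3, seed=1).run([frame] * 3)
```

In a trained recognizer, the final hidden state would typically be passed to a classification
layer that scores each gesture class; that step is omitted here.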