Journal of Graphics ›› 2021, Vol. 42 ›› Issue (3): 439-445. DOI: 10.11996/JG.j.2095-302X.2021030439

• Image Processing and Computer Vision •

Action detection model fused with non-local neural network

  

  1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin Guangxi 541004, China
  • Online: 2021-06-30 Published: 2021-06-29
  • Supported by:
    Open Funds from Guilin University of Electronic Technology, Guangxi Key Laboratory of Image and Graphic Intelligent Processing (GIIP2011)

Abstract: Convolutional neural networks (CNNs) have limited ability to capture temporal information in video action detection. To address this problem, we propose a model that fuses a non-local neural network: non-local blocks are combined with a 3D CNN to capture global dependencies between video frames. The model uses a two-stream architecture in which a 2D CNN takes single video frames as input and extracts spatial features, while a 3D CNN takes video frame sequences as input and extracts motion features. To further enhance contextual semantic information, an improved attention and channel fusion mechanism aggregates the features of the two networks, and the fused features are then used for frame-level detection. We conducted experiments and comparisons on the UCF101-24 and JHMDB datasets. The results show that our method effectively integrates spatial and temporal information and achieves high detection accuracy on video-based action detection tasks.
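
To illustrate what "capturing global connections" can mean in practice, the following is a minimal sketch of a generic non-local operation, not the model described in the abstract. It assumes a simplified, parameter-free form: each space-time position is a plain feature vector, similarity is a dot product, and the learned projections that non-local blocks normally apply inside a 3D CNN are omitted. The function name `non_local` and the list-of-vectors input are hypothetical choices made for this example.

```python
import math


def non_local(features):
    """Parameter-free non-local operation (illustrative assumption).

    features: list of N feature vectors (lists of floats), one per
    space-time position. Returns N vectors, each equal to the input
    vector plus a softmax-weighted sum over ALL positions.
    """
    outputs = []
    for xi in features:
        # Dot-product similarity between position i and every position j.
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]

        # Numerically stable softmax over all positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]

        # Aggregate features from every position, near or far.
        aggregated = [
            sum(w * xj[k] for w, xj in zip(weights, features))
            for k in range(len(xi))
        ]

        # Residual connection keeps the original feature.
        outputs.append([a + b for a, b in zip(xi, aggregated)])
    return outputs


if __name__ == "__main__":
    # Three toy positions with 2-dimensional features.
    toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in non_local(toy):
        print(row)
```

Because every output position attends to every input position, the response at one frame can depend on distant frames in a single step, whereas a stacked convolution only widens its receptive field gradually.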

Key words: action detection, non-local neural network, 3D convolution, attention mechanism 

CLC Number: