Action detection model fused with non-local neural network

doi:10.11996/JG.j.2095-302X.2021030439

Journal of Graphics, 2021, Vol. 42, Issue (3): 439-445. DOI: 10.11996/JG.j.2095-302X.2021030439

• Image Processing and Computer Vision •

School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin Guangxi 541004, China

Online: 2021-06-30 Published: 2021-06-29
Supported by:
Open Funds from Guilin University of Electronic Technology, Guangxi Key Laboratory of Image and Graphic Intelligent Processing (GIIP2011)

Abstract

Convolutional neural networks (CNNs) capture temporal information poorly in video action detection. To address this problem, we propose a model that fuses a non-local neural network: a non-local block is combined with a 3D CNN to capture global dependencies between video frames. The model uses a two-stream architecture in which a 2D CNN takes single video frames as input to extract spatial features and a 3D CNN takes video frame sequences as input to extract motion features. To further enhance contextual semantic information, an improved attention and channel fusion mechanism aggregates the features of the two networks, and the fused features are then used for frame-level detection. We verified and compared the method experimentally on the UCF101-24 and JHMDB datasets. The results show that our method can fully integrate spatial and temporal information and achieves high detection accuracy on video-based action detection tasks.
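
The non-local operation named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a dot-product similarity normalized by a softmax, identity embeddings in place of learned projections, and a plain residual connection, and the function name `non_local` is hypothetical. Each output position is a weighted sum over all space-time positions, which is how such a block relates distant frames to one another.

```python
import math


def non_local(features):
    """Apply a simplified non-local operation to a list of feature vectors.

    Each vector stands for one space-time position (t, h, w) of a video
    feature map, flattened into a single list. Assumption: identity
    embeddings are used instead of learned projections.
    """
    outputs = []
    for x_i in features:
        # Pairwise similarity between position i and every position j.
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        # Softmax over all positions; subtract the max for numerical stability.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted sum over all positions: the global context for position i.
        y_i = [
            sum(w * x_j[d] for w, x_j in zip(weights, features)) / total
            for d in range(len(x_i))
        ]
        # Residual connection keeps the original feature alongside the context.
        outputs.append([a + b for a, b in zip(x_i, y_i)])
    return outputs
```

Because every position attends to every other position, the cost grows quadratically with the number of positions, which is one reason such blocks are usually applied to downsampled feature maps rather than raw frames.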

Key words: action detection, non-local neural network, 3D convolution, attention mechanism

CLC Number:

TP 391

HUANG Wen-ming, YANG Mu-li, LAN Ru-shi, DENG Zhen-rong, LUO Xiao-nan. Action detection model fused with non-local neural network[J]. Journal of Graphics, 2021, 42(3): 439-445.

