Action detection model fused with non-local neural network

doi:10.11996/JG.j.2095-302X.2021030439

Journal of Graphics, 2021, Vol. 42, Issue (3): 439-445.

• Image Processing and Computer Vision •

School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin Guangxi 541004, China

Published: 2021-06-30  Online: 2021-06-29
Supported by:
Open Fund Project of the Guangxi Key Laboratory Cultivation Base of Image and Graphic Intelligent Processing (Guilin University of Electronic Technology) (GIIP2011)

Abstract: In video action detection, convolutional neural networks (CNNs) have a limited ability to model temporal information. To address this problem, we propose an action detection model fused with a non-local neural network. The model uses a two-branch CNN architecture that extracts the spatial features and the motion features of a video separately, taking a single video frame and a sequence of consecutive frames as inputs. The spatial branch applies a 2D CNN to the current frame, while the spatiotemporal branch uses a 3D CNN fused with non-local blocks to capture global dependencies between video frames. To further enhance contextual semantic information, an improved attention and channel fusion mechanism aggregates the features of the two branches, and the fused features are then used for frame-level detection. Experiments on the UCF101-24 and JHMDB datasets show that the method fully integrates spatial and temporal information and achieves high detection accuracy on video-based spatio-temporal action detection.

Key words: action detection, non-local block, 3D convolution, attention mechanism
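The abstract states that the 3D CNN branch is fused with non-local blocks to capture global dependencies between frames, but it gives no implementation details. As an assumption, the sketch below uses the embedded-Gaussian non-local block from the original non-local neural networks work (Wang et al., 2018), written in plain Python on a feature map flattened to N = T×H×W positions. The projection matrices `w_theta`, `w_phi`, `w_g` and `w_z` are hypothetical stand-ins for the learned 1×1×1 convolutions, batch normalization is omitted, and the clip and channel sizes in the demo are made up for illustration.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def non_local_block(x: Matrix, w_theta: Matrix, w_phi: Matrix,
                    w_g: Matrix, w_z: Matrix):
    """Embedded-Gaussian non-local block (hypothetical weights, no batch norm).

    x has one row per space-time position (N = T*H*W) and C columns.
    Output row z_i = x_i + (sum_j softmax_j(theta_i . phi_j) * g_j) @ W_z,
    so every position attends to every position in every frame of the clip.
    """
    theta = matmul(x, w_theta)                           # N x C'
    phi = matmul(x, w_phi)                               # N x C'
    g = matmul(x, w_g)                                   # N x C'
    attn = softmax_rows(matmul(theta, transpose(phi)))   # N x N pairwise weights
    y = matmul(attn, g)                                  # N x C' aggregated responses
    wy = matmul(y, w_z)                                  # N x C, back to input width
    # Residual connection keeps the input signal alongside the global context.
    z = [[wy[i][c] + x[i][c] for c in range(len(x[0]))] for i in range(len(x))]
    return z, attn


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    T, H, W, C, C_INNER = 4, 2, 2, 8, 4    # hypothetical clip and channel sizes
    x = random_matrix(T * H * W, C, rng)   # flattened T x H x W feature map
    z, attn = non_local_block(x,
                              random_matrix(C, C_INNER, rng),
                              random_matrix(C, C_INNER, rng),
                              random_matrix(C, C_INNER, rng),
                              random_matrix(C_INNER, C, rng))
    print(len(z), len(z[0]))  # prints "16 8": output keeps the input shape
```

Setting `w_z` to zero turns the block into an identity mapping. The original non-local work reaches the same starting point by zero-initializing the scale of the batch normalization that follows W_z, which lets the block be inserted into a pretrained 3D CNN without disturbing its initial behaviour.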
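The abstract says the two branches are aggregated by a channel fusion mechanism with attention, but it does not give the formula. As an assumption, the sketch below follows the style of the channel fusion and attention mechanism (CFAM) used in YOWO, another detector that combines a 2D keyframe branch with a 3D clip branch: the two feature maps are concatenated along the channel axis, a Gram matrix G = F Fᵀ scores channel co-activation, a row-wise softmax turns it into weights M, and the output is αMF + F. The convolution layers that normally surround this step are omitted, `alpha` is a fixed hypothetical scalar rather than a learned one, and the authors' "improved" version may differ.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax with max subtraction for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def channel_fusion_attention(f_2d: Matrix, f_3d: Matrix, alpha: float = 1.0) -> Matrix:
    """Fuse two branches by channel concatenation plus Gram-matrix channel attention.

    f_2d: C1 x N spatial-branch features; f_3d: C2 x N spatiotemporal-branch
    features; both are flattened over the same N = H*W positions.
    """
    if len(f_2d[0]) != len(f_3d[0]):
        raise ValueError("both branches must share the same spatial size")
    f = [list(r) for r in f_2d] + [list(r) for r in f_3d]   # (C1+C2) x N
    # Gram matrix G = F F^T: how strongly each pair of channels co-activates.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in f] for ri in f]
    m = softmax_rows(gram)                                   # C x C attention weights
    n_pos = len(f[0])
    attended = [[sum(m[i][k] * f[k][n] for k in range(len(f))) for n in range(n_pos)]
                for i in range(len(f))]                      # F' = M F
    # Residual combination: output = alpha * F' + F.
    return [[alpha * attended[i][n] + f[i][n] for n in range(n_pos)]
            for i in range(len(f))]


if __name__ == "__main__":
    spatial = [[1.0, 0.0, 2.0, 1.0]]          # hypothetical 1-channel 2D-branch map
    motion = [[0.5, 1.5, 0.0, 1.0],
              [1.0, 1.0, 1.0, 1.0]]           # hypothetical 2-channel 3D-branch map
    fused = channel_fusion_attention(spatial, motion)
    print(len(fused), len(fused[0]))          # prints "3 4": 3 channels x 4 positions
```

Because every channel of the fused map is re-expressed as a weighted mix of all channels, spatial (appearance) and motion channels that respond together reinforce each other before the frame-level detection head sees them.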

CLC number: TP 391

Cite this article: HUANG Wen-ming, YANG Mu-li, LAN Ru-shi, DENG Zhen-rong, LUO Xiao-nan. Action detection model fused with non-local neural network[J]. Journal of Graphics, 2021, 42(3): 439-445.

