
Journal of Graphics ›› 2023, Vol. 44 ›› Issue (1): 95-103. DOI: 10.11996/JG.j.2095-302X.2023010095

• Image Processing and Computer Vision •


Video anomaly detection combining pedestrian spatiotemporal information

YAN Shan-wu, XIAO Hong-bing, WANG Yu, SUN Mei

  1. School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
  • Received: 2022-07-04 Revised: 2022-08-27 Online: 2023-10-31 Published: 2023-02-16
  • Contact: WANG Yu
  • About author: YAN Shan-wu (1996-), master's student. His main research interests cover video anomaly detection and image processing. E-mail: 18339729107@163.com
  • Supported by:
    Beijing Natural Science Foundation - Key Project of Science and Technology Program of Beijing Municipal Education Commission (KZ202110011015)


Abstract:

To address the problems that current video anomaly detection methods cannot make full use of temporal information and ignore the diversity of normal behaviors, an anomaly detection method incorporating pedestrian spatiotemporal information was proposed. Based on a convolutional auto-encoder, the input frames were compressed and reconstructed by its encoder and decoder, and anomalies were detected from the difference between the output frames and the ground truth. To strengthen the feature connections between consecutive video frames, a residual temporal shift module and a residual channel attention module were introduced to enhance the network's ability to model temporal and channel information, respectively. Considering the over-generalization of convolutional neural networks (CNNs), memory-augmented modules were added to the skip connections between each layer of the encoder and decoder, limiting the auto-encoder's overly strong representation of anomalous frames and improving the network's anomaly detection accuracy. In addition, the objective function was modified with a feature separateness loss to effectively distinguish different normal behavior patterns. Experimental results on the CUHK Avenue and ShanghaiTech datasets show that the proposed method outperforms current mainstream video anomaly detection methods while meeting real-time requirements.
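Two of the ideas in the abstract can be illustrated compactly: a residual channel-wise temporal shift (exchanging a fraction of channels between neighboring frames to mix temporal information at zero parameter cost) and a separateness loss that pushes a query feature's nearest memory item away from its second-nearest. The following NumPy sketch is illustrative only, not the authors' implementation; all function names, shapes, and the `fold_div`/`margin` values are assumptions.

```python
import numpy as np

def residual_temporal_shift(x, fold_div=8):
    """x: (T, C, H, W) features for T consecutive frames.

    Shifts one group of C // fold_div channels backward in time and
    another group forward, then adds the result back residually so the
    original per-frame features are preserved.
    """
    T, C, H, W = x.shape
    fold = C // fold_div
    shifted = np.zeros_like(x)
    shifted[:-1, :fold] = x[1:, :fold]            # pull features from the next frame
    shifted[1:, fold:2 * fold] = x[:-1, fold:2 * fold]  # pull features from the previous frame
    shifted[:, 2 * fold:] = x[:, 2 * fold:]       # remaining channels unchanged
    return x + shifted                            # residual connection

def separateness_loss(queries, memory, margin=1.0):
    """queries: (N, D) encoder features; memory: (M, D) memory items.

    Triplet-style hinge: for each query, the distance to its nearest
    memory item should be smaller than the distance to the second-nearest
    by at least `margin`, which spreads memory items over distinct
    normal patterns.
    """
    d = np.linalg.norm(queries[:, None, :] - memory[None, :, :], axis=2)  # (N, M)
    order = np.argsort(d, axis=1)
    rows = np.arange(len(queries))
    nearest = d[rows, order[:, 0]]
    second = d[rows, order[:, 1]]
    return np.maximum(nearest - second + margin, 0.0).mean()
```

Because the nearest distance never exceeds the second-nearest, the loss is bounded in [0, margin]; minimizing it drives distinct queries toward distinct memory items rather than letting one item absorb all normal behaviors.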

Key words: video anomaly detection, unsupervised learning, spatiotemporal two-stream network, auto-encoder

CLC Number: