Video anomaly detection combining pedestrian spatiotemporal information

doi:10.11996/JG.j.2095-302X.2023010095

Abstract

To address the problem that existing video anomaly detection methods do not make full use of temporal information and ignore the diversity of normal behaviors, an anomaly detection method incorporating pedestrian spatiotemporal information is proposed. The method is built on a convolutional auto-encoder: the encoder compresses the input frames, the decoder restores them, and anomalies are detected from the difference between the output frames and the ground-truth frames. To strengthen the feature connection between consecutive video frames, a residual temporal shift module and a residual channel attention module are introduced to enhance the network's ability to model temporal and channel information, respectively. Because convolutional neural networks (CNN) tend to overgeneralize, a memory-augmented module is added to the skip connection of each encoder-decoder layer; it limits the auto-encoder's overly strong ability to represent anomalous frames and improves detection accuracy. In addition, a feature separateness loss is added to the objective function so that different normal behavior patterns are effectively distinguished. Experimental results on the CUHK Avenue and ShanghaiTech datasets show that the proposed method outperforms current mainstream video anomaly detection methods while meeting real-time requirements.
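The temporal shift idea behind the residual temporal shift module can be illustrated with a minimal sketch. It assumes the module follows the shift operation of TSM [15]: a fraction of the channels is moved one step along the time axis in each direction and the result is added back to the input. The function names, the `fold_div` setting, and the use of plain nested lists instead of feature maps are illustrative assumptions, and the convolution that would normally follow the shift is omitted; this is not the authors' implementation.

```python
def temporal_shift(frames, fold_div=8):
    """Shift part of the channels along the time axis.

    frames: list of T time steps, each a list of C channel values
    (a stand-in for C feature maps). fold_div is an assumed setting:
    C // fold_div channels are shifted in each direction.
    """
    num_steps = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // fold_div
    shifted = [[0.0] * num_channels for _ in range(num_steps)]
    for t in range(num_steps):
        for c in range(num_channels):
            if c < fold:
                src = t + 1      # first group: take the value from the next step
            elif c < 2 * fold:
                src = t - 1      # second group: take the value from the previous step
            else:
                src = t          # remaining channels stay in place
            if 0 <= src < num_steps:
                shifted[t][c] = frames[src][c]  # out-of-range steps stay zero
    return shifted


def residual_temporal_shift(frames, fold_div=8):
    """Add the shifted features back to the input (identity in place of a conv)."""
    shifted = temporal_shift(frames, fold_div)
    return [[a + b for a, b in zip(row, srow)]
            for row, srow in zip(frames, shifted)]


if __name__ == "__main__":
    toy = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]  # T=3, C=4
    print(temporal_shift(toy, fold_div=4))
    print(residual_temporal_shift(toy, fold_div=4))
```

Because the shift only moves existing values, it adds temporal mixing without extra parameters; the residual addition keeps the unshifted spatial features available to later layers.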

Keywords: video anomaly detection, unsupervised learning, spatiotemporal two-stream network, auto-encoder

CLC Number: TP391

YAN Shan-wu, XIAO Hong-bing, WANG Yu, SUN Mei. Video anomaly detection combining pedestrian spatiotemporal information[J]. Journal of Graphics, 2023, 44(1): 95-103.

Figures/Tables 12

Fig. 1 The network of video anomaly detection

Fig. 2 Temporal shift operation

Fig. 3 Residual temporal shift module

Fig. 4 Residual channel attention module

Fig. 5 Improved spatial (temporal) sub-network

Fig. 6 Test results on two datasets ((a) Test results on Avenue; (b) Test results on ShanghaiTech)

Fig. 7 The visualization for ShanghaiTech ((a) The input frames; (b) Attention maps; (c) Prediction error)

Table 1 Average AUC for frame level detection (%)

Method	Avenue	ShanghaiTech
Conv-AE^[7]	80.0	60.9
Mem-AE^[17]	83.3	71.2
Pred.&Recon.^[12]	83.7	71.5
Frame-Pred^[11]	85.1	72.8
DDGAN^[10]	85.6	73.7
AnoPCN^[21]	86.2	73.6
ROADMAP^[22]	88.3	76.6
Ours	88.5	77.2
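
The frame-level AUC reported in Table 1 can be illustrated with a short sketch. It assumes each test frame has an anomaly score (higher means more anomalous) and a binary ground-truth label, and it computes the area under the ROC curve through the equivalent pairwise ranking statistic. The function name and the toy scores are hypothetical, and the exact evaluation protocol behind the table (for example, how scores are normalized or how videos are aggregated) is not stated here.

```python
def frame_level_auc(scores, labels):
    """Area under the ROC curve for per-frame anomaly scores.

    scores: anomaly score per frame (higher = more anomalous).
    labels: 1 for abnormal frames, 0 for normal frames.
    Uses the pairwise formulation: the fraction of (abnormal, normal)
    frame pairs in which the abnormal frame scores higher; ties count 0.5.
    """
    abnormal = [s for s, y in zip(scores, labels) if y == 1]
    normal = [s for s, y in zip(scores, labels) if y == 0]
    if not abnormal or not normal:
        raise ValueError("AUC needs at least one normal and one abnormal frame")
    wins = 0.0
    for a in abnormal:
        for n in normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal) * len(normal))


if __name__ == "__main__":
    toy_scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical per-frame scores
    toy_labels = [0, 0, 1, 1]
    print(frame_level_auc(toy_scores, toy_labels))
```

The pairwise form is quadratic in the number of frames, which is fine for illustration; a sort-based computation would be preferable for full test sets.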

Table 2 The influence of hyperparameters in the objective function on AUC

Setting	α_s	β_s	γ_i	γ_x	AUC (%)
1	0.2	0.2	0.4	0.6	87.0
2	0.2	0.2	0.6	0.4	87.1
3	0.2	0.2	0.5	0.5	88.0
4	0.1	0.1	0.4	0.6	87.4
5	0.1	0.1	0.6	0.4	87.5
6	0.1	0.1	0.5	0.5	88.5

Table 3 Performance evaluation of each module of the network

Method	RTSM	RCAM	Memory-augmented module	L_s	AUC (%)
Baseline	×	×	×	×	78.4
Ours	√	×	×	×	80.7
Ours	√	√	×	×	83.2
Ours	√	√	√	×	88.2
Ours	√	√	√	√	88.5

Table 4 Comparison of single-stream network and two-stream network

Method	Avenue	ShanghaiTech
Spatial sub-network	86.2	73.5
Temporal sub-network	88.3	75.7
Spatiotemporal two-stream network	88.5	77.2

Fig. 8 ROC curve on two datasets ((a) ROC curve on Avenue; (b) ROC curve on ShanghaiTech)

References 22

[1]	WANG Y Y. Research on video anomaly detection algorithm based on enhanced spatio-temporal features[D]. Beijing: Beijing Jiaotong University, 2021 (in Chinese).
[2]	WANG Z G, ZHANG Y J. Anomaly detection in surveillance videos: a survey[J]. Journal of Tsinghua University: Science and Technology, 2020, 60(6): 518-529 (in Chinese).
[3]	KIRAN B, THOMAS D, PARAKKAL R. An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos[J]. Journal of Imaging, 2018, 4(2): 36.
[4]	KRATZ L, NISHINO K. Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2009: 1446-1453.
[5]	ZHANG Y, LU H C, ZHANG L H, et al. Video anomaly detection based on locality sensitive hashing filters[J]. Pattern Recognition, 2016, 59: 302-311.
[6]	WANG H, KLÄSER A, SCHMID C, et al. Dense trajectories and motion boundary descriptors for action recognition[J]. International Journal of Computer Vision, 2013, 103(1): 60-79.
[7]	HASAN M, CHOI J, NEUMANN J, et al. Learning temporal regularity in video sequences[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 733-742.
[8]	LUO W X, LIU W, GAO S H. Remembering history with convolutional LSTM for anomaly detection[C]//2017 IEEE International Conference on Multimedia and Expo. New York: IEEE Press, 2017: 439-444.
[9]	DEEPAK K, CHANDRAKALA S, MOHAN C K. Residual spatiotemporal autoencoder for unsupervised video anomaly detection[J]. Signal, Image and Video Processing, 2021, 15(1): 215-222.
[10]	YAN S J, LIU Y, LI J B, et al. DDGAN: double discriminators GAN for accurate image colorization[C]//2020 6th International Conference on Big Data and Information Analytics. New York: IEEE Press, 2020: 214-219.
[11]	LIU W, LUO W X, LIAN D Z, et al. Future frame prediction for anomaly detection - A new baseline[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 6536-6545.
[12]	TANG Y, ZHAO L, ZHANG S S, et al. Integrating prediction and reconstruction for anomaly detection[J]. Pattern Recognition Letters, 2020, 129: 123-130.
[13]	CHANG Y P, TU Z G, XIE W, et al. Video anomaly detection with spatio-temporal dissociation[J]. Pattern Recognition, 2022, 122: 108213.
[14]	LI Z Q, WANG Z Y, CHEN H G, et al. Video abnormal behavior detection based on dual prediction model of appearance and motion features[J]. Journal of Computer Applications, 2021, 41(10): 2997-3003 (in Chinese).
[15]	LIN J, GAN C, HAN S. TSM: temporal shift module for efficient video understanding[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 7082-7092.
[16]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141.
[17]	GONG D, LIU L Q, LE V, et al. Memorizing normality to detect anomaly: memory-augmented deep autoencoder for unsupervised anomaly detection[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 1705-1714.
[18]	CHEN W H, CHEN X T, ZHANG J G, et al. Beyond triplet loss: a deep quadruplet network for person re-identification[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 1320-1329.
[19]	PARK H, NOH J, HAM B. Learning memory-guided normality for anomaly detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 14360-14369.
[20]	LU C W, SHI J P, JIA J Y. Abnormal event detection at 150 FPS in MATLAB[C]//2013 IEEE International Conference on Computer Vision. New York: IEEE Press, 2013: 2720-2727.
[21]	YE M C, PENG X J, GAN W H, et al. AnoPCN: video anomaly detection via deep predictive coding network[C]//The 27th ACM International Conference on Multimedia. New York: ACM, 2019: 1805-1813.
[22]	WANG X Z, CHE Z P, JIANG B, et al. Robust unsupervised video anomaly detection by multipath frame prediction[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(6): 2301-2312.