
Journal of Graphics ›› 2023, Vol. 44 ›› Issue (6): 1173-1182. DOI: 10.11996/JG.j.2095-302X.2023061173


Future frame prediction based on multi-branch aggregation for lightweight video anomaly detection

HUANG Shao-nian1, WEN Pei-ran1, QUAN Qi1, CHEN Rong-yuan2

  1. School of Computer Science, Hunan University of Technology and Business, Changsha, Hunan 410205, China
    2. School of Resource and Environment, Hunan University of Technology and Business, Changsha, Hunan 410205, China
  • Received: 2023-06-30 Accepted: 2023-10-08 Online: 2023-12-31 Published: 2023-12-17
  • Contact: Chen Rongyuan (1977-), professor, Ph.D. His main research interests include graphic image processing. E-mail: chry@hutb.edu.cn
  • About author:

    Huang Shaonian (1977-), associate professor, Ph.D. Her main research interests include video content analysis. E-mail: snhuang@hutb.edu.cn

  • Supported by:
    The National Social Science Foundation of China (21BTJ026); The Scientific Research Fund of Hunan Provincial Education Department (19A270, 21A0370); Funds for Creative Research of China Universities on the Integration of Industry, Education and Research (2020ITA09005, 2021ITA05049)

Abstract:

Video anomaly detection in complex scenes holds significant research value and has wide practical applications. Despite the remarkable performance of current prediction-based methods, they face challenges such as large numbers of model parameters. To address these problems, we proposed a lightweight frame-prediction model based on multi-branch aggregation. The proposed model leveraged Transformer units as its basic structure and employed multi-branch aggregation, significantly reducing the number of model parameters. This design not only lowered computational cost but also enhanced detection accuracy. Building on this foundation, we designed a multi-branch Transformer fusion encoder to extract the temporal motion features of normal events. The encoder used a multi-branch connection operation to achieve multi-layer feature fusion, improving its feature optimization ability. Moreover, a multi-branch clustering decoder based on K-means was developed to mitigate the impact of the diversity of normal features on anomaly detection performance. Experiments were conducted on three public datasets: UCSD Ped2, CUHK Avenue, and ShanghaiTech. The results demonstrated that the proposed model outperformed current mainstream algorithms, achieving better detection performance at lower computational cost.
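To illustrate the general scheme that frame-prediction anomaly detectors follow, the sketch below shows how a predicted future frame can be scored against the ground-truth frame via PSNR and converted into a per-frame anomaly score. This is a minimal, hedged example of the prediction-error scoring idea only; the TinyPredictor class is a placeholder standing in for the paper's multi-branch Transformer fusion encoder and clustering decoder, and all names, shapes, and parameters are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch of prediction-error anomaly scoring (PSNR-based).
# The predictor is a placeholder, NOT the paper's multi-branch Transformer model.
import torch
import torch.nn as nn


class TinyPredictor(nn.Module):
    """Placeholder future-frame predictor: maps t past frames
    (B, t*C, H, W) to one predicted frame (B, C, H, W)."""
    def __init__(self, in_frames=4, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_frames * channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, past_frames):
        return self.net(past_frames)


def psnr(pred, target, max_val=2.0):
    """Per-sample PSNR between predicted and ground-truth frames in [-1, 1]."""
    mse = torch.mean((pred - target) ** 2, dim=(1, 2, 3))
    return 10.0 * torch.log10(max_val ** 2 / (mse + 1e-8))


def anomaly_scores(psnr_values):
    """Min-max normalize PSNR over a video; lower PSNR (worse prediction)
    yields a higher anomaly score."""
    p_min, p_max = psnr_values.min(), psnr_values.max()
    regularity = (psnr_values - p_min) / (p_max - p_min + 1e-8)
    return 1.0 - regularity


if __name__ == "__main__":
    model = TinyPredictor()
    past = torch.randn(8, 4 * 3, 64, 64)   # 8 clips, 4 past RGB frames each
    target = torch.randn(8, 3, 64, 64)      # ground-truth future frames
    with torch.no_grad():
        pred = model(past)
        scores = anomaly_scores(psnr(pred, target))
    print(scores)  # one score per clip; values near 1 suggest anomalous frames
```

In this scheme, normal events are predicted well because the model is trained only on normal data, so abnormal frames produce larger prediction errors and lower PSNR; the paper's lightweight multi-branch design addresses how the predictor itself is built, not how the score is computed.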

Key words: frame prediction, video anomaly detection, multi-branch fusion, Transformer
