Journal of Graphics (图学学报) ›› 2023, Vol. 44 ›› Issue (6): 1173-1182. DOI: 10.11996/JG.j.2095-302X.2023061173
Received: 2023-06-30
Accepted: 2023-10-08
Online: 2023-12-31
Published: 2023-12-17
Contact: CHEN Rong-yuan (1976-), male, professor, Ph.D. His main research interests cover graphic and image processing.
About author: HUANG Shao-nian (1977-), female, associate professor, Ph.D. Her main research interest covers video content analysis. E-mail: snhuang@hutb.edu.cn
HUANG Shao-nian1, WEN Pei-ran1, QUAN Qi1, CHEN Rong-yuan2
Abstract: Video anomaly detection in complex scenes is a task of significant research value and practical importance. Although prediction-based video anomaly detection methods have made notable progress in performance, they still face challenges such as large model parameter counts and detection accuracy that leaves room for improvement. To address these problems, a lightweight frame-prediction video anomaly detection model based on multi-branch aggregation was proposed. The model adopts a multi-branch aggregation Transformer unit as its basic building block, substantially reducing the parameter count and computational cost while improving detection accuracy. On this basis, a multi-branch Transformer fusion encoder was designed: while extracting the temporal motion features of normal events, it employs multi-branch connection operations to fuse multi-level features, strengthening the encoder's feature refinement capability. Meanwhile, a K-means-based multi-branch clustering decoder was designed to mitigate the impact of the diversity of normal features on anomaly detection performance. Experiments on three benchmark datasets, UCSD Ped2, CUHK Avenue, and ShanghaiTech, show that compared with current mainstream algorithms, the model achieves lower computational cost together with strong detection performance.
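The abstract describes a multi-branch aggregation Transformer unit as the basic block. The paper's exact layer design is not reproduced on this page; the following is a minimal numpy sketch of the multi-branch idea only (split the embedding into L narrow branches, attend within each branch, then aggregate), with all function names, shapes, and the residual-style aggregation being illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def branch_attention(x):
    # scaled dot-product self-attention on one narrow branch;
    # x: (tokens, d_branch); Q/K/V projections omitted for brevity
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)      # (tokens, tokens)
    return softmax(scores) @ x         # (tokens, d_branch)

def mb_transformer_unit(x, L=4):
    # split the d-dim embedding into L branches of width d/L,
    # attend within each branch, then aggregate by concatenation;
    # per-branch residuals stand in for the branch connections
    branches = np.split(x, L, axis=-1)
    out = [branch_attention(b) + b for b in branches]
    return np.concatenate(out, axis=-1)  # back to (tokens, d)

tokens = np.random.randn(16, 64)       # 16 tokens, embedding width 64
y = mb_transformer_unit(tokens, L=4)   # same shape as the input
```

The parameter saving of such a split is easy to see: a full-width attention projection costs on the order of d² weights, while L branches of width d/L cost L·(d/L)² = d²/L, which is in line with the parameter and FLOPs reductions reported in the ablation tables below.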
HUANG Shao-nian, WEN Pei-ran, QUAN Qi, CHEN Rong-yuan. Future frame prediction based on multi-branch aggregation for lightweight video anomaly detection[J]. Journal of Graphics, 2023, 44(6): 1173-1182.
Fig. 3 MB-Transformers unit architecture ((a) the original Transformer unit; (b) the MB-Transformers unit)
| Method | Ped2 | Avenue | ShanghaiTech | Params (MB) | FLOPs (G) |
|---|---|---|---|---|---|
| ConvLSTM-AE[23] | 88.1 | 77.0 | - | 6.79 | 89.20 |
| MemAE[4] | 94.1 | 83.4 | 71.2 | 15.10 | 38.42 |
| MNAD-P[24] | 97.0 | 88.5 | 70.5 | 15.65 | 43.99 |
| Conv-VRNN[25] | 96.1 | 85.8 | - | 93.40 | 275.60 |
| GADNet[26] | 96.1 | 86.2 | 73.2 | - | - |
| VEC[27] | 97.3 | 90.2 | 74.8 | 21.47 | 1.99 |
| MCP[28] | 98.0 | 92.1 | 75.3 | - | - |
| HF2-VAD[29] | 99.3 | 91.1 | 76.2 | 11.80 | 1.84 |
| LFP-MBA | 99.4 | 89.8 | 74.2 | 5.40 | 0.60 |
Table 1 Performance comparison with other models on three datasets (AUC, %)
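For context on how the AUC columns in Table 1 are typically obtained: prediction-based detectors such as the baseline of Liu et al. [10] score each test frame by the PSNR between the predicted and the actual frame, min-max normalize the scores per video, and compute a frame-level AUC against the ground-truth labels. A self-contained sketch of that evaluation pipeline follows; the helper names are illustrative, not taken from the paper:

```python
import numpy as np

def psnr(pred, gt, peak=1.0):
    # peak signal-to-noise ratio between predicted and actual frame
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def anomaly_scores(psnrs):
    # min-max normalize PSNR per video; low PSNR -> high anomaly score
    p = np.asarray(psnrs, dtype=float)
    normality = (p - p.min()) / (p.max() - p.min() + 1e-8)
    return 1.0 - normality

def frame_auc(scores, labels):
    # frame-level AUC via the rank-based (Mann-Whitney) formulation,
    # so no external metrics library is needed
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

With perfectly separated scores (all anomalous frames scored above all normal ones) `frame_auc` returns 1.0, matching the conventional frame-level AUC reported in the table.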
| Encoder unit | AUC (%) | Params (MB) | FLOPs (G) |
|---|---|---|---|
| Original Transformer | 94.2 | 13.2 | 1.5 |
| K-MB-Transformers | 90.6 | 4.1 | 0.4 |
| MB-Transformers | 99.4 | 5.4 | 0.6 |
Table 2 Effect of the multi-branch Transformer encoder on performance
| Decoder module | AUC (%) | Params (MB) | FLOPs (G) |
|---|---|---|---|
| Original Transformer | 95.0 | 15.4 | 1.7 |
| MB-Transformers | 97.3 | 10.1 | 1.1 |
| K-MB-Transformers | 99.4 | 5.4 | 0.6 |
Table 3 Effect of the multi-branch clustering decoder on performance
| Operation | AUC (%) |
|---|---|
| No connection | 98.5 |
| Connection | 98.9 |
| Branch connection | 99.4 |
Table 4 Effect of the branch connection operation on performance (AUC, %)
| Preprocessing | AUC (%) |
|---|---|
| No sliding window | 94.8 |
| 2×2 sliding window | 99.4 |
| 32×32 feature map | 99.4 |
| 64×64 feature map | 96.4 |
| Without foreground patches | 95.7 |
| With foreground patches | 99.4 |
Table 5 Effect of data preprocessing on detection performance (AUC, %)
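Table 5 indicates that both a sliding window and foreground image patches matter for accuracy. A minimal sketch of that style of preprocessing follows, using simple frame differencing as a stand-in foreground cue; the paper's actual foreground extraction is not specified in this excerpt, and the function name, window sizes, and threshold are illustrative assumptions:

```python
import numpy as np

def foreground_patches(prev, cur, patch=32, stride=16, thresh=0.05):
    # slide a window over the current frame and keep patches whose
    # mean absolute frame difference suggests foreground motion
    out = []
    H, W = cur.shape
    diff = np.abs(cur - prev)
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            if diff[y:y + patch, x:x + patch].mean() > thresh:
                out.append(cur[y:y + patch, x:x + patch])
    return out

# a frame pair with a newly appearing bright square yields patches;
# a static pair yields none
prev = np.zeros((64, 64))
cur = np.zeros((64, 64))
cur[10:30, 10:30] = 1.0
moving = foreground_patches(prev, cur)
static = foreground_patches(prev, prev)
```

Restricting prediction to such motion patches removes large static background regions from the input, which is one plausible reason the "with foreground patches" row gains several AUC points over the full-frame setting.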
| L | AUC (%) |
|---|---|
| 2 | 96.0 |
| 4 | 99.4 |
| 8 | 99.4 |
Table 6 Effect of the branch number L on performance (AUC, %)
| k | AUC (%) |
|---|---|
| 64 | 96.1 |
| 100 | 99.4 |
| 120 | 97.2 |
| 400 | 96.8 |
Table 7 Effect of the cluster number k on performance (AUC, %)
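Table 7 varies the number of K-means clusters k. As the abstract explains, clustering summarizes the diversity of normal features into prototypes, and distance to the nearest prototype can then serve as an abnormality cue. A minimal numpy sketch of that idea using plain Lloyd's algorithm [21] follows; the function names and the toy feature data are illustrative, not the paper's decoder:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # plain Lloyd's algorithm: assign to the nearest center,
    # then recompute each center as the mean of its members
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)  # (n, k)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return centers

def prototype_distance(feat, centers):
    # distance to the nearest "normal" prototype; larger -> more anomalous
    return np.linalg.norm(centers - feat, axis=-1).min()

# fit prototypes on two tight clusters of "normal" features
rng = np.random.default_rng(1)
train = np.vstack([rng.normal(0.0, 0.1, (50, 4)),
                   rng.normal(5.0, 0.1, (50, 4))])
protos = kmeans(train, k=2)
```

The trade-off visible in Table 7 is intuitive under this view: too few prototypes (k=64) merge distinct normal patterns, while too many (k=400) start memorizing rare variations, so abnormal features also find a nearby prototype.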
[1] NAYAK R, PATI U C, DAS S K. A comprehensive review on deep learning-based methods for video anomaly detection[J]. Image and Vision Computing, 2021, 106: 104078.
[2] YANG F, XIAO B, YU Z W. Anomaly detection and modeling of surveillance video[J]. Journal of Computer Research and Development, 2021, 58(12): 2708-2723 (in Chinese).
[3] CHEN Y D, CHEN L R, YU W B, et al. Knowledge distillation anomaly detection with multi-scale feature fusion[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(10): 1542-1549 (in Chinese).
[4] GONG D, LIU L Q, LE V, et al. Memorizing normality to detect anomaly: memory-augmented deep autoencoder for unsupervised anomaly detection[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 1705-1714.
[5] CHANG Y P, TU Z G, XIE W, et al. Clustering driven deep autoencoder for video anomaly detection[C]// European Conference on Computer Vision. Cham: Springer, 2020: 329-345.
[6] OUYANG Y Q, SANCHEZ V. Video anomaly detection by estimating likelihood of representations[C]// 2020 25th International Conference on Pattern Recognition. New York: IEEE Press, 2021: 8984-8991.
[7] ASTRID M, ZAHEER M Z, LEE S I. Synthetic temporal anomaly guided end-to-end video anomaly detection[C]// 2021 IEEE/CVF International Conference on Computer Vision Workshops. New York: IEEE Press, 2021: 207-214.
[8] CHEN D Y, YUE L Y, CHANG X Y, et al. NM-GAN: noise-modulated generative adversarial network for video anomaly detection[J]. Pattern Recognition, 2021, 116: 107969.
[9] BERGAOUI K, NAJI Y, SETKOV A, et al. Object-centric and memory-guided normality reconstruction for video anomaly detection[C]// 2022 IEEE International Conference on Image Processing. New York: IEEE Press, 2022: 2691-2695.
[10] LIU W, LUO W X, LIAN D Z, et al. Future frame prediction for anomaly detection — a new baseline[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 6536-6545.
[11] WANG X Z, CHE Z P, JIANG B, et al. Robust unsupervised video anomaly detection by multipath frame prediction[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 33(6): 2301-2312.
[12] LIU W, LUO W X, LIAN D Z, et al. Future frame prediction for anomaly detection — a new baseline[EB/OL]. [2023-01-12]. https://www.doc88.com/p-6037800762403.html.
[13] LI S, FANG J W, XU H K, et al. Video frame prediction by deep multi-branch mask network[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 31(4): 1283-1295.
[14] SHI X J, CHEN Z R, WANG H, et al. Convolutional LSTM network: a machine learning approach for precipitation nowcasting[C]// The 28th International Conference on Neural Information Processing Systems - Volume 1. New York: ACM, 2015: 802-810.
[15] KWON Y H, PARK M G. Predicting future frames using retrospective cycle GAN[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 1811-1820.
[16] MONIRUZZAMAN M D, RASSAU A, CHAI D, et al. Long future frame prediction using optical flow-informed deep neural networks for enhancement of robotic teleoperation in high latency environments[J]. Journal of Field Robotics, 2023, 40(2): 393-425.
[17] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2023-01-12]. https://arxiv.org/abs/2010.11929.pdf.
[18] LEE J, NAM W J, LEE S W. Multi-contextual predictions with vision transformer for video anomaly detection[C]// 2022 26th International Conference on Pattern Recognition. New York: IEEE Press, 2022: 1012-1018.
[19] FENG X Y, SONG D J, CHEN Y C, et al. Convolutional transformer based dual discriminator generative adversarial networks for video anomaly detection[C]// The 29th ACM International Conference on Multimedia. New York: ACM, 2021: 5546-5554.
[20] ULLAH W, HUSSAIN T, ULLAH F U M, et al. TransCNN: hybrid CNN and transformer mechanism for surveillance anomaly detection[J]. Engineering Applications of Artificial Intelligence, 2023, 123: 106173.
[21] LLOYD S. Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28(2): 129-137.
[22] JANG E, GU S X, POOLE B. Categorical reparameterization with gumbel-softmax[EB/OL]. [2023-01-12]. https://arxiv.org/abs/1611.01144v4.
[23] LUO W X, LIU W, GAO S H. Remembering history with convolutional LSTM for anomaly detection[C]// 2017 IEEE International Conference on Multimedia and Expo. New York: IEEE Press, 2017: 439-444.
[24] PARK H, NOH J, HAM B. Learning memory-guided normality for anomaly detection[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 14360-14369.
[25] LU Y W, KUMAR K M, NABAVI S S, et al. Future frame prediction using convolutional VRNN for anomaly detection[C]// 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance. New York: IEEE Press, 2019: 1-8.
[26] LI C B, LI H J, ZHANG G A. Future frame prediction based on generative assistant discriminative network for anomaly detection[J]. Applied Intelligence, 2023, 53(1): 542-559.
[27] YU G, WANG S Q, CAI Z P, et al. Cloze test helps: effective video anomaly detection via learning to complete video events[C]// The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 583-591.
[28] LEE J, NAM W J, LEE S W. Multi-contextual predictions with vision transformer for video anomaly detection[C]// 2022 26th International Conference on Pattern Recognition. New York: IEEE Press, 2022: 1012-1018.
[29] LIU Z A, NIE Y W, LONG C J, et al. A hybrid video anomaly detection framework via memory-augmented flow reconstruction and flow-guided frame prediction[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 13588-13597.