Journal of Graphics (图学学报) ›› 2023, Vol. 44 ›› Issue (6): 1173-1182. DOI: 10.11996/JG.j.2095-302X.2023061173
Received: 2023-06-30
Accepted: 2023-10-08
Online: 2023-12-31
Published: 2023-12-17
Contact: CHEN Rong-yuan (1976-), male, professor, Ph.D. His main research interests cover graphic and image processing.
About author: HUANG Shao-nian (1977-), female, associate professor, Ph.D. Her main research interest covers video content analysis. E-mail: snhuang@hutb.edu.cn
HUANG Shao-nian1, WEN Pei-ran1, QUAN Qi1, CHEN Rong-yuan2
Abstract: Video anomaly detection in complex scenes is a task of significant research value and practical importance. Although prediction-based video anomaly detection methods have made notable progress in performance, they still face challenges such as large model parameter counts and detection accuracy that leaves room for improvement. To address these problems, a lightweight frame-prediction video anomaly detection model based on multi-branch aggregation was proposed. The model adopts a multi-branch aggregation Transformer unit as its basic building block, substantially reducing the parameter count and computational cost while improving detection accuracy. On this basis, a multi-branch Transformer fusion encoder was designed: while extracting the temporal motion features of normal events, it employs multi-branch connection operations to fuse multi-level features, strengthening the encoder's feature refinement capability. Meanwhile, a K-means-based multi-branch clustering decoder was designed to mitigate the impact of the diversity of normal features on anomaly detection performance. Experiments on three benchmark datasets, UCSD Ped2, CUHK Avenue, and ShanghaiTech, show that compared with current mainstream algorithms, the model achieves lower computational cost together with strong detection performance.
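The abstract describes a multi-branch aggregation Transformer unit as the basic block. The paper's exact layer design is not reproduced on this page; the following is a minimal numpy sketch of the multi-branch idea only (split the embedding into L narrow branches, attend within each branch, then aggregate), with all function names, shapes, and the residual-style aggregation being illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def branch_attention(x):
    # scaled dot-product self-attention on one narrow branch;
    # x: (tokens, d_branch); Q/K/V projections omitted for brevity
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)      # (tokens, tokens)
    return softmax(scores) @ x         # (tokens, d_branch)

def mb_transformer_unit(x, L=4):
    # split the d-dim embedding into L branches of width d/L,
    # attend within each branch, then aggregate by concatenation;
    # per-branch residuals stand in for the branch connections
    branches = np.split(x, L, axis=-1)
    out = [branch_attention(b) + b for b in branches]
    return np.concatenate(out, axis=-1)  # back to (tokens, d)

tokens = np.random.randn(16, 64)       # 16 tokens, embedding width 64
y = mb_transformer_unit(tokens, L=4)   # same shape as the input
```

The parameter saving of such a split is easy to see: a full-width attention projection costs on the order of d² weights, while L branches of width d/L cost L·(d/L)² = d²/L, which is in line with the parameter and FLOPs reductions reported in the ablation tables below.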
HUANG Shao-nian, WEN Pei-ran, QUAN Qi, CHEN Rong-yuan. Future frame prediction based on multi-branch aggregation for lightweight video anomaly detection[J]. Journal of Graphics, 2023, 44(6): 1173-1182.
Fig. 3 MB-Transformers unit architecture ((a) the original Transformer unit; (b) the MB-Transformers unit)
| Method | Ped2 | Avenue | ShanghaiTech | Params (MB) | FLOPs (G) |
|---|---|---|---|---|---|
| ConvLSTM-AE[23] | 88.1 | 77.0 | - | 6.79 | 89.20 |
| MemAE[4] | 94.1 | 83.4 | 71.2 | 15.10 | 38.42 |
| MNAD-P[24] | 97.0 | 88.5 | 70.5 | 15.65 | 43.99 |
| Conv-VRNN[25] | 96.1 | 85.8 | - | 93.40 | 275.60 |
| GADNet[26] | 96.1 | 86.2 | 73.2 | - | - |
| VEC[27] | 97.3 | 90.2 | 74.8 | 21.47 | 1.99 |
| MCP[28] | 98.0 | 92.1 | 75.3 | - | - |
| HF2-VAD[29] | 99.3 | 91.1 | 76.2 | 11.80 | 1.84 |
| LFP-MBA | 99.4 | 89.8 | 74.2 | 5.40 | 0.60 |
Table 1 Performance comparison with other models on three datasets (AUC, %)
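For context on how the AUC columns in Table 1 are typically obtained: prediction-based detectors such as the baseline of Liu et al. [10] score each test frame by the PSNR between the predicted and the actual frame, min-max normalize the scores per video, and compute a frame-level AUC against the ground-truth labels. A self-contained sketch of that evaluation pipeline follows; the helper names are illustrative, not taken from the paper:

```python
import numpy as np

def psnr(pred, gt, peak=1.0):
    # peak signal-to-noise ratio between predicted and actual frame
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def anomaly_scores(psnrs):
    # min-max normalize PSNR per video; low PSNR -> high anomaly score
    p = np.asarray(psnrs, dtype=float)
    normality = (p - p.min()) / (p.max() - p.min() + 1e-8)
    return 1.0 - normality

def frame_auc(scores, labels):
    # frame-level AUC via the rank-based (Mann-Whitney) formulation,
    # so no external metrics library is needed
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

With perfectly separated scores (all anomalous frames scored above all normal ones) `frame_auc` returns 1.0, matching the conventional frame-level AUC reported in the table.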
| Encoder unit | AUC (%) | Params (MB) | FLOPs (G) |
|---|---|---|---|
| Original Transformer | 94.2 | 13.2 | 1.5 |
| K-MB-Transformers | 90.6 | 4.1 | 0.4 |
| MB-Transformers | 99.4 | 5.4 | 0.6 |
Table 2 Effect of the multi-branch Transformer encoder on performance
| Decoder module | AUC (%) | Params (MB) | FLOPs (G) |
|---|---|---|---|
| Original Transformer | 95.0 | 15.4 | 1.7 |
| MB-Transformers | 97.3 | 10.1 | 1.1 |
| K-MB-Transformers | 99.4 | 5.4 | 0.6 |
Table 3 Effect of the multi-branch clustering decoder on performance
| Operation | AUC (%) |
|---|---|
| No connection | 98.5 |
| Connection | 98.9 |
| Branch connection | 99.4 |
Table 4 Effect of the branch connection operation on performance (AUC, %)
| Preprocessing | AUC (%) |
|---|---|
| No sliding window | 94.8 |
| 2×2 sliding window | 99.4 |
| 32×32 feature map | 99.4 |
| 64×64 feature map | 96.4 |
| Without foreground patches | 95.7 |
| With foreground patches | 99.4 |
Table 5 Effect of data preprocessing on detection performance (AUC, %)
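Table 5 indicates that both a sliding window and foreground image patches matter for accuracy. A minimal sketch of that style of preprocessing follows, using simple frame differencing as a stand-in foreground cue; the paper's actual foreground extraction is not specified in this excerpt, and the function name, window sizes, and threshold are illustrative assumptions:

```python
import numpy as np

def foreground_patches(prev, cur, patch=32, stride=16, thresh=0.05):
    # slide a window over the current frame and keep patches whose
    # mean absolute frame difference suggests foreground motion
    out = []
    H, W = cur.shape
    diff = np.abs(cur - prev)
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            if diff[y:y + patch, x:x + patch].mean() > thresh:
                out.append(cur[y:y + patch, x:x + patch])
    return out

# a frame pair with a newly appearing bright square yields patches;
# a static pair yields none
prev = np.zeros((64, 64))
cur = np.zeros((64, 64))
cur[10:30, 10:30] = 1.0
moving = foreground_patches(prev, cur)
static = foreground_patches(prev, prev)
```

Restricting prediction to such motion patches removes large static background regions from the input, which is one plausible reason the "with foreground patches" row gains several AUC points over the full-frame setting.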
| L | AUC (%) |
|---|---|
| 2 | 96.0 |
| 4 | 99.4 |
| 8 | 99.4 |
Table 6 Effect of the branch number L on performance (AUC, %)
| k | AUC (%) |
|---|---|
| 64 | 96.1 |
| 100 | 99.4 |
| 120 | 97.2 |
| 400 | 96.8 |
Table 7 Effect of the cluster number k on performance (AUC, %)
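Table 7 varies the number of K-means clusters k. As the abstract explains, clustering summarizes the diversity of normal features into prototypes, and distance to the nearest prototype can then serve as an abnormality cue. A minimal numpy sketch of that idea using plain Lloyd's algorithm [21] follows; the function names and the toy feature data are illustrative, not the paper's decoder:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # plain Lloyd's algorithm: assign to the nearest center,
    # then recompute each center as the mean of its members
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)  # (n, k)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return centers

def prototype_distance(feat, centers):
    # distance to the nearest "normal" prototype; larger -> more anomalous
    return np.linalg.norm(centers - feat, axis=-1).min()

# fit prototypes on two tight clusters of "normal" features
rng = np.random.default_rng(1)
train = np.vstack([rng.normal(0.0, 0.1, (50, 4)),
                   rng.normal(5.0, 0.1, (50, 4))])
protos = kmeans(train, k=2)
```

The trade-off visible in Table 7 is intuitive under this view: too few prototypes (k=64) merge distinct normal patterns, while too many (k=400) start memorizing rare variations, so abnormal features also find a nearby prototype.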
[1] NAYAK R, PATI U C, DAS S K. A comprehensive review on deep learning-based methods for video anomaly detection[J]. Image and Vision Computing, 2021, 106: 104078.
[2] YANG F, XIAO B, YU Z W. Anomaly detection and modeling of surveillance video[J]. Journal of Computer Research and Development, 2021, 58(12): 2708-2723 (in Chinese).
[3] CHEN Y D, CHEN L R, YU W B, et al. Knowledge distillation anomaly detection with multi-scale feature fusion[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(10): 1542-1549 (in Chinese).
[4] GONG D, LIU L Q, LE V, et al. Memorizing normality to detect anomaly: memory-augmented deep autoencoder for unsupervised anomaly detection[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 1705-1714.
[5] CHANG Y P, TU Z G, XIE W, et al. Clustering driven deep autoencoder for video anomaly detection[C]// European Conference on Computer Vision. Cham: Springer, 2020: 329-345.
[6] OUYANG Y Q, SANCHEZ V. Video anomaly detection by estimating likelihood of representations[C]// 2020 25th International Conference on Pattern Recognition. New York: IEEE Press, 2021: 8984-8991.
[7] ASTRID M, ZAHEER M Z, LEE S I. Synthetic temporal anomaly guided end-to-end video anomaly detection[C]// 2021 IEEE/CVF International Conference on Computer Vision Workshops. New York: IEEE Press, 2021: 207-214.
[8] CHEN D Y, YUE L Y, CHANG X Y, et al. NM-GAN: noise-modulated generative adversarial network for video anomaly detection[J]. Pattern Recognition, 2021, 116: 107969.
[9] BERGAOUI K, NAJI Y, SETKOV A, et al. Object-centric and memory-guided normality reconstruction for video anomaly detection[C]// 2022 IEEE International Conference on Image Processing. New York: IEEE Press, 2022: 2691-2695.
[10] LIU W, LUO W X, LIAN D Z, et al. Future frame prediction for anomaly detection — a new baseline[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 6536-6545.
[11] WANG X Z, CHE Z P, JIANG B, et al. Robust unsupervised video anomaly detection by multipath frame prediction[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 33(6): 2301-2312.
[12] LIU W, LUO W X, LIAN D Z, et al. Future frame prediction for anomaly detection — a new baseline[EB/OL]. [2023-01-12]. https://www.doc88.com/p-6037800762403.html.
[13] LI S, FANG J W, XU H K, et al. Video frame prediction by deep multi-branch mask network[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 31(4): 1283-1295.
[14] SHI X J, CHEN Z R, WANG H, et al. Convolutional LSTM network: a machine learning approach for precipitation nowcasting[C]// The 28th International Conference on Neural Information Processing Systems - Volume 1. New York: ACM, 2015: 802-810.
[15] KWON Y H, PARK M G. Predicting future frames using retrospective cycle GAN[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 1811-1820.
[16] MONIRUZZAMAN M D, RASSAU A, CHAI D, et al. Long future frame prediction using optical flow-informed deep neural networks for enhancement of robotic teleoperation in high latency environments[J]. Journal of Field Robotics, 2023, 40(2): 393-425.
[17] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2023-01-12]. https://arxiv.org/abs/2010.11929.pdf.
[18] LEE J, NAM W J, LEE S W. Multi-contextual predictions with vision transformer for video anomaly detection[C]// 2022 26th International Conference on Pattern Recognition. New York: IEEE Press, 2022: 1012-1018.
[19] FENG X Y, SONG D J, CHEN Y C, et al. Convolutional transformer based dual discriminator generative adversarial networks for video anomaly detection[C]// The 29th ACM International Conference on Multimedia. New York: ACM, 2021: 5546-5554.
[20] ULLAH W, HUSSAIN T, ULLAH F U M, et al. TransCNN: hybrid CNN and transformer mechanism for surveillance anomaly detection[J]. Engineering Applications of Artificial Intelligence, 2023, 123: 106173.
[21] LLOYD S. Least squares quantization in PCM[J]. IEEE Transactions on Information Theory, 1982, 28(2): 129-137.
[22] JANG E, GU S X, POOLE B. Categorical reparameterization with gumbel-softmax[EB/OL]. [2023-01-12]. https://arxiv.org/abs/1611.01144v4.
[23] LUO W X, LIU W, GAO S H. Remembering history with convolutional LSTM for anomaly detection[C]// 2017 IEEE International Conference on Multimedia and Expo. New York: IEEE Press, 2017: 439-444.
[24] PARK H, NOH J, HAM B. Learning memory-guided normality for anomaly detection[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 14360-14369.
[25] LU Y W, KUMAR K M, NABAVI S S, et al. Future frame prediction using convolutional VRNN for anomaly detection[C]// 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance. New York: IEEE Press, 2019: 1-8.
[26] LI C B, LI H J, ZHANG G A. Future frame prediction based on generative assistant discriminative network for anomaly detection[J]. Applied Intelligence, 2023, 53(1): 542-559.
[27] YU G, WANG S Q, CAI Z P, et al. Cloze test helps: effective video anomaly detection via learning to complete video events[C]// The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 583-591.
[28] LEE J, NAM W J, LEE S W. Multi-contextual predictions with vision transformer for video anomaly detection[C]// 2022 26th International Conference on Pattern Recognition. New York: IEEE Press, 2022: 1012-1018.
[29] LIU Z A, NIE Y W, LONG C J, et al. A hybrid video anomaly detection framework via memory-augmented flow reconstruction and flow-guided frame prediction[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 13588-13597.