Journal of Graphics (图学学报), 2022, Vol. 43, Issue (2): 223-229. DOI: 10.11996/JG.j.2095-302X.2022020223

• Image Processing and Computer Vision •

Sequential multi-scale autoencoder for video anomaly detection

  1. National and Local Joint Engineering Laboratory of Computer Aided Design, School of Software Engineering, Dalian University, Dalian, Liaoning 116622, China;
    2. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Online: 2022-04-30; Published: 2022-05-07
  • Supported by:

    Key Program of National Natural Science Foundation of China (U1908214); 

    Special Project of the Central Government Guiding Local Science and Technology Development in Liaoning Province (2021JH6/10500140);

    Program for the Liaoning Distinguished Professor; Program for Innovative Research Team in University of Liaoning Province (LT2020015); 

    Science and Technology Innovation Fund of Dalian (2020JJ25CY001);

    Program for Innovative Research Team of Dalian University (XLJ202010)

Abstract: Video anomaly detection is the task of identifying events that do not conform to expected behavior. Many
current methods detect anomalies through reconstruction error. However, deep neural networks are powerful enough that
they may also reconstruct abnormal behaviors well, which contradicts the assumption that abnormal behaviors produce
large reconstruction errors. Methods that detect anomalies by predicting future frames have achieved good results,
but most of them either ignore the diversity of normal samples or fail to model the association between consecutive
video frames. To address these problems, we propose a sequential multi-scale autoencoder network that predicts future
frames and detects anomalies from the difference between the predicted frame and the ground-truth frame. The network
not only explicitly accounts for the diversity of normal events but also builds long-range spatial dependencies
through a powerful encoder, which increases the diversity of the output features. In addition, because complex
datasets contain more noise, we propose a denoising network that further improves the model's accuracy. While meeting
real-time requirements, the method achieves the best accuracy reported to date on the Avenue dataset.
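The abstract says anomalies are detected from the difference between the predicted and the ground-truth future frame, but it does not give the scoring formula. The following is a minimal sketch of one scoring scheme common in prediction-based detectors, under stated assumptions: the difference is measured as PSNR, PSNR values are min-max normalized over each video into a regularity score, and frames scoring below a threshold are flagged. The names `score_video`, `flag_anomalies`, and `copy_last_frame` (a hypothetical stand-in for the trained sequential multi-scale autoencoder), as well as the threshold value, are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]  # flattened grayscale frame, pixel values in [0, 1]
Predictor = Callable[[List[Frame]], Frame]


def copy_last_frame(clip: List[Frame]) -> Frame:
    """Hypothetical stand-in for the trained network: predicts that nothing changes."""
    return clip[-1]


def psnr(pred: Frame, target: Frame, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio (dB) between a predicted and a ground-truth frame.

    Assumption: PSNR is used as the prediction-error measure.
    """
    mse = sum((p - g) ** 2 for p, g in zip(pred, target)) / len(target)
    mse = max(mse, 1e-10)  # cap PSNR at 100 dB so identical frames stay finite
    return 10.0 * math.log10(max_val ** 2 / mse)


def score_video(frames: List[Frame], t: int,
                predict: Predictor = copy_last_frame) -> List[float]:
    """Regularity score for every frame from index t on: 1 = most normal, 0 = least normal.

    Each frame is predicted from the t consecutive frames before it; the PSNR values
    are then min-max normalized over the whole video (an assumed normalization).
    """
    psnrs = [psnr(predict(frames[i - t:i]), frames[i]) for i in range(t, len(frames))]
    if not psnrs:
        return []  # video too short to form a single input clip
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [1.0] * len(psnrs)
    return [(v - lo) / (hi - lo) for v in psnrs]


def flag_anomalies(scores: List[float], threshold: float) -> List[bool]:
    """Flag a frame as anomalous when its regularity score falls below the (assumed) threshold."""
    return [s < threshold for s in scores]


if __name__ == "__main__":
    video = [[0.5] * 4 for _ in range(6)]
    video[3] = [0.9] * 4  # an abrupt, unexpected change in frame 3
    scores = score_video(video, t=2)
    print(scores)
    print(flag_anomalies(scores, threshold=0.5))
```

With the toy video above, the scores for frames 2 to 5 are `[1.0, 0.0, 0.0, 1.0]`, so frames 3 and 4 are flagged. Frame 3 contains the abrupt change; frame 4 is flagged only because the naive stand-in predictor carries that change forward, which shows that the score is only as reliable as the frame predictor behind it.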

Key words: video anomaly detection, autoencoder network, future frame prediction, multi-scale, autoencoder
