
Journal of Graphics ›› 2022, Vol. 43 ›› Issue (2): 223-229. DOI: 10.11996/JG.j.2095-302X.2022020223

• Image Processing and Computer Vision •

Sequential multi-scale autoencoder for video anomaly detection

  

  1. National and Local Joint Engineering Laboratory of Computer Aided Design, School of Software Engineering, Dalian University, Dalian Liaoning 116622, China;
  2. School of Computer Science and Technology, Dalian University of Technology, Dalian Liaoning 116024, China
  • Online: 2022-04-30 Published: 2022-05-07
  • Supported by:

    Key Program of National Natural Science Foundation of China (U1908214); 

    Special Project of Central Government Guiding Local Science and Technology Development (2021JH6/10500140); 

    Program for the Liaoning Distinguished Professor; Program for Innovative Research Team in University of Liaoning Province (LT2020015); 

    Science and Technology Innovation Fund of Dalian (2020JJ25CY001);

    Program for Innovative Research Team of Dalian University (XLJ202010)

Abstract: Video anomaly detection refers to identifying events that are inconsistent with expected behavior. Many
current methods detect anomalies through reconstruction error. However, because deep neural networks are so
powerful, abnormal behaviors may also be reconstructed well, which contradicts the assumption that abnormal
behavior yields a large reconstruction error. Methods that predict future frames have achieved good results for
anomaly detection, but most of them neither consider the diversity of normal samples nor establish associations
between consecutive video frames. To address these problems, we propose a sequential multi-scale autoencoder
network that predicts future frames and detects anomalies from the difference between the predicted frame and the
ground truth. The network not only explicitly considers the diversity of normal events, but also constructs
long-range spatial dependencies through a powerful encoder, thereby enhancing the diversity of output features. In
addition, for complex datasets that contain more noise, we propose a denoising network to further improve the
accuracy of the model. While meeting real-time requirements, the method achieves the best accuracy reported to
date on the Avenue dataset.
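
The prediction-based scoring idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the difference between predicted and true frames is measured. The example below assumes PSNR with per-video min-max normalization, a common choice in future-frame prediction work. The function names `psnr` and `anomaly_scores` are hypothetical, and frames are represented as flat lists of pixel values in [0, 1] purely for illustration.

```python
import math


def psnr(pred, truth, max_val=1.0):
    """Peak signal-to-noise ratio between two equally sized frames.

    Frames are flat lists of floats. PSNR is an assumed difference
    measure here; it is not taken from the paper.
    """
    mse = sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)
    mse = max(mse, 1e-12)  # avoid division by zero for a perfect prediction
    return 10.0 * math.log10(max_val ** 2 / mse)


def anomaly_scores(pred_frames, true_frames):
    """Return one score per frame; higher means more anomalous.

    PSNR values are min-max normalized over the video and inverted,
    so the worst-predicted frame gets 1.0 and the best gets 0.0.
    """
    values = [psnr(p, t) for p, t in zip(pred_frames, true_frames)]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [1.0 - (v - lo) / (hi - lo) for v in values]


# Toy video: three 4-pixel frames; the third prediction is far off.
true_frames = [[0.5, 0.5, 0.5, 0.5] for _ in range(3)]
pred_frames = [
    [0.5, 0.5, 0.5, 0.6],
    [0.5, 0.5, 0.5, 0.51],
    [0.9, 0.1, 0.9, 0.1],
]
scores = anomaly_scores(pred_frames, true_frames)
```

In this toy input the third frame is predicted worst and therefore receives the highest score. In practice a threshold on the score, or a ranking metric such as AUC, would decide which frames are flagged.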

Key words: video anomaly detection, autoencoder network, future frame prediction, multi-scale, autoencoder
