
Journal of Graphics ›› 2023, Vol. 44 ›› Issue (1): 95-103. DOI: 10.11996/JG.j.2095-302X.2023010095

• Image Processing and Computer Vision •


Video anomaly detection combining pedestrian spatiotemporal information

YAN Shan-wu, XIAO Hong-bing, WANG Yu, SUN Mei

  1. School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
  • Received: 2022-07-04 Revised: 2022-08-27 Online: 2023-10-31 Published: 2023-02-16
  • Contact: WANG Yu
  • About author: YAN Shan-wu (1996-), master's student. His main research interests cover video anomaly detection and image processing. E-mail: 18339729107@163.com
  • Supported by:
    Beijing Natural Science Foundation - Key Project of Science and Technology Program of Beijing Municipal Education Commission (KZ202110011015)


Abstract:

To address the problems that current video anomaly detection methods cannot make full use of temporal information and ignore the diversity of normal behaviors, an anomaly detection method incorporating pedestrian spatiotemporal information was proposed. Based on a convolutional auto-encoder, the input frames were compressed and reconstructed by its encoder and decoder, and anomalies were detected from the difference between the output frames and the ground truth. To strengthen the feature connections between consecutive video frames, a residual temporal shift module and a residual channel attention module were introduced to enhance the network's ability to model temporal and channel information, respectively. Considering the over-generalization of convolutional neural networks (CNNs), memory-augmented modules were added to the skip connections between each layer of the encoder and decoder, limiting the auto-encoder's overly strong representation of anomalous frames and improving the network's anomaly detection accuracy. In addition, the objective function was modified with a feature separateness loss to effectively distinguish different normal behavior patterns. Experimental results on the CUHK Avenue and ShanghaiTech datasets show that the proposed method outperforms current mainstream video anomaly detection methods while meeting real-time requirements.
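Two of the ideas in the abstract can be illustrated compactly: a residual channel-wise temporal shift (exchanging a fraction of channels between neighboring frames to mix temporal information at zero parameter cost) and a separateness loss that pushes a query feature's nearest memory item away from its second-nearest. The following NumPy sketch is illustrative only, not the authors' implementation; all function names, shapes, and the `fold_div`/`margin` values are assumptions.

```python
import numpy as np

def residual_temporal_shift(x, fold_div=8):
    """x: (T, C, H, W) features for T consecutive frames.

    Shifts one group of C // fold_div channels backward in time and
    another group forward, then adds the result back residually so the
    original per-frame features are preserved.
    """
    T, C, H, W = x.shape
    fold = C // fold_div
    shifted = np.zeros_like(x)
    shifted[:-1, :fold] = x[1:, :fold]            # pull features from the next frame
    shifted[1:, fold:2 * fold] = x[:-1, fold:2 * fold]  # pull features from the previous frame
    shifted[:, 2 * fold:] = x[:, 2 * fold:]       # remaining channels unchanged
    return x + shifted                            # residual connection

def separateness_loss(queries, memory, margin=1.0):
    """queries: (N, D) encoder features; memory: (M, D) memory items.

    Triplet-style hinge: for each query, the distance to its nearest
    memory item should be smaller than the distance to the second-nearest
    by at least `margin`, which spreads memory items over distinct
    normal patterns.
    """
    d = np.linalg.norm(queries[:, None, :] - memory[None, :, :], axis=2)  # (N, M)
    order = np.argsort(d, axis=1)
    rows = np.arange(len(queries))
    nearest = d[rows, order[:, 0]]
    second = d[rows, order[:, 1]]
    return np.maximum(nearest - second + margin, 0.0).mean()
```

Because the nearest distance never exceeds the second-nearest, the loss is bounded in [0, margin]; minimizing it drives distinct queries toward distinct memory items rather than letting one item absorb all normal behaviors.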

Key words: video anomaly detection, unsupervised learning, spatiotemporal two-stream network, auto-encoder

CLC Number: