Journal of Graphics ›› 2023, Vol. 44 ›› Issue (6): 1173-1182. DOI: 10.11996/JG.j.2095-302X.2023061173

• Image Processing and Computer Vision •

Lightweight video anomaly detection with frame prediction based on multi-branch aggregation

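HUANG Shao-nian1, WEN Pei-ran1, QUAN Qi1, CHEN Rong-yuan2

  1. School of Computer Science, Hunan University of Technology and Business, Changsha, Hunan 410205, China
  2. School of Resource and Environment, Hunan University of Technology and Business, Changsha, Hunan 410205, China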
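  • Received: 2023-06-30 Accepted: 2023-10-08 Publication date: 2023-12-31 Release date: 2023-12-17
  • Corresponding author: CHEN Rong-yuan (1976-), male, professor, Ph.D. His main research interests include graphics and image processing. E-mail: chry@hutb.edu.cn
  • About the author:

    HUANG Shao-nian (1977-), female, associate professor, Ph.D. Her main research interest is video content analysis. E-mail: snhuang@hutb.edu.cn

  • Funding:
    National Social Science Fund of China (21BTJ026); Key Scientific Research Project of the Hunan Provincial Education Department (19A270); Key Scientific Research Project of the Hunan Provincial Education Department (21A0370); China University Industry-University-Research Innovation Fund (2020ITA09005); China University Industry-University-Research Innovation Fund (2021ITA05049)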

Future frame prediction based on multi-branch aggregation for lightweight video anomaly detection

HUANG Shao-nian1, WEN Pei-ran1, QUAN Qi1, CHEN Rong-yuan2

  1. School of Computer Science, Hunan University of Technology and Business, Changsha, Hunan 410205, China
  2. School of Resource and Environment, Hunan University of Technology and Business, Changsha, Hunan 410205, China
  • Received: 2023-06-30 Accepted: 2023-10-08 Online: 2023-12-31 Published: 2023-12-17
  • Contact: Chen Rongyuan (1977-), professor, Ph.D. His main research interests include graphics and image processing. E-mail: chry@hutb.edu.cn
  • About author:

    Huang Shaonian (1977-), associate professor, Ph.D. Her main research interest is video content analysis. E-mail: snhuang@hutb.edu.cn

  • Supported by:
    The National Social Science Foundation of China (21BTJ026); The Scientific Research Fund of Hunan Provincial Education Department (19A270); The Scientific Research Fund of Hunan Provincial Education Department (21A0370); Funds for Creative Research of China Universities on the Integration of Industry, Education and Research (2020ITA09005); Funds for Creative Research of China Universities on the Integration of Industry, Education and Research (2021ITA05049)

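Abstract:

Video anomaly detection in complex scenes has significant research value and practical importance. Although prediction-based video anomaly detection methods have made notable progress, they still face challenges such as large parameter counts and detection performance that leaves room for improvement. To address these problems, a lightweight frame-prediction model for video anomaly detection based on multi-branch aggregation is proposed. The model adopts multi-branch aggregated Transformer units as its basic building block, which significantly reduces the number of parameters and the computational cost while improving detection accuracy. On this basis, an encoder that fuses multi-branch Transformers is designed: while extracting the temporal motion features of normal events, it uses multi-branch connection operations to fuse features from multiple layers, strengthening the encoder's ability to refine features. In addition, a K-means-based multi-branch clustering decoder is designed to mitigate the effect of the diversity of normal features on anomaly detection performance. Experimental results on three benchmark datasets, UCSD Ped2, CUHK Avenue, and ShanghaiTech, show that, compared with current mainstream algorithms, the model has a lower computational cost and good detection performance.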

Keywords: frame prediction, video anomaly detection, multi-branch aggregation, Transformer
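
The abstract does not describe how K-means is wired into the decoder, so the following is only a minimal sketch of the general idea it points to: when normal events are diverse, several cluster centres can represent them, and a feature is judged by its distance to the nearest centre. The 2-D feature vectors, the function names `kmeans` and `nearest`, and the seeding with the first `k` points are all assumptions made for illustration, not the authors' design.

```python
import math

def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best, distances[best]

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group every point with its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)[0]].append(p)
        # Update step: move each centroid to the mean of its group.
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(axis) / len(group) for axis in zip(*group)]
    return centroids

# Hypothetical features of normal events that form two distinct modes.
normal_features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0),
                   (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
centroids = kmeans(normal_features, k=2)

_, d_normal = nearest((0.1, 0.1), centroids)  # lies inside a normal mode
_, d_odd = nearest((2.5, 9.0), centroids)     # far from both modes
```

With a single centre, both normal modes would sit far from the mean and look as unusual as the outlier. Using one centre per mode keeps `d_normal` small while `d_odd` stays large.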

Abstract:

Video anomaly detection in complex scenes has significant research value and practical applications. Although current prediction-based methods perform well, they still face challenges such as large parameter counts and detection performance that leaves room for improvement. To address these problems, we propose a lightweight frame-prediction model based on multi-branch aggregation. The model uses Transformer units with multi-branch aggregation as its basic structure, which significantly reduces the number of parameters and the computational cost while improving detection accuracy. Building on this foundation, we design a multi-branch Transformer fusion encoder that extracts the temporal motion features of normal events and uses multi-branch connection operations to fuse features across layers, strengthening the encoder's feature optimization ability. Moreover, a K-means-based multi-branch clustering decoder is developed to mitigate the impact of normal feature diversity on anomaly detection performance. Experiments on three public datasets, UCSD Ped2, CUHK Avenue, and ShanghaiTech, show that the proposed model has a lower computational cost than current mainstream algorithms while achieving good detection performance.

Key words: frame prediction, video anomaly detection, multi-branch aggregation, Transformer
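
The abstract does not say how prediction error becomes an anomaly score. A common choice in prediction-based video anomaly detection is to compute the PSNR between each predicted frame and the real frame, then min-max normalise the values within a video. The sketch below assumes that choice. The toy 4-pixel frames, the function names `psnr` and `regularity_scores`, and the `1e-10` floor on the mean squared error are illustrative assumptions, not details from the paper.

```python
import math

def psnr(predicted, actual, max_value=1.0):
    """Peak signal-to-noise ratio between two equally sized frames.

    Frames are flat lists of pixel intensities in [0, max_value].
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    mse = max(mse, 1e-10)  # assumed floor so a perfect prediction stays finite
    return 10.0 * math.log10(max_value ** 2 / mse)

def regularity_scores(psnr_values):
    """Min-max normalise the per-frame PSNR values of one video to [0, 1].

    Low scores mark frames that were predicted poorly, i.e. candidate anomalies.
    """
    lo, hi = min(psnr_values), max(psnr_values)
    if hi == lo:
        return [1.0 for _ in psnr_values]
    return [(v - lo) / (hi - lo) for v in psnr_values]

# Toy data: the real frame stays the same; the third prediction is far off.
actual_frames = [[0.2, 0.4, 0.6, 0.8]] * 3
predicted_frames = [
    [0.2, 0.4, 0.6, 0.7],  # small prediction error
    [0.2, 0.5, 0.6, 0.8],  # small prediction error
    [0.9, 0.1, 0.1, 0.2],  # large prediction error, likely anomalous
]
values = [psnr(p, a) for p, a in zip(predicted_frames, actual_frames)]
scores = regularity_scores(values)
```

In this setup a model trained only on normal events predicts normal frames well, so a low normalised score flags a frame as a possible anomaly. Any decision threshold would have to be tuned on validation data.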

CLC number: