
Journal of Graphics, 2024, Vol. 45, Issue (6): 1266-1276. DOI: 10.11996/JG.j.2095-302X.2024061266

• Special Topic on "Large Models and Graphics Technology and Applications" •

Traffic anomaly event analysis method for highway scenes based on multimodal large language models

WU Jingyi1, JING Jun2, HE Yifan1, ZHANG Shiyu1, KANG Yunfeng1, TANG Wei2, KONG Delan2, LIU Xiangdong2

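  1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  2. Smart Management Center of Shandong Hi-Speed Group Co., Ltd., Jinan, Shandong 250014, China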
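  • Received: 2024-08-05 Accepted: 2024-10-15 Published: 2024-12-31 Online: 2024-12-24
  • Corresponding author: JING Jun (1977-), male, researcher, Ph.D. His main research interests include digital transportation, smart highways, and highway informatization. E-mail: signal926@163.com
  • First author: WU Jingyi (2002-), male, master's student. His main research interests include digital image processing and pattern recognition. E-mail: goldfish_42@163.com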

Traffic anomaly event analysis method for highway scenes based on multimodal large language models

WU Jingyi1, JING Jun2, HE Yifan1, ZHANG Shiyu1, KANG Yunfeng1, TANG Wei2, KONG Delan2, LIU Xiangdong2

  1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  2. Smart Management Center of Shandong Hi-Speed Group Co., Ltd., Jinan, Shandong 250014, China
  • Received: 2024-08-05 Accepted: 2024-10-15 Published: 2024-12-31 Online: 2024-12-24
  • Corresponding author: JING Jun (1977-), researcher, Ph.D. His main research interests include digital transportation, smart highways, and highway informatization. E-mail: signal926@163.com
  • First author: WU Jingyi (2002-), master's student. His main research interests include cross-modal understanding and generative large language models. E-mail: goldfish_42@163.com

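Abstract:

To address the inability of existing traffic anomaly event detection systems to perceive events in depth, and the high cost of manually reviewing alarm events, this work studies a method for analyzing traffic anomaly events in highway scenes that incorporates multimodal large language models (MLLM). Three MLLM-based tasks are designed and validated: (1) automatically generating detailed work order descriptions of anomaly events, which deepens event perception; (2) using an MLLM to re-review alarm events, which reduces false alarms and improves detection accuracy; and (3) generating MLLM-based descriptions of anomaly event videos, which strengthens event interpretability. Experimental results show that the MLLM-based work order description method, through construction of a visual instruction tuning dataset and model fine-tuning, improves the completeness and accuracy of work order information. For alarm event review, the MLLM effectively identifies false alarms caused by low image quality, spurious alarms, and category errors, lowering the cost of manual review. In addition, the MLLM-based video description method samples and describes frames from event videos, enabling efficient analysis of anomaly events and improving event interpretability. Although open-source models are slightly inferior to closed-source models in specific scenarios, both show the ability to review multiple kinds of false alarm, confirming the application potential of MLLM in anomaly event review. The study offers a new solution for intelligent traffic monitoring systems and raises the automation level and practicality of anomaly event handling.

Keywords: multimodal large language models, surveillance video, anomaly event detection, video understanding, work order description, traffic anomaly event review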

Abstract:

Current traffic anomaly detection systems cannot perceive incidents in depth, and manually reviewing the alarms they raise is costly. To address both problems, a highway traffic anomaly analysis method based on multimodal large language models (MLLM) was studied. Three MLLM-based tasks were designed and validated: first, automatically generating detailed work order descriptions for anomalous events, deepening event perception; second, re-reviewing alarm events with an MLLM, reducing false alarms and improving detection accuracy; and third, generating MLLM-based descriptions of anomaly event videos, enhancing the interpretability of events. Experimental results demonstrated that the MLLM-based work order description method improved the completeness and accuracy of work order information through the construction of a visual instruction tuning dataset and model fine-tuning. In the review of alarm events, the MLLM effectively identified false alarms caused by poor image quality, spurious detections, and category errors, thus reducing manual review costs. Furthermore, the MLLM-based video description method enabled efficient anomaly analysis by sampling and describing event video frames, improving event interpretability. Although open-source models were slightly inferior to closed-source models in specific scenarios, both types demonstrated the ability to review various false alarm issues, confirming the application potential of MLLM in anomaly event review. This study provides a novel solution for intelligent traffic monitoring systems, enhancing the automation and practicality of anomaly event handling.

Key words: multimodal large language models, surveillance video, anomaly event detection, video understanding, work order description, traffic event review
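
The third task in the abstract samples images from an event video and describes them with an MLLM. The abstract does not say how frames are chosen, so the following is only a minimal sketch of one common choice, uniform sampling at segment midpoints. The function name `sample_frame_indices` and its parameters are hypothetical and are not taken from the paper.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Return frame indices spread evenly over a clip.

    Assumption (not from the paper): the clip is split into `num_samples`
    equal segments and the middle frame of each segment is selected.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Never request more frames than the clip contains.
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    # Midpoint of segment i is step * i + step / 2, truncated to a valid index.
    return [int(step * i + step / 2) for i in range(num_samples)]
```

For a 100-frame clip and 4 samples this selects frames 12, 37, 62, and 87; the selected frames would then be passed to the model together with a description prompt.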

CLC number: