
Journal of Graphics, 2024, Vol. 45, Issue (6): 1266-1276. DOI: 10.11996/JG.j.2095-302X.2024061266

• Special Topic on “Large Models and Graphics Technology and Applications”

Traffic anomaly event analysis method for highway scenes based on multimodal large language models

WU Jingyi¹, JING Jun², HE Yifan¹, ZHANG Shiyu¹, KANG Yunfeng¹, TANG Wei², KONG Delan², LIU Xiangdong²

  1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  2. Smart Management Center of Shandong Hi-Speed Group Co., Ltd., Jinan, Shandong 250014, China
  • Received: 2024-08-05  Accepted: 2024-10-15  Online: 2024-12-31  Published: 2024-12-24
  • Contact: JING Jun
  • About the first author:

    WU Jingyi (2002-), master's student. His main research interests include cross-modal understanding and generative large language models. E-mail: goldfish_42@163.com

Abstract:

To address two limitations of current traffic anomaly detection systems, namely their lack of in-depth incident perception and the high cost of manually reviewing alarmed incidents, a highway traffic anomaly analysis method based on multimodal large language models (MLLM) was investigated. Three MLLM-based tasks were designed and validated: first, automatically generating detailed work order descriptions for anomalous events, which deepens event perception; second, reviewing alarm events with an MLLM, which reduces false alarms and improves detection accuracy; and third, generating descriptive narratives of anomaly event videos, which improves the interpretability of events. Experimental results demonstrated that the MLLM-based work order description method, built by constructing visual instruction-tuning datasets and fine-tuning the model, improved the completeness and accuracy of work order information. In alarm event review, the MLLM effectively filtered out false alarms caused by poor image quality, false positives, and misclassifications, thereby reducing manual review costs. Furthermore, the MLLM-based video description method enabled efficient anomaly analysis by sampling event video frames and describing them, thus improving event explainability. Although open-source models performed slightly worse than closed-source models in specific scenarios, both types were able to review the various kinds of false alarms, confirming the potential of MLLMs for anomaly event review. This study provides a novel solution for intelligent traffic monitoring systems, improving the automation and practicality of anomaly event handling.
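The abstract gives no implementation details. Purely as an illustration, two of the steps it names, sampling frames from an event video and asking an MLLM to confirm or reject an alarm, could be sketched in Python as follows. Everything in this sketch is an assumption: the helper names (`sample_frame_indices`, `build_review_prompt`, `parse_review_reply`), the segment-midpoint sampling rule, the prompt wording and the TRUE/FALSE reply format are hypothetical, and no real MLLM is called. Only the three false-alarm causes come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional

# False-alarm causes named in the abstract; the exact label set and wording
# used in the paper are not given, so these strings are assumptions.
FALSE_ALARM_CAUSES = ("poor image quality", "false positive", "misclassification")


def sample_frame_indices(num_frames: int, k: int) -> List[int]:
    """Hypothetical helper: pick k frame indices spread evenly over a clip.

    The clip is split into k equal segments and the middle frame of each
    segment is taken, so the sampled frames cover the whole event window.
    """
    if num_frames <= 0 or k <= 0:
        return []
    k = min(k, num_frames)  # cannot sample more frames than the clip has
    segment = num_frames / k
    return [int(segment * i + segment / 2) for i in range(k)]


def build_review_prompt(event_type: str) -> str:
    """Hypothetical instruction asking an MLLM to confirm or reject an alarm."""
    causes = ", ".join(FALSE_ALARM_CAUSES)
    return (
        f"A detector raised a '{event_type}' alarm for these highway frames. "
        f"Reply TRUE if the event really occurred; otherwise reply FALSE "
        f"followed by the cause ({causes})."
    )


@dataclass
class ReviewResult:
    is_true_alarm: bool
    cause: Optional[str] = None


def parse_review_reply(reply: str) -> ReviewResult:
    """Map the model's free-text reply onto a keep/filter decision."""
    text = reply.strip().lower()
    if text.startswith("false"):
        for cause in FALSE_ALARM_CAUSES:
            if cause in text:
                return ReviewResult(False, cause)
        return ReviewResult(False, "unspecified")
    # "TRUE" and any unparseable reply keep the alarm so that a human
    # operator still sees it (a conservative default, also an assumption).
    return ReviewResult(True)


if __name__ == "__main__":
    print(sample_frame_indices(100, 4))  # [12, 37, 62, 87]
    print(build_review_prompt("stopped vehicle"))
    print(parse_review_reply("FALSE - misclassification: the object is a shadow"))
```

Taking segment midpoints rather than consecutive frames is one simple way to keep the model input short while still covering the whole event; the sampling strategy and prompt design actually used in the paper may differ.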

Key words: multimodal large language models, surveillance video, anomaly event detection, video understanding, work order description, traffic event review
