
Journal of Graphics, 2025, Vol. 46, Issue (5): 960-968. DOI: 10.11996/JG.j.2095-302X.2025050960

Image Processing and Computer Vision

On-site construction safety monitoring based on large vision language models

LENG Shuo, WANG Wei, OU Jiayong, XUE Zhigang, SONG Yinglong, MO Sijun

  1. Guangzhou Metro Construction Management Co., Ltd., Guangzhou Guangdong 510335, China
  • Received: 2024-12-18  Accepted: 2025-03-03  Online: 2025-10-30  Published: 2025-09-10
  • Contact: WANG Wei
  • About the author (first author):

    LENG Shuo (1996-), Ph.D. His main research interests include the application of data analysis and image recognition in construction engineering. E-mail: lengshuo@gzmtr.com

Abstract:

To address the high development cost and limited applicability of traditional vision models in construction safety monitoring, a solution based on a large vision language model (LVLM) was proposed. Building on an open-source pretrained LVLM, several prompting strategies tailored to construction safety monitoring tasks were designed: text prompts, image prompts with supplementary information, and image exemplar prompts. These strategies enable the LVLM to comprehend and reason about construction site imagery. In addition, an LVLM-based intelligent monitoring workflow and system architecture were developed. The method was applied to three representative construction safety monitoring scenarios: supervisor absence detection, hazardous zone intrusion identification, and non-compliant behavior recognition. Validation on empirical data showed that, with appropriate prompting strategies, the LVLM can achieve satisfactory recognition accuracy, close to that of mainstream deep learning models, without requiring data annotation or model training. The approach offers low development cost, fast implementation, and flexible adaptation to new tasks, indicating its potential for applications in image recognition and intelligent monitoring.
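The abstract names three prompt types and a monitoring workflow but does not give the actual prompts, model interface, or decision logic. As a rough illustration only, the Python sketch below assumes a generic chat-style multimodal input represented as an ordered list of text and image parts. It shows how the three prompt types might be combined into one request and how a free-text yes/no reply might be mapped to an alert. All names (`Part`, `Exemplar`, `build_prompt`, `parse_verdict`, `should_alert`), the prompt wording, and the file paths are hypothetical, and no model is actually called.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical prompt structure: an ordered list of text and image parts.
# This is NOT a real library API; it only illustrates prompt assembly.
@dataclass
class Part:
    kind: str      # "text" or "image"
    content: str   # prompt text, or a path/URI of an image


@dataclass
class Exemplar:
    image: str     # path of a labelled example image (hypothetical)
    label: str     # expected answer for that example, e.g. "yes" or "no"


def build_prompt(
    question: str,
    query_image: str,
    supplementary_note: Optional[str] = None,
    exemplars: Optional[List[Exemplar]] = None,
) -> List[Part]:
    """Combine the three prompt types into one ordered request.

    - text prompt: the task question, constrained to a yes/no answer
    - image exemplar prompt: labelled example images shown before the query
    - image prompt with supplementary information: a note explaining marks
      assumed to be drawn on the query image (e.g. an outlined danger zone)
    """
    parts: List[Part] = [Part("text", question + " Answer 'yes' or 'no'.")]
    for i, ex in enumerate(exemplars or [], start=1):
        parts.append(Part("text", f"Example {i}:"))
        parts.append(Part("image", ex.image))
        parts.append(Part("text", f"Answer: {ex.label}"))
    if supplementary_note:
        parts.append(Part("text", supplementary_note))
    parts.append(Part("text", "Query image:"))
    parts.append(Part("image", query_image))
    return parts


def parse_verdict(reply: str) -> Optional[bool]:
    """Map a free-text reply to True/False from its first word; None if unclear."""
    words = reply.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!:;")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


def should_alert(reply: str, violation_if_yes: bool = True) -> bool:
    """Decide whether a reply signals a violation.

    Ambiguous replies are treated as alerts so a human can review them.
    """
    verdict = parse_verdict(reply)
    if verdict is None:
        return True
    return verdict if violation_if_yes else not verdict


if __name__ == "__main__":
    # Hypothetical hazardous-zone intrusion query with two exemplars.
    prompt = build_prompt(
        "Is any person standing inside the area outlined in red?",
        "frames/cam03.jpg",
        supplementary_note="The red polygon marks a hazardous zone.",
        exemplars=[Exemplar("examples/intrusion.jpg", "yes"),
                   Exemplar("examples/clear.jpg", "no")],
    )
    print(len(prompt))  # prints 10
```

Treating an ambiguous reply as an alert is one possible conservative choice for a safety setting, since it routes unclear cases to human review. The `violation_if_yes` flag covers questions such as "Is a supervisor present?", where a "no" answer is the one that signals the violation.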

Key words: large vision language model, computer vision, construction safety, intelligent monitoring, prompt engineering
