Performance evaluation of construction site object detection under drone-captured perspective

doi:10.11996/JG.j.2095-302X.2026010068

Abstract

The organization and management of construction sites is a critical aspect of engineering management; however, traditional manual supervision is constrained by environmental limitations and low efficiency. In recent years, multiple government departments have issued policies advocating the deep integration of artificial intelligence with the real economy to promote high-quality and efficient economic development. Owing to its accuracy, efficiency, and automation, computer vision (CV) technology has gradually come into wide use in construction supervision. Meanwhile, drones, which can efficiently capture complex and varied visual data of construction scenes, show considerable potential for CV-based construction supervision tasks. However, research on drone-based construction scene detection remains limited, and the lack of overhead-perspective image datasets of construction scenes restricts further development of the field. Therefore, a DJI Mavic 3T drone was used to capture construction-site images and build UB-CSD, an open-source overhead-view image dataset of construction scenes. Several advanced object-detection algorithms were selected for comparative experiments on UB-CSD, and the reasons for their performance differences were analyzed in terms of model workflow design, computational principles, and task characteristics. The mAP values achieved were 96.1% for YOLOv8 and YOLOv10, 96.0% for YOLOv9, 95.7% for YOLO11, 95.3% for DETR, 76.3% for Faster-RCNN, and 72.1% for RetinaNet. The analysis indicated that the YOLO series algorithms were the optimal choice for drone-based object detection in construction scenes. By establishing a new open-source dedicated dataset and conducting comparative experiments, this study provides data and experimental cases to support future safety production management and object-detection algorithm research in the construction industry.

Key words: construction scene, drones, object detection, YOLO, Faster-RCNN, DETR, RetinaNet

CLC Number: TU71

SONG Zhuo, LU Dehui, HUANG Zhichao, TIAN Shiyu, YAN Ronglong, DENG Yichuan. Performance evaluation of construction site object detection under drone-captured perspective[J]. Journal of Graphics, 2026, 47(1): 68-77.

Figures/Tables 10

References 41

[1]	ZHU M. Recognition of high-risk scenarios in building construction based on image semantics[D]. Dalian: Dalian University of Technology, 2020 (in Chinese).
[2]	CUI Z Q, YANG S J, YU D H. Research progress on the application of artificial intelligence in the field of building construction[J]. Journal of Shandong Jianzhu University, 2023, 38(4): 117-125, 134 (in Chinese).
[3]	PANERU S, JEELANI I. Computer vision applications in construction: current state, opportunities & challenges[J]. Automation in Construction, 2021, 132: 103940.
[4]	WU Y Q, TONG K. Research advances on deep learning-based small object detection in UAV aerial images[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(3): 30848 (in Chinese).
[5]	YIN D. Study of intelligent construction site management based on UAV and computer vision[D]. Changsha: Hunan University, 2022 (in Chinese).
[6]	SHI Z Q. Research on unsafe behavior detection and safety state analysis of construction site based on UAV remote sensing data[D]. Yichang: China Three Gorges University, 2023 (in Chinese).
[7]	GIRSHICK R, DONAHUE J, DARRELL T, et al. Region-based convolutional networks for accurate object detection and segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(1): 142-158.
[8]	GIRSHICK R. Fast R-CNN[C]// 2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 1440-1448.
[9]	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[10]	HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[11]	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 936-944.
[12]	KHANAM R, HUSSAIN M. YOLOv11: an overview of the key architectural enhancements[EB/OL]. [2025-03-05]. https://arxiv.org/pdf/2410.17725.
[13]	JOCHER G. Ultralytics YOLO[EB/OL]. [2025-03-05]. https://github.com/ultralytics/ultralytics.
[14]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// The 14th European Conference on Computer Vision. Cham: Springer, 2016: 21-37.
[15]	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 318-327.
[16]	CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 213-229.
[17]	WU X, LI W, HONG D F, et al. Deep learning for unmanned aerial vehicle-based object detection and tracking: a survey[J]. IEEE Geoscience and Remote Sensing Magazine, 2022, 10(1): 91-124.
[18]	TANG G Y, NI J J, ZHAO Y H, et al. A survey of object detection for UAVs based on deep learning[J]. Remote Sensing, 2024, 16(1): 149.
[19]	XIANG T Z, XIA G S, ZHANG L P. Mini-unmanned aerial vehicle-based remote sensing: techniques, applications, and prospects[J]. IEEE Geoscience and Remote Sensing Magazine, 2019, 7(3): 29-63.
[20]	DING J J, ZHANG J H, ZHAN Z Q, et al. A precision efficient method for collapsed building detection in post-earthquake UAV images based on the improved NMS algorithm and faster R-CNN[J]. Remote Sensing, 2022, 14(3): 663.
[21]	CHEN F C, JAHANSHAHI M R. ARF-Crack: rotation invariant deep fully convolutional network for pixel-level crack detection[J]. Machine Vision and Applications, 2020, 31(6): 47.
[22]	ZHOU Y Z, YE Q X, QIU Q, et al. Oriented response networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 4961-4970.
[23]	HU G S, YAO P, WAN M Z, et al. Detection and classification of diseased pine trees with different levels of severity from UAV remote sensing images[J]. Ecological Informatics, 2022, 72: 101844.
[24]	BASHIR S M A, WANG Y. Small object detection in remote sensing images with residual feature aggregation-based super-resolution and object detector network[J]. Remote Sensing, 2021, 13(9): 1854.
[25]	JIANG W Q, GAO H Y, ZHENG J Q, et al. Review of researches on the application of UAV in the civilian industry[J]. Mechanical & Electrical Engineering Technology, 2025, 54(9): 119-124, 183 (in Chinese).
[26]	VARGA L A, KIEFER B, MESSMER M, et al. SeaDronesSee: a maritime benchmark for detecting humans in open water[C]// 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2022: 3686-3696.
[27]	DENG J N, SHI Z G, ZHUO C. Energy-efficient real-time UAV object detection on embedded platforms[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, 39(10): 3123-3127.
[28]	HSIEH M R, LIN Y L, HSU W H. Drone-based object counting by spatially regularized regional proposal network[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 4165-4173.
[29]	DU D W, QI Y K, YU H Y, et al. The unmanned aerial vehicle benchmark: object detection and tracking[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 375-391.
[30]	ZHU P F, WEN L Y, BIAN X, et al. Vision meets drones: a challenge[EB/OL]. [2025-03-05]. https://arxiv.org/abs/1804.07437.
[31]	XU X W, ZHANG X Y, YU B, et al. DAC-SDC low power object detection challenge for UAV applications[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(2): 392-403.
[32]	BOZCAN I, KAYACAN E. AU-AIR: a multi-modal unmanned aerial vehicle dataset for low altitude traffic surveillance[C]// 2020 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2020: 8504-8510.
[33]	ZHANG W, LIU C S, CHANG F L, et al. Multi-scale and occlusion aware network for vehicle detection and segmentation on UAV aerial images[J]. Remote Sensing, 2020, 12(11): 1760.
[34]	ZHANG H J, SUN M S, LI Q, et al. An empirical study of multi-scale object detection in high resolution UAV images[J]. Neurocomputing, 2021, 421: 173-182.
[35]	SUN Y M, CAO B, ZHU P F, et al. Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(10): 6700-6713.
[36]	AKSHATHA K R, KARUNAKAR A K, SHENOY B S, et al. Manipal-UAV person detection dataset: a step towards benchmarking dataset and algorithms for small object detection[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2023, 195: 77-89.
[37]	AHMED I, AHMAD M, ADNAN A, et al. Person detector for different overhead views using machine learning[J]. International Journal of Machine Learning and Cybernetics, 2019, 10(10): 2657-2668.
[38]	CAO Z, KOOISTRA L, WANG W S, et al. Real-time object detection based on UAV remote sensing: a systematic literature review[J]. Drones, 2023, 7(10): 620.
[39]	SUN Z Q, CAO S C, YANG Y M, et al. Rethinking transformer-based set prediction for object detection[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 3591-3600.
[40]	SZELISKI R. Computer vision: algorithms and applications[M]. 2nd ed. New York: Springer, 2022: 30-35.
[41]	DEAN J, CORRADO G S, MONGA R, et al. Large scale distributed deep networks[C]// The 26th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2012: 1223-1231.

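Dataset	Modality	Images/k	Image size	Objects/k	Classes	Task scenario	Open source
CARPK [28]	Visible light	1.45	1280×720	89.78	1	Vehicle counting	Yes
UAVDT [29]	Visible light	80.00	1080×540	840.00	3	Vehicle detection and tracking	Yes
VisDrone [30]	Visible light	10.21	2000×1500	540.00	10	Multi-class object detection	Yes
DAC-SDC [31]	Visible light	150.00	640×360	-	95	Multi-class object detection	Yes
AU-AIR [32]	Multimodal	32.82	1920×1080	132.00	8	Traffic surveillance	Yes
UVSD [33]	Visible light	5.87	960×540 to 5280×2970	58.60	1	Vehicle detection	Yes
MOHR [34]	Visible light	10.63	5472×3078 / 7360×4192 / 8688×5792	90.01	5	Multi-class object detection	No
DroneVehicle [35]	Visible light and infrared	56.88	840×712	819.00	5	Vehicle detection	Yes
SeaDronesSee [26]	Multispectral	54.00	3840×2160 to 5456×3632	400.00	6	Maritime person detection	Yes
ManipalUAV [36]	Visible light	13.46	1280×720	153.11	1	Pedestrian detection	Yes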

Model	Pretrained weights	Training epochs	Batch size	Initial learning rate	Momentum
YOLO series	No	100	16	0.0100	0.9
Faster-RCNN	No	10,000	256	0.0010	0.9
RetinaNet	No	Frozen stage: 50; unfrozen stage: 50	Frozen stage: 16; unfrozen stage: 8	0.0001	0.9
DETR	Yes	100	4	0.0001	0.9
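
As a minimal sketch, the YOLO-series row of the training table could be captured as a small configuration object. The field names `pretrained`, `epochs`, `batch`, `lr0`, and `momentum` are chosen to match the training arguments of the Ultralytics package [13]. The dataset file name `ub_csd.yaml` and the model file `yolov8n.yaml` are hypothetical placeholders, since the source does not state which model size or dataset configuration file was used.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters taken from one row of the training table."""
    pretrained: bool   # whether pretrained weights are used
    epochs: int        # number of training epochs
    batch: int         # batch size
    lr0: float         # initial learning rate
    momentum: float    # optimizer momentum


# Values from the "YOLO series" row of the table.
YOLO_SERIES = TrainConfig(pretrained=False, epochs=100, batch=16,
                          lr0=0.01, momentum=0.9)


def to_train_kwargs(cfg: TrainConfig, data_yaml: str) -> dict:
    """Build keyword arguments for a training call.

    `data_yaml` is a hypothetical path to a dataset description file.
    """
    kwargs = asdict(cfg)
    kwargs["data"] = data_yaml
    return kwargs


if __name__ == "__main__":
    kwargs = to_train_kwargs(YOLO_SERIES, "ub_csd.yaml")  # hypothetical file
    print(kwargs)
    # With the ultralytics package installed, these arguments could be
    # passed on as follows (model file name is an assumption):
    #   from ultralytics import YOLO
    #   YOLO("yolov8n.yaml").train(**kwargs)
```

Keeping the hyperparameters in one immutable object makes it easier to apply the same settings to each YOLO version being compared.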

Algorithm	Person	Car	Concrete mixer truck	Truck	Concrete pump truck	Rotary drilling rig	Excavator	Crane truck	Trencher	mAP
YOLOv8	0.888	0.921	0.981	0.973	0.987	0.994	0.993	0.985	0.931	0.961
YOLOv9	0.887	0.913	0.980	0.968	0.988	0.994	0.992	0.984	0.934	0.960
YOLOv10	0.888	0.921	0.981	0.973	0.987	0.994	0.993	0.985	0.931	0.961
YOLO11	0.885	0.911	0.980	0.970	0.984	0.994	0.993	0.982	0.911	0.957
Faster-RCNN	0.484	0.743	0.888	0.754	0.784	0.871	0.812	0.799	0.733	0.763
RetinaNet	0.360	0.654	0.885	0.666	0.838	0.757	0.877	0.705	0.744	0.721
DETR	0.802	0.951	0.973	0.940	0.979	0.990	0.988	0.979	0.979	0.953
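
The mAP column of the results table can be cross-checked against the nine per-class values in each row. The sketch below assumes that mAP here is the unweighted mean of the per-class average precision values; the source page does not state the averaging rule or the IoU threshold, so this is a consistency check under that assumption and does not reproduce the authors' evaluation code.

```python
# Per-class AP values transcribed from the results table, in column order.
PER_CLASS_AP = {
    "YOLOv8":      [0.888, 0.921, 0.981, 0.973, 0.987, 0.994, 0.993, 0.985, 0.931],
    "YOLOv9":      [0.887, 0.913, 0.980, 0.968, 0.988, 0.994, 0.992, 0.984, 0.934],
    "YOLOv10":     [0.888, 0.921, 0.981, 0.973, 0.987, 0.994, 0.993, 0.985, 0.931],
    "YOLO11":      [0.885, 0.911, 0.980, 0.970, 0.984, 0.994, 0.993, 0.982, 0.911],
    "Faster-RCNN": [0.484, 0.743, 0.888, 0.754, 0.784, 0.871, 0.812, 0.799, 0.733],
    "RetinaNet":   [0.360, 0.654, 0.885, 0.666, 0.838, 0.757, 0.877, 0.705, 0.744],
    "DETR":        [0.802, 0.951, 0.973, 0.940, 0.979, 0.990, 0.988, 0.979, 0.979],
}

# mAP values as reported in the last column of the table.
REPORTED_MAP = {
    "YOLOv8": 0.961, "YOLOv9": 0.960, "YOLOv10": 0.961, "YOLO11": 0.957,
    "Faster-RCNN": 0.763, "RetinaNet": 0.721, "DETR": 0.953,
}


def mean_ap(aps):
    """Unweighted mean of per-class AP values (assumed definition of mAP)."""
    return sum(aps) / len(aps)


if __name__ == "__main__":
    for name, aps in PER_CLASS_AP.items():
        print(f"{name}: computed {mean_ap(aps):.3f}, "
              f"reported {REPORTED_MAP[name]:.3f}")
```

Under this assumption, the computed means agree with the reported mAP values to three decimal places for all seven algorithms.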