Performance evaluation of construction site object detection under drone-captured perspective

doi:10.11996/JG.j.2095-302X.2026010068

Journal of Graphics, 2026, Vol. 47, Issue (1): 68-77

• Image Processing and Computer Vision •


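SONG Zhuo¹, LU Dehui¹, HUANG Zhichao¹, TIAN Shiyu¹, YAN Ronglong², DENG Yichuan²,³

¹ Guangzhou No. 1 Construction Group Co., Ltd., Guangzhou, Guangdong 510060, China
² School of Civil Engineering and Transportation, South China University of Technology, Guangzhou, Guangdong 510641, China
³ State Key Laboratory of Subtropical Building and Urban Science, Guangzhou, Guangdong 510641, China

Received: 2025-03-19; Accepted: 2025-07-23; Published: 2026-02-28; Online: 2026-03-16
Corresponding author: DENG Yichuan, E-mail: ctycdeng@scut.edu.cn
Supported by:
National Natural Science Foundation of China (52308314); Youth Enhancement Project of the Natural Science Foundation of Guangdong Province (2023A1515030169); Technology Innovation Program of the Department of Housing and Urban-Rural Development of Guangdong Province (20250305J0004); Science and Technology Program of Guangzhou Municipal Construction Group Co., Ltd. ([2023]-KJ008)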


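Abstract:

The organization and management of construction sites is a key part of engineering management, yet traditional manual supervision faces many constraints and is inefficient. In recent years, several national ministries and commissions have issued policies calling for the deep integration of artificial intelligence with the real economy, so that AI can drive high-quality and efficient economic development. Because computer vision (CV) is accurate, efficient, and automatable, it is increasingly applied to construction supervision. Drones in particular can efficiently capture visual data of complex and changing construction scenes, which gives them strong potential for CV-based construction monitoring. However, research on drone-based object detection in construction scenes remains limited, and the scarcity of construction-scene image datasets captured from a drone's perspective hinders further progress. To address this gap, a DJI Mavic 3T drone was used to capture construction-site images and build UB-CSD, an open-source dataset of overhead construction-scene images. Several state-of-the-art object detection algorithms were compared on UB-CSD, and the causes of their performance differences were analyzed in terms of model pipeline design, computational principles, and the characteristics of the task scenario. The mAP values were: YOLOv8 and YOLOv10 (96.1%), YOLOv9 (96.0%), YOLO11 (95.7%), DETR (95.3%), Faster-RCNN (76.3%), and RetinaNet (72.1%). The analysis indicates that the YOLO series is the best choice for drone-based object detection in construction scenes. The new open-source dataset, together with the data and conclusions from these comparative experiments, provides usable data and experimental cases for safety production management in the construction industry and for future detection research.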

Key words: construction scene, drones, object detection, YOLO, Faster-RCNN, DETR, RetinaNet
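The abstract ranks the detectors by mAP, the mean over classes of per-class average precision (AP). The evaluation protocol behind the reported values (IoU threshold, interpolation scheme) is not stated here, so the Python sketch below assumes a common setup: a detection is a true positive when its IoU with a not-yet-matched ground-truth box is at least 0.5, and AP is the all-point interpolated area under the precision-recall curve. The helper names are hypothetical, and for brevity the sketch handles one class in one image, whereas a real evaluation pools detections over the whole test set.

```python
from typing import List, Sequence, Tuple

# Axis-aligned box as (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(dets: Sequence[Tuple[float, Box]], gts: Sequence[Box],
                     iou_thr: float = 0.5) -> List[bool]:
    """Label detections, in descending score order, as TP (True) or FP (False).

    Each detection claims the still-unmatched ground-truth box with the highest
    IoU, provided that IoU reaches iou_thr; each ground truth is claimed once.
    """
    claimed = [False] * len(gts)
    flags: List[bool] = []
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not claimed[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            claimed[best_j] = True
        flags.append(best_j >= 0)
    return flags


def average_precision(tp_flags: Sequence[bool], num_gt: int) -> float:
    """All-point interpolated AP from score-ordered TP/FP flags."""
    if num_gt == 0:
        return 0.0
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / k)
        recalls.append(tp / num_gt)
    # Interpolate: replace each precision by the maximum precision at this
    # or any later rank, i.e. at equal or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p  # area of one step of the PR curve
        prev_recall = r
    return ap
```

For example, with two ground-truth boxes and three detections ranked TP, FP, TP, the sketch gives AP = 0.5 × 1 + 0.5 × 2/3 = 5/6; per-class AP values of this kind are what mAP averages.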


CLC number: TU71

SONG Zhuo, LU Dehui, HUANG Zhichao, TIAN Shiyu, YAN Ronglong, DENG Yichuan. Performance evaluation of construction site object detection under drone-captured perspective[J]. Journal of Graphics, 2026, 47(1): 68-77.

Figures/Tables: 10

References (41)

[1]	ZHU M. Recognition of high-risk scenarios in building construction based on image semantics[D]. Dalian: Dalian University of Technology, 2020 (in Chinese).
[2]	CUI Z Q, YANG S J, YU D H. Research progress on the application of artificial intelligence in the field of building construction[J]. Journal of Shandong Jianzhu University, 2023, 38(4): 117-125, 134 (in Chinese).
[3]	PANERU S, JEELANI I. Computer vision applications in construction: current state, opportunities & challenges[J]. Automation in Construction, 2021, 132: 103940.
[4]	WU Y Q, TONG K. Research advances on deep learning-based small object detection in UAV aerial images[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(3): 30848 (in Chinese).
[5]	YIN D. Study of intelligent construction site management based on UAV and computer vision[D]. Changsha: Hunan University, 2022 (in Chinese).
[6]	SHI Z Q. Research on unsafe behavior detection and safety state analysis of construction site based on UAV remote sensing data[D]. Yichang: China Three Gorges University, 2023 (in Chinese).
[7]	GIRSHICK R, DONAHUE J, DARRELL T, et al. Region-based convolutional networks for accurate object detection and segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(1): 142-158.
[8]	GIRSHICK R. Fast R-CNN[C]// 2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 1440-1448.
[9]	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[10]	HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[11]	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 936-944.
[12]	KHANAM R, HUSSAIN M. YOLOv11: an overview of the key architectural enhancements[EB/OL]. [2025-03-05]. https://arxiv.org/pdf/2410.17725.
[13]	JOCHER G. Ultralytics YOLO[EB/OL]. [2025-03-05]. https://github.com/ultralytics/ultralytics.
[14]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// The 14th European Conference on Computer Vision. Cham: Springer, 2016: 21-37.
[15]	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 318-327.
[16]	CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 213-229.
[17]	WU X, LI W, HONG D F, et al. Deep learning for unmanned aerial vehicle-based object detection and tracking: a survey[J]. IEEE Geoscience and Remote Sensing Magazine, 2022, 10(1): 91-124.
[18]	TANG G Y, NI J J, ZHAO Y H, et al. A survey of object detection for UAVs based on deep learning[J]. Remote Sensing, 2024, 16(1): 149.
[19]	XIANG T Z, XIA G S, ZHANG L P. Mini-unmanned aerial vehicle-based remote sensing: techniques, applications, and prospects[J]. IEEE Geoscience and Remote Sensing Magazine, 2019, 7(3): 29-63.
[20]	DING J J, ZHANG J H, ZHAN Z Q, et al. A precision efficient method for collapsed building detection in post-earthquake UAV images based on the improved NMS algorithm and faster R-CNN[J]. Remote Sensing, 2022, 14(3): 663.
[21]	CHEN F C, JAHANSHAHI M R. ARF-Crack: rotation invariant deep fully convolutional network for pixel-level crack detection[J]. Machine Vision and Applications, 2020, 31(6): 47.
[22]	ZHOU Y Z, YE Q X, QIU Q, et al. Oriented response networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 4961-4970.
[23]	HU G S, YAO P, WAN M Z, et al. Detection and classification of diseased pine trees with different levels of severity from UAV remote sensing images[J]. Ecological Informatics, 2022, 72: 101844.
[24]	BASHIR S M A, WANG Y. Small object detection in remote sensing images with residual feature aggregation-based super-resolution and object detector network[J]. Remote Sensing, 2021, 13(9): 1854.
[25]	JIANG W Q, GAO H Y, ZHENG J Q, et al. Review of researches on the application of UAV in the civilian industry[J]. Mechanical & Electrical Engineering Technology, 2025, 54(9): 119-124, 183 (in Chinese).
[26]	VARGA L A, KIEFER B, MESSMER M, et al. SeaDronesSee: a maritime benchmark for detecting humans in open water[C]// 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2022: 3686-3696.
[27]	DENG J N, SHI Z G, ZHUO C. Energy-efficient real-time UAV object detection on embedded platforms[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, 39(10): 3123-3127.
[28]	HSIEH M R, LIN Y L, HSU W H. Drone-based object counting by spatially regularized regional proposal network[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 4165-4173.
[29]	DU D W, QI Y K, YU H Y, et al. The unmanned aerial vehicle benchmark: object detection and tracking[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 375-391.
[30]	ZHU P F, WEN L Y, BIAN X, et al. Vision meets drones: a challenge[EB/OL]. [2025-03-05]. https://arxiv.org/abs/1804.07437.
[31]	XU X W, ZHANG X Y, YU B, et al. DAC-SDC low power object detection challenge for UAV applications[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(2): 392-403.
[32]	BOZCAN I, KAYACAN E. AU-AIR: a multi-modal unmanned aerial vehicle dataset for low altitude traffic surveillance[C]// 2020 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2020: 8504-8510.
[33]	ZHANG W, LIU C S, CHANG F L, et al. Multi-scale and occlusion aware network for vehicle detection and segmentation on UAV aerial images[J]. Remote Sensing, 2020, 12(11): 1760.
[34]	ZHANG H J, SUN M S, LI Q, et al. An empirical study of multi-scale object detection in high resolution UAV images[J]. Neurocomputing, 2021, 421: 173-182.
[35]	SUN Y M, CAO B, ZHU P F, et al. Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(10): 6700-6713.
[36]	AKSHATHA K R, KARUNAKAR A K, SHENOY B S, et al. Manipal-UAV person detection dataset: a step towards benchmarking dataset and algorithms for small object detection[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2023, 195: 77-89.
[37]	AHMED I, AHMAD M, ADNAN A, et al. Person detector for different overhead views using machine learning[J]. International Journal of Machine Learning and Cybernetics, 2019, 10(10): 2657-2668.
[38]	CAO Z, KOOISTRA L, WANG W S, et al. Real-time object detection based on UAV remote sensing: a systematic literature review[J]. Drones, 2023, 7(10): 620.
[39]	SUN Z Q, CAO S C, YANG Y M, et al. Rethinking transformer-based set prediction for object detection[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 3591-3600.
[40]	SZELISKI R. Computer vision: algorithms and applications[M]. 2nd ed. New York: Springer, 2022: 30-35.
[41]	DEAN J, CORRADO G S, MONGA R, et al. Large scale distributed deep networks[C]// The 26th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2012: 1223-1231.

Dataset	Modality	Images (k)	Image size (px)	Objects (k)	Classes	Task scenario	Open source
CARPK[28]	Visible	1.45	1280×720	89.78	1	Vehicle counting	Yes
UAVDT[29]	Visible	80.00	1080×540	840.00	3	Vehicle detection and tracking	Yes
VisDrone[30]	Visible	10.21	2000×1500	540.00	10	Multi-class object detection	Yes
DAC-SDC[31]	Visible	150.00	640×360	─	95	Multi-class object detection	Yes
AU-Air[32]	Multimodal	32.82	1920×1080	132.00	8	Traffic monitoring	Yes
UVSD[33]	Visible	5.87	960×540 to 5280×2970	58.60	1	Vehicle detection	Yes
MOHR[34]	Visible	10.63	5472×3078 / 7360×4192 / 8688×5792	90.01	5	Multi-class object detection	No
DroneVehicle[35]	Visible + infrared	56.88	840×712	819.00	5	Vehicle detection	Yes
SeaDronesSee[26]	Multispectral	54.00	3840×2160 to 5456×3632	400.00	6	Maritime person detection	Yes
ManipalUAV[36]	Visible	13.46	1280×720	153.11	1	Pedestrian detection	Yes
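Dividing each benchmark's object count by its image count gives a rough mean number of annotated objects per image, one simple way to compare how crowded these UAV datasets are. The Python sketch below copies the table's counts (both in thousands, so the units cancel) and skips DAC-SDC, whose object count is not listed; the variable and function names are illustrative only.

```python
# (images in thousands, objects in thousands), copied from the dataset table.
# DAC-SDC is omitted because its object count is not listed.
DATASETS = {
    "CARPK": (1.45, 89.78),
    "UAVDT": (80.00, 840.00),
    "VisDrone": (10.21, 540.00),
    "AU-Air": (32.82, 132.00),
    "UVSD": (5.87, 58.60),
    "MOHR": (10.63, 90.01),
    "DroneVehicle": (56.88, 819.00),
    "SeaDronesSee": (54.00, 400.00),
    "ManipalUAV": (13.46, 153.11),
}


def objects_per_image(images_k: float, objects_k: float) -> float:
    """Mean annotated objects per image; the shared 'k' unit cancels out."""
    if images_k <= 0:
        raise ValueError("image count must be positive")
    return objects_k / images_k


if __name__ == "__main__":
    ranked = sorted(DATASETS.items(),
                    key=lambda item: objects_per_image(*item[1]), reverse=True)
    for name, (images_k, objects_k) in ranked:
        print(f"{name:<13s} {objects_per_image(images_k, objects_k):6.1f}")
```

Because the table's counts are rounded, these are approximate averages: CARPK (about 62 objects per image) and VisDrone (about 53) come out densest and AU-Air (about 4) sparsest. The table gives no per-image distributions, so the spread within each dataset cannot be read from it.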


Model	Pretrained weights	Epochs	Batch size	Initial learning rate	Momentum
YOLO series	No	100	16	0.0100	0.9
Faster-RCNN	No	10000	256	0.0010	0.9
RetinaNet	No	Frozen stage: 50; unfrozen stage: 50	Frozen stage: 16; unfrozen stage: 8	0.0001	0.9
DETR	Yes	100	4	0.0001	0.9
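The RetinaNet row describes a two-stage schedule: 50 epochs with the backbone frozen at batch size 16, followed by 50 epochs with it unfrozen at batch size 8. The Python sketch below encodes that row as data and looks up which stage is active at a given epoch. The class and function names are hypothetical, and actually freezing layers inside a training framework is not shown. Halving the batch size after unfreezing is a common way to fit the extra memory needed to back-propagate through the backbone; reading the table that way is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    epochs: int
    batch_size: int
    backbone_frozen: bool


# RetinaNet row of the hyperparameter table (initial LR 0.0001, momentum 0.9).
RETINANET_STAGES = (
    Stage("frozen", epochs=50, batch_size=16, backbone_frozen=True),
    Stage("unfrozen", epochs=50, batch_size=8, backbone_frozen=False),
)
INITIAL_LR = 1e-4
MOMENTUM = 0.9


def total_epochs(stages: Sequence[Stage] = RETINANET_STAGES) -> int:
    """Length of the whole schedule in epochs."""
    return sum(stage.epochs for stage in stages)


def stage_for_epoch(epoch: int, stages: Sequence[Stage] = RETINANET_STAGES) -> Stage:
    """Return the stage active at a 0-based epoch index."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    start = 0
    for stage in stages:
        if epoch < start + stage.epochs:
            return stage
        start += stage.epochs
    raise ValueError(f"epoch {epoch} is past the {start}-epoch schedule")


if __name__ == "__main__":
    for epoch in (0, 49, 50, 99):
        stage = stage_for_epoch(epoch)
        state = "frozen" if stage.backbone_frozen else "trainable"
        print(epoch, stage.name, stage.batch_size, state)
```

Keeping the schedule as data rather than hard-coded branches makes it easy to compare with the single-stage rows (YOLO series, DETR), which would each be one `Stage` with `backbone_frozen=False`.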


Algorithm	Person	Car	Concrete mixer truck	Truck	Concrete pump truck	Rotary drilling rig	Excavator	Crane truck	Trencher	mAP
YOLOv8	0.888	0.921	0.981	0.973	0.987	0.994	0.993	0.985	0.931	0.961
YOLOv9	0.887	0.913	0.980	0.968	0.988	0.994	0.992	0.984	0.934	0.960
YOLOv10	0.888	0.921	0.981	0.973	0.987	0.994	0.993	0.985	0.931	0.961
YOLO11	0.885	0.911	0.980	0.970	0.984	0.994	0.993	0.982	0.911	0.957
Faster-RCNN	0.484	0.743	0.888	0.754	0.784	0.871	0.812	0.799	0.733	0.763
RetinaNet	0.360	0.654	0.885	0.666	0.838	0.757	0.877	0.705	0.744	0.721
DETR	0.802	0.951	0.973	0.940	0.979	0.990	0.988	0.979	0.979	0.953
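Each mAP entry in the results table can be checked as the unweighted arithmetic mean of the nine per-class AP values in its row. The Python sketch below copies the table's numbers, recomputes mAP, and reports the lowest-AP class per detector; the variable and function names are illustrative, not from the paper.

```python
from typing import Sequence

CLASS_NAMES = ("person", "car", "concrete mixer truck", "truck", "concrete pump truck",
               "rotary drilling rig", "excavator", "crane truck", "trencher")

# Per-class AP copied from the results table, in CLASS_NAMES order.
PER_CLASS_AP = {
    "YOLOv8":      (0.888, 0.921, 0.981, 0.973, 0.987, 0.994, 0.993, 0.985, 0.931),
    "YOLOv9":      (0.887, 0.913, 0.980, 0.968, 0.988, 0.994, 0.992, 0.984, 0.934),
    "YOLOv10":     (0.888, 0.921, 0.981, 0.973, 0.987, 0.994, 0.993, 0.985, 0.931),
    "YOLO11":      (0.885, 0.911, 0.980, 0.970, 0.984, 0.994, 0.993, 0.982, 0.911),
    "Faster-RCNN": (0.484, 0.743, 0.888, 0.754, 0.784, 0.871, 0.812, 0.799, 0.733),
    "RetinaNet":   (0.360, 0.654, 0.885, 0.666, 0.838, 0.757, 0.877, 0.705, 0.744),
    "DETR":        (0.802, 0.951, 0.973, 0.940, 0.979, 0.990, 0.988, 0.979, 0.979),
}

# mAP column of the same table.
REPORTED_MAP = {"YOLOv8": 0.961, "YOLOv9": 0.960, "YOLOv10": 0.961, "YOLO11": 0.957,
                "Faster-RCNN": 0.763, "RetinaNet": 0.721, "DETR": 0.953}


def mean_ap(aps: Sequence[float]) -> float:
    """Unweighted mean of per-class AP values."""
    return sum(aps) / len(aps)


def weakest_class(aps: Sequence[float]) -> str:
    """Name of the class with the lowest AP."""
    idx = min(range(len(aps)), key=lambda i: aps[i])
    return CLASS_NAMES[idx]


if __name__ == "__main__":
    ranked = sorted(PER_CLASS_AP.items(), key=lambda item: mean_ap(item[1]), reverse=True)
    for model, aps in ranked:
        print(f"{model:<12s} mAP={mean_ap(aps):.3f} "
              f"(reported {REPORTED_MAP[model]:.3f}), weakest: {weakest_class(aps)}")
```

Running it reproduces every reported mAP to three decimals, which suggests the mAP column is a plain class mean rather than an instance-weighted one. It also shows that person is the lowest-AP class for all seven detectors in the table.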
