Journal of Graphics ›› 2026, Vol. 47 ›› Issue (1): 68-77. DOI: 10.11996/JG.j.2095-302X.2026010068
SONG Zhuo1, LU Dehui1, HUANG Zhichao1, TIAN Shiyu1, YAN Ronglong2, DENG Yichuan2,3
Received: 2025-03-19
Accepted: 2025-07-23
Published: 2026-02-28
Online: 2026-03-16
Corresponding author: DENG Yichuan, E-mail: ctycdeng@scut.edu.cn
Abstract:
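Organization and management of the construction site is a key part of engineering management, but traditional manual supervision is subject to many constraints and is inefficient. In recent years, several national ministries and commissions have issued policies calling for deeper integration of artificial intelligence with the real economy, so that artificial intelligence drives high-quality and efficient economic development. Because computer vision (CV) is accurate, efficient, and automated, it is increasingly applied to construction supervision. In particular, the ability of unmanned aerial vehicles (UAVs) to efficiently capture visual data of complex and changing construction scenes shows their potential for CV-based construction supervision tasks. However, research on UAV-based object detection in construction scenes remains limited, and the scarcity of construction-scene image datasets captured from the UAV perspective restricts further progress. Therefore, a DJI Mavic 3T UAV was used to capture construction-site images and build UB-CSD, an open-source dataset of overhead construction-scene images. Several state-of-the-art object detection algorithms were compared on UB-CSD, and the causes of their performance differences were analyzed in terms of model pipeline design, computational principles, and task-scene characteristics. The mAP results were 96.1% for YOLOv8 and YOLOv10, 96.0% for YOLOv9, 95.7% for YOLO11, 95.3% for DETR, 76.3% for Faster-RCNN, and 72.1% for RetinaNet. The analysis indicates that, among the compared algorithms, the YOLO series is the best choice for UAV-based object detection in construction scenes. The new open-source dedicated dataset, together with the data and conclusions from the comparative experiments, provides useful data and experimental cases for safety management in the construction industry and for future detection research.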
SONG Zhuo, LU Dehui, HUANG Zhichao, TIAN Shiyu, YAN Ronglong, DENG Yichuan. Performance evaluation of construction site object detection under drone-captured perspective[J]. Journal of Graphics, 2026, 47(1): 68-77.
| Dataset | Modality | Images (k) | Image size (pixels) | Objects (k) | Classes | Task | Open source |
|---|---|---|---|---|---|---|---|
| CARPK[28] | Visible light | 1.45 | 1280×720 | 89.78 | 1 | Vehicle counting | Yes |
| UAVDT[29] | Visible light | 80.00 | 1080×540 | 840.00 | 3 | Vehicle detection and tracking | Yes |
| VisDrone[30] | Visible light | 10.21 | 2000×1500 | 540.00 | 10 | Multi-class object detection | Yes |
| DAC-SDC[31] | Visible light | 150.00 | 640×360 | ─ | 95 | Multi-class object detection | Yes |
| AU-AIR[32] | Multimodal | 32.82 | 1920×1080 | 132.00 | 8 | Traffic monitoring | Yes |
| UVSD[33] | Visible light | 5.87 | 960×540 to 5280×2970 | 58.60 | 1 | Vehicle detection | Yes |
| MOHR[34] | Visible light | 10.63 | 5472×3078 / 7360×4192 / 8688×5792 | 90.01 | 5 | Multi-class object detection | No |
| DroneVehicle[35] | Visible light + infrared | 56.88 | 840×712 | 819.00 | 5 | Vehicle detection | Yes |
| SeaDronesSee[26] | Multispectral | 54.00 | 3840×2160 to 5456×3632 | 400.00 | 6 | Maritime person detection | Yes |
| ManipalUAV[36] | Visible light | 13.46 | 1280×720 | 153.11 | 1 | Pedestrian detection | Yes |
Table 1 Survey of drone-captured image datasets
Fig. 2 Comparison of human posture in frontal and overhead views ((a) Human posture in the frontal view; (b) Human posture in the overhead view)
Fig. 3 Sample images from the UB-CSD dataset ((a)~(e) Daytime images; (f)~(i) Some of the nighttime images)
Fig. 4 Examples of augmented image groups ((a) Rotation; (b) Translation; (c) Scaling; (d) Brightness adjustment)
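The four augmentation types named in Fig. 4 can be illustrated with a minimal sketch. The source does not state how the augmentations were implemented or how bounding-box labels were updated, so the function names, the `(x1, y1, x2, y2)` box format, and rotation about the image centre are all assumptions made for illustration.

```python
import math


def rotate_box(box, angle_deg, width, height):
    """Rotate a box about the image centre and return the axis-aligned
    box that encloses the four rotated corners (hypothetical helper)."""
    x1, y1, x2, y2 = box
    cx, cy = width / 2.0, height / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    xs, ys = [], []
    for x, y in ((x1, y1), (x2, y1), (x2, y2), (x1, y2)):
        dx, dy = x - cx, y - cy
        xs.append(cx + dx * cos_t - dy * sin_t)
        ys.append(cy + dx * sin_t + dy * cos_t)
    return (min(xs), min(ys), max(xs), max(ys))


def translate_box(box, dx, dy):
    """Shift a box by (dx, dy) pixels, matching a translated image."""
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def scale_box(box, factor):
    """Scale box coordinates about the origin, as when the image is resized."""
    return tuple(v * factor for v in box)


def adjust_brightness(pixels, factor):
    """Scale 8-bit intensities and clamp to [0, 255]; boxes stay unchanged."""
    return [min(255, max(0, round(p * factor))) for p in pixels]
```

Geometric augmentations change the label coordinates together with the image, whereas brightness adjustment changes only pixel values. A production pipeline would also clip boxes to the image bounds and discard boxes that leave the frame.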
| Model | Pretrained weights | Training epochs | Batch size | Initial learning rate | Momentum |
|---|---|---|---|---|---|
| YOLO series | No | 100 | 16 | 0.0100 | 0.9 |
| Faster-RCNN | No | 10000 | 256 | 0.0010 | 0.9 |
| RetinaNet | No | Frozen stage: 50; unfrozen stage: 50 | Frozen stage: 16; unfrozen stage: 8 | 0.0001 | 0.9 |
| DETR | Yes | 100 | 4 | 0.0001 | 0.9 |
Table 2 Training parameters of the algorithms
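The settings in Table 2 can be written down as a small configuration structure, which makes the two-stage RetinaNet schedule explicit. This is a sketch only: the class and function names are hypothetical, and the table does not say which layers are frozen in the "frozen" stage, so no such detail is encoded here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stage:
    name: str        # "full", "frozen", or "unfrozen" (labels are illustrative)
    epochs: int
    batch_size: int


@dataclass(frozen=True)
class TrainConfig:
    pretrained: bool
    stages: Tuple[Stage, ...]
    initial_lr: float
    momentum: float = 0.9


# Values copied from Table 2.
CONFIGS = {
    "YOLO series": TrainConfig(False, (Stage("full", 100, 16),), 0.01),
    "Faster-RCNN": TrainConfig(False, (Stage("full", 10000, 256),), 0.001),
    "RetinaNet": TrainConfig(
        False, (Stage("frozen", 50, 16), Stage("unfrozen", 50, 8)), 0.0001
    ),
    "DETR": TrainConfig(True, (Stage("full", 100, 4),), 0.0001),
}


def total_epochs(cfg: TrainConfig) -> int:
    """Sum the epochs of all stages."""
    return sum(stage.epochs for stage in cfg.stages)


def stage_at(cfg: TrainConfig, epoch: int) -> Stage:
    """Return the stage that is active at a 0-based epoch index."""
    start = 0
    for stage in cfg.stages:
        if epoch < start + stage.epochs:
            return stage
        start += stage.epochs
    raise ValueError(f"epoch {epoch} is beyond the schedule")
```

For example, `stage_at(CONFIGS["RetinaNet"], 50)` returns the unfrozen stage, where the batch size drops from 16 to 8.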
| Algorithm | Person | Car | Concrete mixer truck | Truck | Concrete pump truck | Rotary drilling rig | Excavator | Crane truck | Trencher | mAP |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv8 | 0.888 | 0.921 | 0.981 | 0.973 | 0.987 | 0.994 | 0.993 | 0.985 | 0.931 | 0.961 |
| YOLOv9 | 0.887 | 0.913 | 0.980 | 0.968 | 0.988 | 0.994 | 0.992 | 0.984 | 0.934 | 0.960 |
| YOLOv10 | 0.888 | 0.921 | 0.981 | 0.973 | 0.987 | 0.994 | 0.993 | 0.985 | 0.931 | 0.961 |
| YOLO11 | 0.885 | 0.911 | 0.980 | 0.970 | 0.984 | 0.994 | 0.993 | 0.982 | 0.911 | 0.957 |
| Faster-RCNN | 0.484 | 0.743 | 0.888 | 0.754 | 0.784 | 0.871 | 0.812 | 0.799 | 0.733 | 0.763 |
| RetinaNet | 0.360 | 0.654 | 0.885 | 0.666 | 0.838 | 0.757 | 0.877 | 0.705 | 0.744 | 0.721 |
| DETR | 0.802 | 0.951 | 0.973 | 0.940 | 0.979 | 0.990 | 0.988 | 0.979 | 0.979 | 0.953 |
Table 3 Comparison of detection performance indicators across algorithms
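The mAP column of Table 3 is consistent with the unweighted mean of the nine per-class values in each row. The sketch below recomputes it from the table and also reports the lowest-scoring class per algorithm. The English class names are renderings of the table's column headers, and the helper names are illustrative.

```python
CLASSES = [
    "person", "car", "concrete mixer truck", "truck", "concrete pump truck",
    "rotary drilling rig", "excavator", "crane truck", "trencher",
]

# Per-class values copied from Table 3, in the order of CLASSES.
AP = {
    "YOLOv8": [0.888, 0.921, 0.981, 0.973, 0.987, 0.994, 0.993, 0.985, 0.931],
    "YOLOv9": [0.887, 0.913, 0.980, 0.968, 0.988, 0.994, 0.992, 0.984, 0.934],
    "YOLOv10": [0.888, 0.921, 0.981, 0.973, 0.987, 0.994, 0.993, 0.985, 0.931],
    "YOLO11": [0.885, 0.911, 0.980, 0.970, 0.984, 0.994, 0.993, 0.982, 0.911],
    "Faster-RCNN": [0.484, 0.743, 0.888, 0.754, 0.784, 0.871, 0.812, 0.799, 0.733],
    "RetinaNet": [0.360, 0.654, 0.885, 0.666, 0.838, 0.757, 0.877, 0.705, 0.744],
    "DETR": [0.802, 0.951, 0.973, 0.940, 0.979, 0.990, 0.988, 0.979, 0.979],
}


def mean_ap(per_class_ap):
    """Unweighted mean of the per-class values."""
    return sum(per_class_ap) / len(per_class_ap)


def hardest_class(per_class_ap):
    """Name of the class with the lowest value."""
    idx = min(range(len(per_class_ap)), key=per_class_ap.__getitem__)
    return CLASSES[idx]


# Algorithms ordered from highest to lowest recomputed mAP.
ranking = sorted(AP, key=lambda name: mean_ap(AP[name]), reverse=True)

for name in ranking:
    print(f"{name}: mAP={mean_ap(AP[name]):.3f}, hardest={hardest_class(AP[name])}")
```

Under this reading, the person class is the lowest-scoring class for every algorithm in the table, and the gap is largest for Faster-RCNN and RetinaNet.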
[1] ZHU M. Recognition of high-risk scenarios in building construction based on image semantics[D]. Dalian: Dalian University of Technology, 2020 (in Chinese).
[2] CUI Z Q, YANG S J, YU D H. Research progress on the application of artificial intelligence in the field of building construction[J]. Journal of Shandong Jianzhu University, 2023, 38(4): 117-125, 134 (in Chinese).
[3] PANERU S, JEELANI I. Computer vision applications in construction: current state, opportunities & challenges[J]. Automation in Construction, 2021, 132: 103940.
[4] WU Y Q, TONG K. Research advances on deep learning-based small object detection in UAV aerial images[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(3): 30848 (in Chinese).
[5] YIN D. Study of intelligent construction site management based on UAV and computer vision[D]. Changsha: Hunan University, 2022 (in Chinese).
[6] SHI Z Q. Research on unsafe behavior detection and safety state analysis of construction site based on UAV remote sensing data[D]. Yichang: China Three Gorges University, 2023 (in Chinese).
[7] GIRSHICK R, DONAHUE J, DARRELL T, et al. Region-based convolutional networks for accurate object detection and segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(1): 142-158.
[8] GIRSHICK R. Fast R-CNN[C]// 2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 1440-1448.
[9] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[10] HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[11] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 936-944.
[12] KHANAM R, HUSSAIN M. YOLOv11: an overview of the key architectural enhancements[EB/OL]. [2025-03-05]. https://arxiv.org/pdf/2410.17725.
[13] JOCHER G. Ultralytics YOLO[EB/OL]. [2025-03-05]. https://github.com/ultralytics/ultralytics.
[14] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// The 14th European Conference on Computer Vision. Cham: Springer, 2016: 21-37.
[15] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 318-327.
[16] CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 213-229.
[17] WU X, LI W, HONG D F, et al. Deep learning for unmanned aerial vehicle-based object detection and tracking: a survey[J]. IEEE Geoscience and Remote Sensing Magazine, 2022, 10(1): 91-124.
[18] TANG G Y, NI J J, ZHAO Y H, et al. A survey of object detection for UAVs based on deep learning[J]. Remote Sensing, 2024, 16(1): 149.
[19] XIANG T Z, XIA G S, ZHANG L P. Mini-unmanned aerial vehicle-based remote sensing: techniques, applications, and prospects[J]. IEEE Geoscience and Remote Sensing Magazine, 2019, 7(3): 29-63.
[20] DING J J, ZHANG J H, ZHAN Z Q, et al. A precision efficient method for collapsed building detection in post-earthquake UAV images based on the improved NMS algorithm and faster R-CNN[J]. Remote Sensing, 2022, 14(3): 663.
[21] CHEN F C, JAHANSHAHI M R. ARF-Crack: rotation invariant deep fully convolutional network for pixel-level crack detection[J]. Machine Vision and Applications, 2020, 31(6): 47.
[22] ZHOU Y Z, YE Q X, QIU Q, et al. Oriented response networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 4961-4970.
[23] HU G S, YAO P, WAN M Z, et al. Detection and classification of diseased pine trees with different levels of severity from UAV remote sensing images[J]. Ecological Informatics, 2022, 72: 101844.
[24] BASHIR S M A, WANG Y. Small object detection in remote sensing images with residual feature aggregation-based super-resolution and object detector network[J]. Remote Sensing, 2021, 13(9): 1854.
[25] JIANG W Q, GAO H Y, ZHENG J Q, et al. Review of researches on the application of UAV in the civilian industry[J]. Mechanical & Electrical Engineering Technology, 2025, 54(9): 119-124, 183 (in Chinese).
[26] VARGA L A, KIEFER B, MESSMER M, et al. SeaDronesSee: a maritime benchmark for detecting humans in open water[C]// 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2022: 3686-3696.
[27] DENG J N, SHI Z G, ZHUO C. Energy-efficient real-time UAV object detection on embedded platforms[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, 39(10): 3123-3127.
[28] HSIEH M R, LIN Y L, HSU W H. Drone-based object counting by spatially regularized regional proposal network[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 4165-4173.
[29] DU D W, QI Y K, YU H Y, et al. The unmanned aerial vehicle benchmark: object detection and tracking[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 375-391.
[30] ZHU P F, WEN L Y, BIAN X, et al. Vision meets drones: a challenge[EB/OL]. [2025-03-05]. https://arxiv.org/abs/1804.07437.
[31] XU X W, ZHANG X Y, YU B, et al. DAC-SDC low power object detection challenge for UAV applications[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(2): 392-403.
[32] BOZCAN I, KAYACAN E. AU-AIR: a multi-modal unmanned aerial vehicle dataset for low altitude traffic surveillance[C]// 2020 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2020: 8504-8510.
[33] ZHANG W, LIU C S, CHANG F L, et al. Multi-scale and occlusion aware network for vehicle detection and segmentation on UAV aerial images[J]. Remote Sensing, 2020, 12(11): 1760.
[34] ZHANG H J, SUN M S, LI Q, et al. An empirical study of multi-scale object detection in high resolution UAV images[J]. Neurocomputing, 2021, 421: 173-182.
[35] SUN Y M, CAO B, ZHU P F, et al. Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(10): 6700-6713.
[36] AKSHATHA K R, KARUNAKAR A K, SHENOY B S, et al. Manipal-UAV person detection dataset: a step towards benchmarking dataset and algorithms for small object detection[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2023, 195: 77-89.
[37] AHMED I, AHMAD M, ADNAN A, et al. Person detector for different overhead views using machine learning[J]. International Journal of Machine Learning and Cybernetics, 2019, 10(10): 2657-2668.
[38] CAO Z, KOOISTRA L, WANG W S, et al. Real-time object detection based on UAV remote sensing: a systematic literature review[J]. Drones, 2023, 7(10): 620.
[39] SUN Z Q, CAO S C, YANG Y M, et al. Rethinking transformer-based set prediction for object detection[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 3591-3600.
[40] SZELISKI R. Computer vision: algorithms and applications[M]. 2nd ed. New York: Springer, 2022: 30-35.
[41] DEAN J, CORRADO G S, MONGA R, et al. Large scale distributed deep networks[C]// The 26th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2012: 1223-1231.