Journal of Graphics, 2023, Vol. 44, Issue (4): 658-666. DOI: 10.11996/JG.j.2095-302X.2023040658
LI Li-xia1, WANG Xin2,1,3, WANG Jun3, ZHANG You-yuan4
Received: 2022-11-18
Accepted: 2023-01-18
Online: 2023-08-31
Published: 2023-08-16
Contact: WANG Xin (1976-), professor, Ph.D. His main research interests cover image processing, network information security, the Internet of Things, and data mining.
About author: LI Li-xia (1995-), master's student. Her main research interests cover image processing and object recognition. E-mail: 20032202019@mails.guet.edu.cn
LI Li-xia, WANG Xin, WANG Jun, ZHANG You-yuan. Small object detection algorithm in UAV image based on feature fusion and attention mechanism[J]. Journal of Graphics, 2023, 44(4): 658-666.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023040658
Model | Params (M) | Depth | Width | GFLOPs | mAP (%) | FPS1536 (frames/s) |
---|---|---|---|---|---|---|
YOLOv5n | 1.777 | 0.33 | 0.25 | 4.3 | 32.4 | 81 |
YOLOv5s | 7.037 | 0.33 | 0.50 | 15.8 | 42.6 | 56 |
YOLOv5m | 20.889 | 0.67 | 0.75 | 48.0 | 46.1 | 32 |
YOLOv5l | 46.157 | 1.00 | 1.00 | 107.8 | 48.2 | 19 |
Table 1 Comparison of YOLOv5 model variants on the test set
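The Depth and Width columns in Table 1 are scaling multipliers. As a minimal sketch, assuming the rounding conventions used in the public YOLOv5 code (block repeats rounded to the nearest integer with a floor of 1, channel counts rounded up to a multiple of 8), the effect of these multipliers can be illustrated as follows. The function names and the base values (9 repeats, 1024 channels) are hypothetical and chosen only for illustration; they are not taken from the article.

```python
import math

def scaled_repeats(base_repeats: int, depth_multiple: float) -> int:
    """Scale a block's repeat count; single-repeat blocks are left unchanged."""
    if base_repeats <= 1:
        return base_repeats
    return max(round(base_repeats * depth_multiple), 1)

def scaled_channels(base_channels: int, width_multiple: float, divisor: int = 8) -> int:
    """Scale a channel count and round it up to a multiple of `divisor`."""
    return math.ceil(base_channels * width_multiple / divisor) * divisor

# (depth, width) pairs copied from Table 1.
VARIANTS = {
    "YOLOv5n": (0.33, 0.25),
    "YOLOv5s": (0.33, 0.50),
    "YOLOv5m": (0.67, 0.75),
    "YOLOv5l": (1.00, 1.00),
}

for name, (depth, width) in VARIANTS.items():
    # Hypothetical base layer: 9 repeats, 1024 output channels.
    print(name, scaled_repeats(9, depth), scaled_channels(1024, width))
```

Under these assumptions, halving the width multiplier roughly quarters the parameter count of a convolution layer, which is consistent with the jump in Params (M) between adjacent rows of the table.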
Model | BT-MHSA | SP | MF | P (%) | R (%) | mAP (%) | Params (M) | FPS1536 (frames/s) |
---|---|---|---|---|---|---|---|---|
YOLOv5s | - | - | - | 53.2 | 43.5 | 42.6 | 7.037 | 56 |
M1 | √ | - | - | 54.0 | 44.2 | 43.5 | 6.719 | 58 |
M2 | - | √ | - | 52.7 | 45.3 | 43.7 | 5.388 | 60 |
M3 | - | - | √ | 53.1 | 45.0 | 43.5 | 8.174 | 47 |
M4 | - | √ | √ | 54.9 | 44.6 | 43.9 | 9.159 | 44 |
M5 | √ | √ | - | 52.1 | 45.3 | 43.6 | 7.061 | 54 |
M6 | √ | - | √ | 53.3 | 46.0 | 44.4 | 9.747 | 43 |
M7 | √ | √ | √ | 55.6 | 46.5 | 45.7 | 9.832 | 41 |
Table 2 Ablation experiment results on the VisDrone2021 test set
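To read the ablation in Table 2 as a cost-benefit trade-off, the following sketch computes each variant's change in mAP, parameter count, and FPS relative to the YOLOv5s baseline. The numbers are copied from the table; the names `ABLATION`, `deltas_vs_baseline`, and `DELTAS` are illustrative only.

```python
# Rows copied from Table 2: (model, P %, R %, mAP %, params in millions, FPS).
ABLATION = [
    ("YOLOv5s", 53.2, 43.5, 42.6, 7.037, 56),
    ("M1", 54.0, 44.2, 43.5, 6.719, 58),
    ("M2", 52.7, 45.3, 43.7, 5.388, 60),
    ("M3", 53.1, 45.0, 43.5, 8.174, 47),
    ("M4", 54.9, 44.6, 43.9, 9.159, 44),
    ("M5", 52.1, 45.3, 43.6, 7.061, 54),
    ("M6", 53.3, 46.0, 44.4, 9.747, 43),
    ("M7", 55.6, 46.5, 45.7, 9.832, 41),
]

def deltas_vs_baseline(rows, baseline="YOLOv5s"):
    """Return per-model differences from the baseline row."""
    base = next(r for r in rows if r[0] == baseline)
    out = {}
    for name, _p, _r, m_ap, params, fps in rows:
        if name == baseline:
            continue
        out[name] = {
            "d_mAP": round(m_ap - base[3], 1),        # percentage points
            "d_params": round(params - base[4], 3),   # millions of parameters
            "d_fps": fps - base[5],                   # frames per second
        }
    return out

DELTAS = deltas_vs_baseline(ABLATION)
for name, d in DELTAS.items():
    print(f"{name}: mAP {d['d_mAP']:+.1f}, params {d['d_params']:+.3f} M, FPS {d['d_fps']:+d}")
```

Read this way, the full model M7 gains 3.1 mAP points over the baseline at the cost of about 2.8 M extra parameters and 15 frames/s.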
Algorithm | Input size | Awn-tr | Bicycle | Bus | Car | Motor | Pedestrian | People | Tricycle | Truck | Van | mAP (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Faster R-CNN | 640×640 | 8.73 | 5.86 | 43.79 | 44.16 | 16.83 | 12.55 | 8.10 | 8.53 | 30.42 | 20.45 | 19.94 |
YOLOv3 | 640×640 | 7.71 | 6.80 | 39.36 | 68.87 | 21.53 | 22.54 | 12.50 | 8.41 | 26.41 | 24.31 | 23.84 |
CenterNet | 640×640 | 14.28 | 7.51 | 42.66 | 61.96 | 18.86 | 22.94 | 11.67 | 13.08 | 24.74 | 19.38 | 23.71 |
DMNet | 640×640 | 14.11 | 8.89 | 49.23 | 58.90 | 29.38 | 27.67 | 18.93 | 20.32 | 29.30 | 30.27 | 28.70 |
YOLOv4 | 640×640 | 12.39 | 8.68 | 48.86 | 69.21 | 22.71 | 26.67 | 14.48 | 12.67 | 29.94 | 27.19 | 27.28 |
SSD | 640×640 | 11.15 | 7.38 | 49.82 | 63.17 | 19.09 | 18.71 | 9.01 | 11.74 | 33.10 | 29.96 | 25.31 |
YOLOX | 640×640 | 15.43 | 9.03 | 51.80 | 72.16 | 29.33 | 25.44 | 17.07 | 16.47 | 39.21 | 35.16 | 31.11 |
Ours | 640×640 | 18.20 | 11.90 | 57.60 | 74.80 | 28.50 | 32.50 | 18.80 | 17.60 | 39.00 | 35.60 | 33.45 |
Table 3 Comparative analysis of different algorithms on the VisDrone test set
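The mAP column in Table 3 appears to be the unweighted mean of the ten per-category values. The sketch below checks that reading for three rows, assuming the category columns are per-class AP in percent; the names `PER_CLASS_AP`, `REPORTED_MAP`, and `mean_ap` are illustrative only.

```python
# Per-category values copied from Table 3, in column order:
# Awn-tr, Bicycle, Bus, Car, Motor, Pedestrian, People, Tricycle, Truck, Van.
PER_CLASS_AP = {
    "Faster R-CNN": [8.73, 5.86, 43.79, 44.16, 16.83, 12.55, 8.10, 8.53, 30.42, 20.45],
    "YOLOX": [15.43, 9.03, 51.80, 72.16, 29.33, 25.44, 17.07, 16.47, 39.21, 35.16],
    "Ours": [18.20, 11.90, 57.60, 74.80, 28.50, 32.50, 18.80, 17.60, 39.00, 35.60],
}

# mAP values as reported in the last column of Table 3.
REPORTED_MAP = {"Faster R-CNN": 19.94, "YOLOX": 31.11, "Ours": 33.45}

def mean_ap(per_class):
    """Unweighted mean over categories."""
    return sum(per_class) / len(per_class)

for name, aps in PER_CLASS_AP.items():
    computed = mean_ap(aps)
    print(f"{name}: computed {computed:.2f}, reported {REPORTED_MAP[name]:.2f}")
```

Because the mean is unweighted, rare categories such as Awn-tr and Bicycle count as much as Car, so gains on those small or infrequent classes move the overall mAP noticeably.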
[1] | JIANG B, QU R K, LI Y D, et al. Object detection in UAV imagery based on deep learning: review[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(4): 524519.1-524519.15 (in Chinese). |
[2] | ZHOU L W, PAN T X, YANG Z X, et al. FocusNet: coarse-to-fine small object detection network[J]. Journal of Graphics, 2020, 41(1): 93-99 (in Chinese). |
[3] | REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2022-05-26]. https://arxiv.org/abs/1804.02767. |
[4] | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// The 14th European Conference on Computer Vision. Cham: Springer International Publishing, 2016: 21-37. |
[5] | GIRSHICK R. Fast R-CNN[C]// 2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 1440-1448. |
[6] | REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99. |
[7] | CAO J, CHOLAKKAL H, ANWER R M, et al. D2Det: towards high quality object detection and instance segmentation[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 11485-11494. |
[8] | ZHAN W, SUN C F, WANG M C, et al. An improved Yolov5 real-time detection method for small objects captured by UAV[J]. Soft Computing, 2022, 26(1): 361-373. |
[9] | LIM J S, ASTRID M, YOON H J, et al. Small object detection using context and attention[C]// 2021 International Conference on Artificial Intelligence in Information and Communication. New York: IEEE Press, 2021: 181-186. |
[10] | SONG Z Y, ZHANG Y, LIU Y, et al. MSFYOLO: feature fusion-based detection for small objects[J]. IEEE Latin America Transactions, 2022, 20(5): 823-830. |
[11] | LIU Y J, YANG F B, HU P. Small-object detection in UAV-captured images via multi-branch parallel feature pyramid networks[J]. IEEE Access, 2020, 8: 145740-145750. |
[12] | HU J, GU J J, WANG Q H. Multimodal small target detection based on remote sensing image[J]. Journal of Graphics, 2022, 43(2): 197-204 (in Chinese). |
[13] | LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2117-2125. |
[14] | LI H C, XIONG P F, AN J, et al. Pyramid attention network for semantic segmentation[EB/OL]. [2022-05-26]. https://arxiv.org/abs/1805.10180. |
[15] | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010. |
[16] | PAN X R, GE C J, LU R, et al. On the integration of self-attention and convolution[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 815-825. |
[17] | SRINIVAS A, LIN T Y, PARMAR N, et al. Bottleneck transformers for visual recognition[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 16514-16524. |
[18] | CAO Y R, HE Z J, WANG L J, et al. VisDrone-DET2021: the vision meets drone object detection challenge results[C]// 2021 IEEE/CVF International Conference on Computer Vision Workshops. New York: IEEE Press, 2021: 2847-2854. |
[19] | LI C L, YANG T, ZHU S J, et al. Density map guided object detection in aerial images[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2020: 737-746. |
[20] | BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2022-05-26]. https://arxiv.org/abs/2004.10934. |