基于特征融合与注意力机制的无人机图像小目标检测算法

doi:10.11996/JG.j.2095-302X.2023040658

Journal of Graphics, 2023, Vol. 44, Issue (4): 658-666. DOI: 10.11996/JG.j.2095-302X.2023040658

• Image Processing and Computer Vision •

Small object detection algorithm in UAV image based on feature fusion and attention mechanism

LI Li-xia¹, WANG Xin²,¹,³, WANG Jun³, ZHANG You-yuan⁴

1. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin Guangxi 541010, China
2. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu Sichuan 610000, China
3. School of Marine Engineering, Guilin University of Electronic Technology, Beihai Guangxi 536000, China
4. School of Electronics and Information Engineering, Lanzhou Jiaotong University, Lanzhou Gansu 730070, China

Received: 2022-11-18 Accepted: 2023-01-18 Online: 2023-08-31 Published: 2023-08-16
Contact: WANG Xin (1976-), professor, Ph.D. His main research interests include image processing, network information security, the Internet of Things, and data mining. E-mail: 304379506@qq.com
About the author:
LI Li-xia (1995-), master's student. Her main research interests include image processing and object recognition. E-mail: 20032202019@mails.guet.edu.cn
Supported by:
Guangxi Science and Technology Major Project (AA19254016); Guangxi Graduate Student Innovation Project (YCSW2021174); Beihai City Science and Technology Planning Project (202082033); Beihai City Science and Technology Planning Project (202082023)

Abstract

Small objects in UAV aerial images occupy few pixels and carry little feature information, so existing detection algorithms perform poorly on them. To address this, a multi-head attention mechanism is incorporated into the YOLOv5 backbone network to integrate global feature information. As the network deepens, the model attends increasingly to high-level semantic information and neglects the low-level detail and texture features that are critical for small object detection. A shallow feature enhancement module is therefore proposed to learn low-level feature information and strengthen the features of small objects. In addition, to improve feature fusion, a multi-level feature fusion module is designed that aggregates feature information from different levels and lets the network dynamically adjust the weight of each output detection layer. Experiments on the public VisDrone2021 dataset show that the algorithm reaches a mean average precision (mAP) of 45.7%, which is 3.1 percentage points higher than the original YOLOv5, and a detection speed of 41 frames per second on high-resolution images, which meets real-time requirements. Compared with other mainstream algorithms, its detection accuracy is clearly higher.

Key words: feature fusion, attention mechanism, UAV aerial imagery, small object detection, YOLOv5
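
The abstract states that the multi-level feature fusion module lets the network adjust the weights of its output detection layers dynamically, but this page does not give the module's equations. As a minimal sketch of one common way to realise such weighting (an assumption, not the authors' implementation), the Python below fuses same-sized feature vectors from several levels using softmax-normalised scalar weights. The names `softmax` and `fuse_levels` and the example values are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_levels(features, logits):
    """Weighted sum of same-length feature vectors, one per level.

    In a trained network the logits would be learnable parameters;
    here they are fixed numbers chosen only for illustration.
    """
    if len(features) != len(logits):
        raise ValueError("need one logit per feature level")
    weights = softmax(logits)
    length = len(features[0])
    fused = [0.0] * length
    for weight, feature in zip(weights, features):
        if len(feature) != length:
            raise ValueError("all levels must be resized to the same length first")
        for i, value in enumerate(feature):
            fused[i] += weight * value
    return fused, weights


# Hypothetical features from three levels, already resized to a common shape.
levels = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused, weights = fuse_levels(levels, [0.0, 0.0, 0.0])
print(weights, fused)
```

With equal logits every level contributes equally, so the fused vector is the plain average of the inputs. Raising one logit shifts weight toward that level, which is the sense in which the weighting is adjustable.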

CLC number: TP391

李利霞, 王鑫, 王军, 张又元. 基于特征融合与注意力机制的无人机图像小目标检测算法[J]. 图学学报, 2023, 44(4): 658-666.

LI Li-xia, WANG Xin, WANG Jun, ZHANG You-yuan. Small object detection algorithm in UAV image based on feature fusion and attention mechanism[J]. Journal of Graphics, 2023, 44(4): 658-666.

Figures/Tables: 11

References: 20

[1]	江波, 屈若锟, 李彦冬. 基于深度学习的无人机航拍目标检测研究综述[J]. 航空学报, 2021, 42(4): 524519. 1-524519. 15.
	JIANG B, QU R K, LI Y D, et al. Object detection in UAV imagery based on deep learning: review[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(4): 524519. 1-524519. 15 (in Chinese).
[2]	周立旺, 潘天翔, 杨泽曦, 等. 多阶段优化的小目标聚焦检测[J]. 图学学报, 2020, 41(1): 93-99.
	ZHOU L W, PAN T X, YANG Z X, et al. FocusNet: coarse-to-fine small object detection network[J]. Journal of Graphics, 2020, 41(1): 93-99 (in Chinese).
[3]	REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2022-05-26]. https://arxiv.org/abs/1804.02767.
[4]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// The 14th European Conference on Computer Vision. Cham: Springer International Publishing, 2016: 21-37.
[5]	GIRSHICK R. Fast R-CNN[C]// 2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 1440-1448.
[6]	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99.
[7]	CAO J, CHOLAKKAL H, ANWER R M, et al. D2Det: towards high quality object detection and instance segmentation[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 11485-11494.
[8]	ZHAN W, SUN C F, WANG M C, et al. An improved Yolov5 real-time detection method for small objects captured by UAV[J]. Soft Computing, 2022, 26(1): 361-373.
[9]	LIM J S, ASTRID M, YOON H J, et al. Small object detection using context and attention[C]// 2021 International Conference on Artificial Intelligence in Information and Communication. New York: IEEE Press, 2021: 181-186.
[10]	SONG Z Y, ZHANG Y, LIU Y, et al. MSFYOLO: feature fusion-based detection for small objects[J]. IEEE Latin America Transactions, 2022, 20(5): 823-830.
[11]	LIU Y J, YANG F B, HU P. Small-object detection in UAV-captured images via multi-branch parallel feature pyramid networks[J]. IEEE Access, 2020, 8: 145740-145750.
[12]	胡俊, 顾晶晶, 王秋红. 基于遥感图像的多模态小目标检测[J]. 图学学报, 2022, 43(2): 197-204.
	HU J, GU J J, WANG Q H. Multimodal small target detection based on remote sensing image[J]. Journal of Graphics, 2022, 43(2): 197-204 (in Chinese).
[13]	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2117-2125.
[14]	LI H C, XIONG P F, AN J, et al. Pyramid attention network for semantic segmentation[EB/OL]. [2022-05-26]. https://arxiv.org/abs/1805.10180.
[15]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[16]	PAN X R, GE C J, LU R, et al. On the integration of self-attention and convolution[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 815-825.
[17]	SRINIVAS A, LIN T Y, PARMAR N, et al. Bottleneck transformers for visual recognition[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 16514-16524.
[18]	CAO Y R, HE Z J, WANG L J, et al. VisDrone-DET2021: the vision meets drone object detection challenge results[C]// 2021 IEEE/CVF International Conference on Computer Vision Workshops. New York: IEEE Press, 2021: 2847-2854.
[19]	LI C L, YANG T, ZHU S J, et al. Density map guided object detection in aerial images[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2020: 737-746.
[20]	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2022-05-26]. https://arxiv.org/abs/2004.10934.

Model	Params (M)	Depth	Width	GFLOPs	mAP (%)	FPS¹⁵³⁶ (frames/s)
YOLOv5n	1.777	0.33	0.25	4.3	32.4	81
YOLOv5s	7.037	0.33	0.50	15.8	42.6	56
YOLOv5m	20.889	0.67	0.75	48.0	46.1	32
YOLOv5l	46.157	1.00	1.00	107.8	48.2	19

Model	BT-MHSA	SP	MF	P (%)	R (%)	mAP (%)	Params (M)	FPS¹⁵³⁶ (frames/s)
YOLOv5s	-	-	-	53.2	43.5	42.6	7.037	56
M1	√	-	-	54.0	44.2	43.5	6.719	58
M2	-	√	-	52.7	45.3	43.7	5.388	60
M3	-	-	√	53.1	45.0	43.5	8.174	47
M4	-	√	√	54.9	44.6	43.9	9.159	44
M5	√	√	-	52.1	45.3	43.6	7.061	54
M6	√	-	√	53.3	46.0	44.4	9.747	43
M7	√	√	√	55.6	46.5	45.7	9.832	41
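
The ablation rows above can be compared against the YOLOv5s baseline with a few lines of arithmetic. The sketch below copies the mAP, parameter count, and FPS columns from the table and reports each variant's change from the baseline. The dictionary layout and the helper name `delta_vs_baseline` are illustrative choices, not part of the paper.

```python
# (mAP %, Params in millions, FPS) copied from the ablation table above.
ABLATION = {
    "YOLOv5s": (42.6, 7.037, 56),
    "M1": (43.5, 6.719, 58),
    "M2": (43.7, 5.388, 60),
    "M3": (43.5, 8.174, 47),
    "M4": (43.9, 9.159, 44),
    "M5": (43.6, 7.061, 54),
    "M6": (44.4, 9.747, 43),
    "M7": (45.7, 9.832, 41),
}


def delta_vs_baseline(name, baseline="YOLOv5s"):
    """Return (mAP change in points, parameter change in M, FPS change)."""
    m_ap, params, fps = ABLATION[name]
    base_m_ap, base_params, base_fps = ABLATION[baseline]
    return (
        round(m_ap - base_m_ap, 1),
        round(params - base_params, 3),
        fps - base_fps,
    )


for name in ABLATION:
    if name != "YOLOv5s":
        gain, extra_params, fps_change = delta_vs_baseline(name)
        print(
            f"{name}: mAP {gain:+.1f} points, "
            f"params {extra_params:+.3f} M, FPS {fps_change:+d}"
        )
```

For M7, which enables all three modules, this gives a gain of 3.1 points of mAP for 2.795 M extra parameters and 15 fewer frames per second, consistent with the improvement quoted in the abstract.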

Algorithm	Input size	Awn-tr	Bicycle	Bus	Car	Motor	Pedestrian	People	Tricycle	Truck	Van	mAP (%)
Faster R-CNN	640×640	8.73	5.86	43.79	44.16	16.83	12.55	8.10	8.53	30.42	20.45	19.94
YOLOv3	640×640	7.71	6.80	39.36	68.87	21.53	22.54	12.50	8.41	26.41	24.31	23.84
CenterNet	640×640	14.28	7.51	42.66	61.96	18.86	22.94	11.67	13.08	24.74	19.38	23.71
DMNet [19]	640×640	14.11	8.89	49.23	58.90	29.38	27.67	18.93	20.32	29.30	30.27	28.70
YOLOv4 [20]	640×640	12.39	8.68	48.86	69.21	22.71	26.67	14.48	12.67	29.94	27.19	27.28
SSD	640×640	11.15	7.38	49.82	63.17	19.09	18.71	9.01	11.74	33.10	29.96	25.31
YOLOX	640×640	15.43	9.03	51.80	72.16	29.33	25.44	17.07	16.47	39.21	35.16	31.11
Proposed algorithm	640×640	18.20	11.90	57.60	74.80	28.50	32.50	18.80	17.60	39.00	35.60	33.45
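
The mAP column in the comparison table appears to be the unweighted mean of the ten per-class values in each row. The sketch below checks this for the first row and the last row (the proposed algorithm), using values copied from the table. The variable and function names are illustrative.

```python
# Per-class values (%) in column order: Awn-tr, Bicycle, Bus, Car, Motor,
# Pedestrian, People, Tricycle, Truck, Van, followed by the reported mAP.
ROWS = {
    "Faster R-CNN": (
        [8.73, 5.86, 43.79, 44.16, 16.83, 12.55, 8.10, 8.53, 30.42, 20.45],
        19.94,
    ),
    "Proposed algorithm": (
        [18.20, 11.90, 57.60, 74.80, 28.50, 32.50, 18.80, 17.60, 39.00, 35.60],
        33.45,
    ),
}


def mean_ap(per_class):
    """Unweighted mean over the object categories."""
    return sum(per_class) / len(per_class)


for name, (per_class, reported) in ROWS.items():
    print(f"{name}: computed {mean_ap(per_class):.3f}, reported {reported:.2f}")
```

Both computed means agree with the reported column to two decimal places, which supports reading mAP here as a simple average across the ten categories.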
