Improving YOLOv7 remote sensing image target detection algorithm

doi:10.11996/JG.j.2095-302X.2024040650

Abstract

To address the low detection accuracy caused by large variations in object scale and complex backgrounds in remote sensing images, an improved YOLOv7 object detection algorithm is proposed. First, to reduce the interference of complex backgrounds with the detector, an attention-guided efficient layer aggregation network (ALAN) was designed so that the multi-path network focuses more on foreground objects and less on the background. Second, to reduce the impact of large scale variations on detection accuracy, an attention multi-scale feature enhancement (AMSFE) module was designed to expand the receptive field of the backbone output features, strengthening the network's feature representation for objects whose scales vary widely. Finally, a rotated bounding box loss function was introduced to obtain precise location information for objects in arbitrary orientations. Experiments on the DIOR-R dataset showed that the proposed algorithm achieved a mean average precision (mAP) of 64.51%, which is 3.43 percentage points higher than the baseline YOLOv7 (61.08%). It also outperformed the other algorithms compared and can handle object detection in remote sensing images with multi-scale objects and complex backgrounds.
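
The abstract does not say how the rotated boxes are parameterized. As a minimal sketch, assuming the common five-parameter form (center, width, height, angle), the snippet below converts such a box to its four corner points. The function name `rotated_box_corners` and the example values are illustrative and are not taken from the paper.

```python
import math

def rotated_box_corners(cx, cy, w, h, theta):
    """Corners of a w-by-h box centered at (cx, cy), rotated by theta radians.

    Assumed parameterization (not confirmed by the abstract). The rotation uses
    the standard 2D rotation matrix; in image coordinates with y pointing down,
    a positive theta appears clockwise.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets of the unrotated box relative to its center.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]

# With theta = 0 the result is an ordinary axis-aligned box.
corners = rotated_box_corners(10.0, 20.0, 4.0, 2.0, 0.0)
print(corners)
```

A horizontal box is the special case theta = 0, which is why a detector that regresses only horizontal boxes cannot fit elongated objects lying at an angle without also enclosing background.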

Key words: remote sensing, object detection, feature enhancement, attention mechanism, YOLOv7

CLC Number: TP391, TP751

LI Daxiang, JI Zhan, LIU Ying, TANG Yao. Improving YOLOv7 remote sensing image target detection algorithm[J]. Journal of Graphics, 2024, 45(4): 650-658.

Figures/Tables 12

References 33

[1]	LI D R, WANG M, SHEN X, et al. From earth observation satellite to earth observation brain[J]. Geomatics and Information Science of Wuhan University, 2017, 42(2): 143-149 (in Chinese).
[2]	YUAN Q Q, SHEN H F, LI T W, et al. Deep learning in environmental remote sensing: achievements and challenges[J]. Remote Sensing of Environment, 2020, 241: 111716.
[3]	CHENG G, ZHOU P C, YAO X W, et al. Object detection in VHR optical remote sensing images via learning rotation-invariant HOG feature[C]// 2016 4th International Workshop on Earth Observation and Remote Sensing Applications. New York: IEEE Press, 2016: 433-436.
[4]	QI S X, MA J, LIN J, et al. Unsupervised ship detection based on saliency and S-HOG descriptor from optical satellite images[J]. IEEE Geoscience and Remote Sensing Letters, 2015, 12(7): 1451-1455.
[5]	WU X, HONG D F, TIAN J J, et al. ORSIm detector: a novel object detection framework in optical remote sensing imagery using spatial-frequency channel features[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(7): 5146-5158.
[6]	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[7]	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 779-788.
[8]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[M]//Computer Vision - ECCV 2016. Cham: Springer International Publishing, 2016: 21-37.
[9]	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2999-3007.
[10]	MA J Q, SHAO W Y, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals[J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122.
[11]	YANG X, YAN J C, FENG Z M, et al. R3Det: refined single-stage detector with feature refinement for rotating object[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(4): 3163-3171.
[12]	HAN J M, DING J, XUE N, et al. ReDet: a rotation-equivariant detector for aerial object detection[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 2785-2794.
[13]	YANG X, YAN J C, MING Q, et al. Rethinking rotated object detection with Gaussian Wasserstein distance loss[EB/OL]. [2023-04-08]. http://arxiv.org/abs/2101.11952.
[14]	YANG X, YANG X J, YANG J R, et al. Learning high-precision bounding box for rotated object detection via Kullback-Leibler divergence[EB/OL]. [2023-04-08]. http://arxiv.org/abs/2106.01883.
[15]	YANG X, YAN J C. Arbitrary-oriented object detection with circular smooth label[M]//Computer Vision - ECCV 2020. Cham: Springer International Publishing, 2020: 677-694.
[16]	YANG X, HOU L P, ZHOU Y, et al. Dense label encoding for boundary discontinuity free rotation detection[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 15814-15824.
[17]	MAO A K, LIU X M, CHEN W Z, et al. Improved substation instrument target detection method for YOLOv5 algorithm[J]. Journal of Graphics, 2023, 44(3): 448-455 (in Chinese).
[18]	DONG H, CHEN X K, SUN H, et al. Weed detection in vegetable field based on improved YOLOv4 and image processing[J]. Journal of Graphics, 2022, 43(4): 559-569 (in Chinese).
[19]	LIU L K, LIU Y X, YAN J N, et al. Object detection in large-scale remote sensing images with a distributed deep learning framework[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022, 15: 8142-8154.
[20]	WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 7464-7475.
[21]	DING X H, ZHANG X Y, MA N N, et al. RepVGG: making VGG-style ConvNets great again[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 13728-13737.
[22]	LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 8759-8768.
[23]	YANG Y T, JIAO L C, LIU X, et al. Dual wavelet attention networks for image classification[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(4): 1899-1910.
[24]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 3-19.
[25]	QIN Z Q, ZHANG P Y, WU F, et al. FcaNet: frequency channel attention networks[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 763-772.
[26]	SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020, 128(2): 336-359.
[27]	ZHANG H, ZU K K, LU J, et al. EPSANet: an efficient pyramid squeeze attention block on convolutional neural network[C]// Computer Vision - ACCV 2022: 16th Asian Conference on Computer Vision. New York: ACM, 2022: 541-557.
[28]	LI G Q, FANG Q, ZHA L L, et al. HAM: hybrid attention module in deep convolutional neural networks for image classification[J]. Pattern Recognition, 2022, 129: 108785.
[29]	XU Y C, FU M T, WANG Q M, et al. Gliding vertex on the horizontal bounding box for multi-oriented object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(4): 1452-1459.
[30]	CHENG G, WANG J B, LI K, et al. Anchor-free oriented proposal generator for object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5625411.
[31]	YAO Y Q, CHENG G, WANG G X, et al. On improving bounding box representations for oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 61: 5600111.
[32]	LI W T, CHEN Y J, HU K X, et al. Oriented RepPoints for aerial object detection[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 1819-1828.
[33]	MA Y C, LIU S T, LI Z M, et al. IQDet: instance-wise quality distribution sampling for object detection[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 1717-1725.

Method	APL	APO	BF	BC	BR	CH	DAM	ETS	ESA	GF	GTF
Baseline	72.48	30.00	81.03	81.56	33.78	72.66	20.11	72.23	79.96	49.88	74.93
+AM	81.45	37.05	81.16	81.60	40.31	72.71	25.20	72.28	80.82	67.32	76.07
+AL	81.57	30.07	81.03	81.48	34.07	72.60	18.93	72.04	80.46	60.84	75.28
Ours	80.48	29.44	80.54	89.06	34.75	75.13	28.48	74.47	81.13	67.06	76.59
Method	HA	OP	SH	STA	STO	TC	TS	VE	WM	mAP
Baseline	40.57	51.79	81.17	62.25	62.77	81.56	48.25	51.91	72.78	61.08
+AM	41.95	52.68	81.22	62.90	62.05	90.19	50.66	44.48	73.83	63.80
+AL	41.42	52.47	81.10	62.66	62.09	81.54	49.19	51.89	73.47	62.21
Ours	40.61	52.77	88.24	67.67	66.82	88.83	46.95	48.79	72.44	64.51
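
The mAP column can be checked against the per-class columns. The sketch below assumes mAP is the unweighted mean of the 20 per-class AP values, and recomputes it for the first row (the baseline) and the Ours row using the numbers in the table above. The variable and function names are illustrative.

```python
# Per-class AP values (%) copied from the table above, in column order:
# APL, APO, BF, BC, BR, CH, DAM, ETS, ESA, GF, GTF, HA, OP, SH, STA, STO, TC, TS, VE, WM.
baseline_ap = [72.48, 30.00, 81.03, 81.56, 33.78, 72.66, 20.11, 72.23, 79.96, 49.88,
               74.93, 40.57, 51.79, 81.17, 62.25, 62.77, 81.56, 48.25, 51.91, 72.78]
ours_ap = [80.48, 29.44, 80.54, 89.06, 34.75, 75.13, 28.48, 74.47, 81.13, 67.06,
           76.59, 40.61, 52.77, 88.24, 67.67, 66.82, 88.83, 46.95, 48.79, 72.44]

def mean_ap(per_class_ap):
    """Unweighted mean over classes, rounded to two decimals like the table."""
    return round(sum(per_class_ap) / len(per_class_ap), 2)

baseline_map = mean_ap(baseline_ap)
ours_map = mean_ap(ours_ap)
gain = round(ours_map - baseline_map, 2)
print(baseline_map, ours_map, gain)  # 61.08 64.51 3.43
```

The recomputed values agree with the mAP column and with the 3.43-point improvement stated in the abstract. The gain is not uniform across classes: some columns (for example APL, GF, and DAM) rise sharply while a few (for example VE and TS) fall slightly.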

Method	AMSFE	ALAN	Params/M	FLOPs/G
Baseline	-	-	37.30	164.78
	√	-	40.95	183.32
	-	√	37.53	171.32
	√	√	41.18	189.85
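
To put the cost of each module in relative terms, the sketch below computes the percentage increase in parameters and FLOPs over the baseline, using only the numbers in the table above. The dictionary keys and the helper `relative_overhead` are labels chosen for this illustration.

```python
# (Params in millions, FLOPs in billions) copied from the table above.
configs = {
    "baseline": (37.30, 164.78),
    "+AMSFE": (40.95, 183.32),
    "+ALAN": (37.53, 171.32),
    "+AMSFE+ALAN": (41.18, 189.85),
}

def relative_overhead(value, reference):
    """Percentage increase of value over reference."""
    return 100.0 * (value - reference) / reference

base_params, base_flops = configs["baseline"]
overhead = {
    name: (relative_overhead(p, base_params), relative_overhead(f, base_flops))
    for name, (p, f) in configs.items()
}
for name, (d_params, d_flops) in overhead.items():
    print(f"{name:12s} params +{d_params:.1f}%  FLOPs +{d_flops:.1f}%")
```

On these numbers the full model uses roughly 10% more parameters and 15% more FLOPs than the baseline, and most of that increase comes from AMSFE rather than ALAN.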

Methods	APL	APO	BF	BC	BR	CH	DAM	ETS	ESA	GF	GTF
Faster R-CNN-O [6]	62.79	26.80	71.72	80.91	34.20	72.57	18.95	66.45	65.75	66.63	79.24
RetinaNet-O [9]	61.49	28.52	73.57	81.17	23.98	72.54	19.94	72.39	58.20	69.25	79.54
Gliding V. [29]	65.35	28.87	74.96	81.33	33.88	74.31	19.58	70.72	64.70	72.30	78.68
RoI Trans. [12]	63.34	37.88	71.78	87.53	40.68	72.60	26.86	78.71	68.09	68.96	82.74
AOPG [30]	62.39	37.79	71.62	87.63	40.90	72.47	31.08	65.42	77.99	73.20	81.94
QPDet [31]	63.22	41.39	71.97	88.55	41.23	72.63	28.82	78.90	69.00	70.07	83.01
Ours	80.48	29.44	80.54	89.06	34.75	75.13	28.48	74.47	81.13	67.06	76.59
Methods	HA	OP	SH	STA	STO	TC	TS	VE	WM	mAP
Faster R-CNN-O [6]	34.95	48.79	81.14	64.34	71.21	81.44	47.31	50.46	65.21	59.54
RetinaNet-O [9]	32.14	44.87	77.71	67.57	61.09	81.46	47.33	38.01	60.24	57.55
Gliding V. [29]	37.22	49.64	80.22	69.26	61.13	81.49	44.76	47.71	65.04	60.06
RoI Trans. [12]	47.71	55.61	81.21	78.23	70.26	81.61	54.86	43.27	65.52	63.87
AOPG [30]	42.32	54.45	81.17	72.69	71.31	81.49	60.04	52.38	69.99	64.41
QPDet [31]	47.83	55.54	81.23	72.15	62.66	89.05	58.09	43.38	65.36	64.20
Ours	40.61	52.77	88.24	67.67	66.82	88.83	46.95	48.79	72.44	64.51