Object detection in remote sensing image based on two-stage anchor and class balanced loss

doi:10.11996/JG.j.2095-302X.2023020249

Journal of Graphics, 2023, Vol. 44, Issue (2): 249-259.

• Image Processing and Computer Vision •

ZENG Lun-jie¹, CHU Jun¹,², CHEN Zhao-jun²

1. School of Information Engineering, Nanchang Hangkong University, Nanchang Jiangxi 330063, China
2. School of Software Engineering, Nanchang Hangkong University, Nanchang Jiangxi 330063, China

Received: 2022-08-12 Accepted: 2022-10-10 Online: 2023-04-30 Published: 2023-05-01
Contact: CHU Jun (1967-), professor, Ph.D. Her main research interests cover object detection and tracking in complex scenes. E-mail: chuj@nchu.edu.cn
About the author: ZENG Lun-jie (1997-), master's student. His main research interests cover deep learning and object detection. E-mail: 13576563600@163.com
Supported by:
National Natural Science Foundation of China (62162045); Jiangxi Province Key R&D Program Project (20192BBE50073)

Abstract

Existing remote sensing image datasets contain very different numbers of instances per object category. This class imbalance lowers the detection accuracy of network models on minority categories. To address the problem, a remote sensing object detection algorithm based on two-stage anchor boxes and a class-balanced loss is proposed. K-means clustering first generates class-balanced labels for the remote sensing dataset; these labels then serve as the initial centers of a second stage of K-means clustering, so that the resulting preset anchor boxes also cover the scales of minority categories and improve detection accuracy on minority-category instances. In addition, a class equalization loss (CEQL) is constructed: building on the equalization loss (EQL), it uses effective samples to form auxiliary weights that increase the model's attention to minority categories during training. Experiments show that the improved model reaches a mean average precision (mAP) of 76.13% and a minority-category average precision of 76.51%, which are 1.56 and 1.75 percentage points higher than the baseline network, respectively. On the DIOR and NWPU VHR-10 datasets, comparisons with mainstream methods including Faster-RCNN, RetinaNet, CenterNet, YOLOv4, YOLOX-L, YOLOv5, and YOLOv7 show that the improved algorithm effectively improves detection accuracy on minority categories while maintaining accuracy on majority categories.

Keywords: remote sensing detection, class imbalance, re-weighting, K-means, YOLOv4
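
The abstract describes anchor generation as two rounds of K-means, with class-balanced labels from the first round seeding the second. This page does not give the details, so the following is only a minimal Python sketch under stated assumptions: boxes are (width, height) pairs, the distance is 1 − IoU as in YOLO9000 [27], stage 1 summarizes every class by the same number of cluster centers (`per_class`, a hypothetical parameter), and the balanced centers are condensed to `k` seeds before stage 2. All function names are illustrative and are not the authors' code.

```python
import random

def iou_wh(a, b):
    """IoU of two (w, h) boxes that share the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_wh(boxes, init_centers, iters=100):
    """Lloyd-style K-means on (w, h) pairs using 1 - IoU as the distance."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for box in boxes:
            nearest = max(range(len(centers)),
                          key=lambda j: iou_wh(box, centers[j]))
            groups[nearest].append(box)
        # Mean width/height per cluster; an empty cluster keeps its old center.
        updated = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else c
            for g, c in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return centers

def two_stage_anchors(boxes, labels, k=9, per_class=3, seed=0):
    """Assumed two-stage scheme; needs at least k balanced centers overall."""
    rng = random.Random(seed)
    by_class = {}
    for box, label in zip(boxes, labels):
        by_class.setdefault(label, []).append(tuple(box))
    # Stage 1 (assumed form): each class contributes the same number of
    # cluster centers, so rare classes count as much as frequent ones.
    balanced = []
    for label in sorted(by_class):
        cls_boxes = by_class[label]
        init = rng.sample(cls_boxes, min(per_class, len(cls_boxes)))
        balanced.extend(kmeans_wh(cls_boxes, init))
    # Condense the balanced set to k seeds (assumption, not from the page).
    seeds = kmeans_wh(balanced, rng.sample(balanced, k))
    # Stage 2: K-means over all boxes, started from the balanced seeds.
    anchors = kmeans_wh([tuple(b) for b in boxes], seeds)
    return sorted(anchors, key=lambda a: a[0] * a[1])
```

Seeding stage 2 from balanced centers is what distinguishes this from plain K-means with random initial centers, where frequent classes dominate the starting points. Whether the final anchors stay close to the minority-class scales depends on the data, which the sketch does not guarantee.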

ZENG Lun-jie, CHU Jun, CHEN Zhao-jun. Object detection in remote sensing image based on two-stage anchor and class balanced loss[J]. Journal of Graphics, 2023, 44(2): 249-259.

References

[1]	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[2]	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 779-788.
[3]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[M]//Computer Vision - ECCV 2016. Cham: Springer International Publishing, 2016: 21-37.
[4]	TIAN Z, SHEN C H, CHEN H, et al. FCOS: fully convolutional one-stage object detection[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 9626-9635.
[5]	DUAN K W, BAI S, XIE L X, et al. CenterNet: keypoint triplets for object detection[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 6568-6577.
[6]	HU J, GU J J, WANG Q H. Multimodal small target detection based on remote sensing image[J]. Journal of Graphics, 2022, 43(2): 197-204. (in Chinese)
[7]	ZHANG Y, GAO X, LIU Y, et al. Image segmentation algorithm based on improved pixel correlation model[J]. Journal of Graphics, 2022, 43(2): 205-213. (in Chinese)
[8]	LIU M J, WANG X H, ZHOU A J, et al. UAV-YOLO: small object detection on unmanned aerial vehicle perspective[J]. Sensors, 2020, 20(8): 2238.
[9]	YANG X, SUN H, SUN X, et al. Position detection and direction prediction for arbitrary-oriented ships via multitask rotation region convolutional neural network[J]. IEEE Access, 2018, 6: 50839-50849.
[10]	YANG X, YANG J R, YAN J C, et al. SCRDet: towards more robust detection for small, cluttered and rotated objects[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 8231-8240.
[11]	XIA G S, BAI X, DING J, et al. DOTA: a large-scale dataset for object detection in aerial images[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 3974-3983.
[12]	LI K, WAN G, CHENG G, et al. Object detection in optical remote sensing images: a survey and a new benchmark[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 159: 296-307.
[13]	CHENG G, ZHOU P C, HAN J W. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(12): 7405-7415.
[14]	BOTÍA J A, VANDROVCOVA J, FORABOSCO P, et al. An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks[J]. BMC Systems Biology, 2017, 11(1): 47.
[15]	ZHANG Y F, KANG B Y, HOOI B, et al. Deep long-tailed learning: a survey[EB/OL]. [2022-07-09]. https://arxiv.org/abs/2110.04596.
[16]	KIM J, JEONG J, SHIN J. M2m: imbalanced classification via major-to-minor translation[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 13893-13902.
[17]	TAN C Q, SUN F C, KONG T, et al. A survey on deep transfer learning[M]//Artificial Neural Networks and Machine Learning - ICANN 2018. Cham: Springer International Publishing, 2018: 270-279.
[18]	YIN X, YU X, SOHN K, et al. Feature transfer learning for face recognition with under-represented data[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 5697-5706.
[19]	ZHOU B Y, CUI Q, WEI X S, et al. BBN: bilateral-branch network with cumulative learning for long-tailed visual recognition[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 9716-9725.
[20]	CAI J R, WANG Y Z, HWANG J N. ACE: ally complementary experts for solving long-tailed recognition in one-shot[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 112-121.
[21]	HUANG C, LI Y N, LOY C C, et al. Learning deep representation for imbalanced classification[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 5375-5384.
[22]	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 318-327.
[23]	TAN J R, WANG C B, LI B Y, et al. Equalization loss for long-tailed object recognition[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 11659-11668.
[24]	CUI Y, JIA M L, LIN T Y, et al. Class-balanced loss based on effective number of samples[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 9260-9269.
[25]	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2022-07-09]. https://arxiv.org/abs/2004.10934.
[26]	ZAIDI S S A, ANSARI M S, ASLAM A, et al. A survey of modern deep learning based object detection models[EB/OL]. [2022-07-09]. https://arxiv.org/abs/2104.11892.
[27]	REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 6517-6525.
[28]	DENG C Y, YE B, MIAO J G, et al. Fuzzy evaluation of machine tool dynamic characteristics for changing machining position based on K-means++ clustering and probabilistic neural network[J]. Chinese Journal of Scientific Instrument, 2020, 41(12): 227-235. (in Chinese)
[29]	LI Y, WANG T, KANG B Y, et al. Overcoming classifier imbalance for long-tail object detection with balanced group softmax[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 10988-10997.
[30]	GUPTA A, DOLLÁR P, GIRSHICK R. LVIS: a dataset for large vocabulary instance segmentation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 5351-5359.
[31]	GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. [2022-07-09]. https://arxiv.org/abs/2107.08430.
[32]	ZHU X K, LYU S C, WANG X, et al. TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios[C]// 2021 IEEE/CVF International Conference on Computer Vision Workshops. New York: IEEE Press, 2021: 2778-2788.
[33]	WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[EB/OL]. [2022-07-06]. https://arxiv.org/abs/2207.02696.

Method	AP_r	AP_c	AP_f	mAP	FPS
YOLOv4	74.76	83.17	67.85	74.57	45.73
+ K-means	73.92	81.75	66.45	73.58	46.39
+ K-means++	74.73	82.65	67.75	74.47	45.94
+TK-means (Ours)	75.65	84.62	68.09	75.41	45.67

Method	AP_r	AP_c	AP_f	mAP	FPS
YOLOv4	74.76	83.17	67.86	74.57	45.73
+ Focal Loss	73.31	82.88	67.30	73.38	45.99
+ EQL	57.82	83.85	67.43	61.87	45.33
+ CB Loss	11.17	35.26	41.16	18.08	46.45
+ CEQL (Ours)	75.99	84.85	68.27	75.72	45.78
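
The abstract describes CEQL as EQL [23] extended with auxiliary weights built from effective samples. Assuming that this refers to the effective number of samples of [24], E_n = (1 − β^n)/(1 − β), the idea might be sketched as below. The way the two parts are combined here (scaling each class term by a normalized 1/E_n weight), the values of `lam` and `beta`, and the use of instance share as the class frequency are all assumptions for illustration; this page does not give the paper's exact formula.

```python
import math

def effective_number_weights(counts, beta=0.999):
    """Weights proportional to 1 / E_n, with E_n = (1 - beta**n) / (1 - beta),
    rescaled so that they sum to the number of classes. Counts must be >= 1."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [r * scale for r in raw]

def ceql_sketch(logits, target, counts, lam=0.05, beta=0.999, foreground=True):
    """Sigmoid cross-entropy over the classes of one sample.

    EQL part: for a foreground sample, the negative term of every rare class
    (frequency below lam) is dropped, so rare classes are not suppressed.
    Auxiliary part (assumed): each class term is scaled by its
    effective-number weight.
    """
    total = sum(counts)
    aux = effective_number_weights(counts, beta)
    loss = 0.0
    for j, z in enumerate(logits):
        y = 1.0 if j == target else 0.0
        p = 1.0 / (1.0 + math.exp(-z))
        is_rare = 1.0 if counts[j] / total < lam else 0.0
        is_fg = 1.0 if foreground else 0.0
        w_eql = 1.0 - is_fg * is_rare * (1.0 - y)   # 0 only for ignored negatives
        p_hat = p if y == 1.0 else 1.0 - p
        loss -= aux[j] * w_eql * math.log(p_hat)
    return loss
```

With instance counts such as `[1000, 10]`, the second class receives the larger auxiliary weight, and a confident false score on it adds nothing to the loss of a foreground sample whose label is the first class.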

Dataset	TK-means	CEQL	AP_r	AP_c	AP_f	mAP
DIOR	-	-	74.76	83.17	67.85	74.57
	√	-	75.65	84.62	68.09	75.41
	-	√	75.99	84.85	68.27	75.72
	√	√	76.51	84.53	68.62	76.13
NWPU VHR-10	-	-	82.75	89.19	94.55	89.94
	√	-	84.68	91.62	94.84	91.15
	-	√	85.06	92.11	95.12	91.50
	√	√	85.72	93.05	95.04	91.85
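
The gains quoted in the abstract (1.56 and 1.75) can be reproduced from the first and last DIOR rows of this table as plain differences, which means they are percentage points rather than relative improvements. A small Python check, with the values copied from the table and the dictionary layout chosen only for illustration:

```python
# (AP_r, mAP) in percent for the baseline row and the row with both modules.
rows = {
    "DIOR": {"baseline": (74.76, 74.57), "TK-means + CEQL": (76.51, 76.13)},
    "NWPU VHR-10": {"baseline": (82.75, 89.94), "TK-means + CEQL": (85.72, 91.85)},
}

def gains(dataset):
    """Absolute gain in percentage points of the full model over the baseline."""
    base_r, base_map = rows[dataset]["baseline"]
    full_r, full_map = rows[dataset]["TK-means + CEQL"]
    return {"AP_r": round(full_r - base_r, 2), "mAP": round(full_map - base_map, 2)}

for name in rows:
    print(name, gains(name))
```

The same subtraction on the NWPU VHR-10 rows gives the corresponding gains for that dataset.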

Category	Faster-RCNN [1]	CenterNet [5]	RetinaNet [22]	YOLOv4 [25]	YOLOX-L [31]	YOLOv5-L [32]	YOLOv7-L [33]	Ours
Bridge	62.25	76.98	68.79	59.07	61.88	67.74	67.18	62.01
Basketball court	86.31	78.10	90.60	90.74	96.80	59.02	89.77	95.47
Ground track field	94.44	97.74	96.27	98.45	94.66	95.45	96.33	99.68
Harbor	89.63	93.70	93.76	91.15	96.07	95.33	91.08	96.66
Ship	60.47	78.05	77.84	87.23	86.70	80.08	86.51	89.44
Baseball diamond	92.78	99.13	99.23	98.25	98.71	98.39	98.78	98.43
Tennis court	80.86	79.21	90.95	97.51	92.83	83.28	86.02	95.76
Vehicle	50.91	67.08	70.19	91.80	88.80	79.62	82.55	92.62
Storage tank	59.46	76.11	76.80	89.90	96.62	90.27	92.30	93.09
Airplane	96.99	98.41	99.65	95.27	100.00	99.91	99.82	95.30
AP_f	76.20	83.99	87.36	94.55	95.39	90.29	91.89	95.04
AP_c	75.05	85.88	85.80	89.19	91.39	87.71	88.80	93.05
AP_r	81.00	84.27	85.22	82.75	84.45	74.07	84.43	85.72
mAP	77.41	84.45	86.41	89.94	91.31	84.91	89.03	91.85
FPS	18.62	38.56	37.48	46.84	48.73	46.16	43.13	46.47
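
The summary rows of this table appear to be plain averages of the per-class rows. The Python sketch below recomputes them for two columns; the grouping by row position (first three rows as rare, next two as common, last five as frequent) is inferred from the numbers and is not stated on this page, so treat it as an assumption.

```python
# Per-class AP (%) in table order for two of the columns above.
ap = {
    "YOLOv4": [59.07, 90.74, 98.45, 91.15, 87.23, 98.25, 97.51, 91.80, 89.90, 95.27],
    "Ours":   [62.01, 95.47, 99.68, 96.66, 89.44, 98.43, 95.76, 92.62, 93.09, 95.30],
}

# Assumed grouping by row position (inferred, not given by the source).
groups = {"AP_r": slice(0, 3), "AP_c": slice(3, 5), "AP_f": slice(5, 10)}

def mean(values):
    return sum(values) / len(values)

def summarize(per_class):
    """Group averages plus mAP as the mean over all classes."""
    summary = {name: mean(per_class[rows]) for name, rows in groups.items()}
    summary["mAP"] = mean(per_class)
    return summary

for method, per_class in ap.items():
    print(method, {k: round(v, 2) for k, v in summarize(per_class).items()})
```

Under this grouping the recomputed values agree with the AP_r, AP_c, AP_f, and mAP rows for both columns to within rounding.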

Category	Faster-RCNN [1]	CenterNet [5]	RetinaNet [22]	YOLOv4 [25]	YOLOX-L [31]	YOLOv5-L [32]	YOLOv7-L [33]	Ours
Train station	60.81	57.07	55.20	67.56	68.40	62.97	62.78	71.82
Dam	61.99	59.19	62.40	71.97	71.25	67.79	76.80	74.59
Golf course	82.38	78.27	78.60	79.52	82.03	83.97	82.56	82.49
Stadium	76.12	54.53	68.40	62.56	66.97	58.01	65.11	66.73
Expressway toll station	53.19	54.15	62.80	74.58	79.36	69.01	63.60	78.01
Airport	82.85	79.35	77.00	85.25	85.53	87.33	88.24	87.48
Chimney	76.35	74.11	73.20	77.99	79.53	82.51	82.77	78.27
Expressway service area	74.09	69.24	78.60	87.88	87.94	90.07	88.57	88.96
Ground track field	68.35	70.93	76.60	82.73	82.03	82.66	81.93	83.37
Overpass	55.79	53.94	59.60	62.56	63.02	60.51	61.66	63.34
Basketball court	87.24	86.08	85.00	88.89	87.57	87.11	89.20	89.49
Bridge	30.45	32.43	44.10	48.22	49.24	47.50	45.65	49.39
Windmill	49.08	74.48	85.50	85.46	86.23	84.78	83.85	86.96
Harbor	53.30	49.39	49.90	63.06	64.97	63.90	63.86	64.25
Baseball field	73.38	77.24	69.30	83.19	83.98	76.77	74.44	82.54
Airplane	52.45	68.76	53.30	76.60	79.13	80.41	77.63	79.00
Tennis court	77.42	84.27	81.30	89.74	89.55	90.39	91.32	90.05
Storage tank	24.33	46.85	45.80	68.32	69.15	65.61	72.14	69.71
Vehicle	12.14	34.03	44.40	49.18	50.56	45.60	46.10	49.67
Ship	16.08	57.07	71.10	86.04	87.45	83.80	87.89	86.49
AP_f	17.52	45.98	53.77	67.85	69.05	65.00	68.71	68.62
AP_c	64.94	76.52	67.30	83.17	84.34	85.40	84.48	84.53
AP_r	65.59	64.69	68.41	74.76	75.87	73.66	74.07	76.51
mAP	58.39	63.07	66.11	74.57	75.69	73.54	74.31	76.13
FPS	17.40	37.43	35.86	45.73	46.52	45.62	42.59	45.39
