X-ray image rotating object detection based on improved YOLOv7

doi:10.11996/JG.j.2095-302X.2023020324

Abstract

To address the difficulty of accurately identifying and localizing prohibited items in X-ray images, and the neglect of item orientation, a rotated object detection algorithm based on an improved YOLOv7 is proposed. First, an efficient attention network module is integrated into the original network to strengthen the model's ability to extract important deep features. Next, the feature fusion path of the extended efficient layer aggregation network (E-ELAN) is improved by adding residual skip connections and 1×1 convolutions between modules, allowing the network to extract richer item features. Finally, because prohibited items can be placed at arbitrary orientations in X-ray images, the angles are discretized using the densely coded label (DCL) representation, which improves the localization accuracy of prohibited items. Experimental results show that the improved algorithm achieves an mAP@0.5 of 91.2%, 92.6%, and 66.4% on the HiXray, OPIXray, and PIDray datasets, respectively, a relative improvement of 20.2%, 10.6%, and 15.5% over the original YOLOv7 model. By improving the accuracy of prohibited item detection in X-ray images, the proposed algorithm can provide technical support for public security.
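The angle discretization step can be illustrated with a minimal sketch. It assumes a 180° angle range, an 8-bit code, and Gray coding of the bin index; these values and the function names `angle_to_gray_code` and `gray_code_to_angle` are illustrative assumptions, not settings or code reported by the authors.

```python
# Minimal sketch of dense-coded angle labels (assumed settings, not the authors' code).
# The angle range is split into 2**num_bits bins and the bin index is Gray-coded,
# so an 8-bit label replaces a 256-way one-hot class vector.

def angle_to_gray_code(angle_deg, angle_range=180.0, num_bits=8):
    """Discretize an angle and return its Gray-code bits, most significant bit first."""
    num_bins = 2 ** num_bits
    step = angle_range / num_bins                      # angular width of one bin
    index = int((angle_deg % angle_range) / step) % num_bins
    gray = index ^ (index >> 1)                        # binary index -> Gray code
    return [(gray >> b) & 1 for b in range(num_bits - 1, -1, -1)]


def gray_code_to_angle(bits, angle_range=180.0):
    """Decode Gray-code bits back to the center angle of the corresponding bin."""
    gray = 0
    for bit in bits:
        gray = (gray << 1) | bit
    index = 0
    while gray:                                        # Gray code -> binary index
        index ^= gray
        gray >>= 1
    step = angle_range / (2 ** len(bits))
    return (index + 0.5) * step


if __name__ == "__main__":
    code = angle_to_gray_code(45.0)
    print(code, gray_code_to_angle(code))
```

With these assumed settings the decoded angle differs from the input by at most half a bin (about 0.35°), and neighbouring bins differ in exactly one bit of the code.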

Key words: rotating target detection, attention mechanism, X-ray images, YOLOv7, prohibited item

CLC Number: TP391

CHENG Lang, JING Chao. X-ray image rotating object detection based on improved YOLOv7[J]. Journal of Graphics, 2023, 44(2): 324-334.

References

[1]	常青青, 陈嘉敏, 李维姣. 城市轨道交通安检中基于X射线图像的危险品识别技术研究[J]. 城市轨道交通研究, 2022, 25(4): 205-209.
	CHANG Q Q, CHEN J M, LI W J. Dangerous goods detection technology based on X-ray images in urban rail transit security inspection[J]. Urban Mass Transit, 2022, 25(4): 205-209. (in Chinese)
[2]	李柯泉, 陈燕, 刘佳晨, 等. 基于深度学习的目标检测算法综述[J]. 计算机工程, 2022, 48(7): 1-12.
	LI K Q, CHEN Y, LIU J C, et al. Survey of deep learning-based object detection algorithms[J]. Computer Engineering, 2022, 48(7): 1-12. (in Chinese)
[3]	张友康, 苏志刚, 张海刚, 等. X光安检图像多尺度违禁品检测[J]. 信号处理, 2020, 36(7): 1096-1106.
	ZHANG Y K, SU Z G, ZHANG H G, et al. Multi-scale prohibited item detection in X-ray security image[J]. Journal of Signal Processing, 2020, 36(7): 1096-1106. (in Chinese)
[4]	张宇, 马杰, 崔静雯, 等. 融合注意力机制的遥感图像旋转目标检测算法[J]. 激光与光电子学进展, 2022, 59(24): 2415005.
	ZHANG Y, MA J, CUI J W, et al. Remote sensing image rotation target detection algorithm based on attention mechanism[J]. Laser & Optoelectronics Progress, 2022, 59(24): 2415005. (in Chinese)
[5]	梁添汾, 张南峰, 张艳喜, 等. 违禁品X光图像检测技术应用研究进展综述[J]. 计算机工程与应用, 2021, 57(16): 74-82.
	LIANG T F, ZHANG N F, ZHANG Y X, et al. Summary of research progress on application of prohibited item detection in X-ray images[J]. Computer Engineering and Applications, 2021, 57(16): 74-82. (in Chinese)
[6]	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2022-08-23]. https://arxiv.org/abs/2004.10934.
[7]	GE Z, LIU S T, WANG F, et al. YOLOX: exceeding YOLO series in 2021[EB/OL]. [2022-08-23]. https://arxiv.org/abs/2107.08430.
[8]	REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2022-08-23]. https://arxiv.org/abs/1804.02767.
[9]	朱成, 李柏岩, 刘晓强, 等. 基于YOLO的违禁品检测深度卷积网络[J]. 合肥工业大学学报: 自然科学版, 2021, 44(9): 1198-1203.
	ZHU C, LI B Y, LIU X Q, et al. A deep convolutional neural network based on YOLO for contraband detection[J]. Journal of Hefei University of Technology: Natural Science, 2021, 44(9): 1198-1203. (in Chinese)
[10]	董乙杉, 李兆鑫, 郭靖圆, 等. 一种改进YOLOv5的X光违禁品检测模型[J]. 激光与光电子学进展, 2023, 60(4): 0415005.
	DONG Y S, LI Z X, GUO J Y, et al. An improved YOLOv5 model for X-ray prohibited items detection[J]. Laser & Optoelectronics Progress, 2023, 60(4): 0415005. (in Chinese)
[11]	吴海滨, 魏喜盈, 刘美红, 等. 结合空洞卷积和迁移学习改进YOLOv4的X光安检危险品检测[J]. 中国光学, 2021, 14(6): 1417-1425.
	WU H B, WEI X Y, LIU M H, et al. Improved YOLOv4 for dangerous goods detection in X-ray inspection combined with atrous convolution and transfer learning[J]. Chinese Optics, 2021, 14(6): 1417-1425. (in Chinese)
[12]	廖育荣, 王海宁, 林存宝, 等. 基于深度学习的光学遥感图像目标检测研究进展[J]. 通信学报, 2022, 43(5): 190-203.
	LIAO Y R, WANG H N, LIN C B, et al. Research progress of deep learning-based object detection of optical remote sensing image[J]. Journal on Communications, 2022, 43(5): 190-203. (in Chinese)
[13]	LI W T, CHEN Y J, HU K X, et al. Oriented RepPoints for aerial object detection[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 1819-1828.
[14]	MING Q, MIAO L J, ZHOU Z Q, et al. Optimization for arbitrary-oriented object detection via representation invariance loss[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5.
[15]	YANG X, YAN J C, MING Q, et al. Rethinking rotated object detection with Gaussian Wasserstein distance loss[EB/OL]. [2022-08-23]. https://arxiv.org/abs/2101.11952.
[16]	WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[EB/OL]. [2022-08-23]. https://arxiv.org/abs/2207.02696.
[17]	ZHANG H, ZU K K, LU J, et al. EPSANet: an efficient pyramid squeeze attention block on convolutional neural network[EB/OL]. [2022-08-23]. https://arxiv.org/abs/2105.14447.
[18]	ZHANG Q L, YANG Y B. SA-net: shuffle attention for deep convolutional neural networks[C]// 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. New York: IEEE Press, 2021: 2235-2239.
[19]	ZHANG X D, ZENG H, GUO S, et al. Efficient long-range attention network for image super-resolution[EB/OL]. [2022-08-23]. https://arxiv.org/abs/2203.06697.
[20]	YANG X, HOU L P, ZHOU Y, et al. Dense label encoding for boundary discontinuity free rotation detection[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 15814-15824.
[21]	WANG C Y, BOCHKOVSKIY A, LIAO H Y M. Scaled-YOLOv4: scaling cross stage partial network[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 13024-13033.
[22]	DING X H, ZHANG X Y, MA N N, et al. RepVGG: making VGG-style ConvNets great again[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 13728-13737.
[23]	JIANG T T, CHENG J Y. Target recognition based on CNN with LeakyReLU and PReLU activation functions[C]// 2019 IEEE Conference on Sensing, Diagnostics, Prognostics, and Control. New York: IEEE Press, 2019: 718-722.
[24]	DE SANTANA CORREIA A, COLOMBINI E L. Neural attention models in deep learning: survey and taxonomy[EB/OL]. [2022-08-23]. https://arxiv.org/abs/2112.05909.
[25]	张宸嘉, 朱磊, 俞璐. 卷积神经网络中的注意力机制综述[J]. 计算机工程与应用, 2021, 57(20): 64-72.
	ZHANG C J, ZHU L, YU L. Review of attention mechanism in convolutional neural networks[J]. Computer Engineering and Applications, 2021, 57(20): 64-72. (in Chinese)
[26]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141.
[27]	LI X, WANG X, HU X L, et al. Selective kernel networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 510-519.
[28]	HE K M, ZHANG X, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[29]	MA N N, ZHANG X Y, ZHENG H T, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design[EB/OL]. [2022-08-23]. https://arxiv.org/abs/1807.11164.
[30]	YANG X, YANG J R, YAN J C, et al. SCRDet: towards more robust detection for small, cluttered and rotated objects[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 8231-8240.
[31]	TAO R S, WEI Y L, JIANG X J, et al. Towards real-world X-ray security inspection: a high-quality benchmark and lateral inhibition module for prohibited items detection[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 10903-10912.
[32]	WEI Y L, TAO R S, WU Z J, et al. Occluded prohibited items detection: an X-ray security inspection benchmark and de-occlusion attention module[C]// The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 138-146.
[33]	WANG B Y, ZHANG L B, WEN L Y, et al. Towards real-world prohibited item detection: a large-scale X-ray benchmark[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 5392-5401.
[34]	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2999-3007.
[35]	TIAN Z, SHEN C H, CHEN H, et al. FCOS: fully convolutional one-stage object detection[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 9626-9635.
[36]	HAN J M, DING J, XUE N, et al. ReDet: a rotation-equivariant detector for aerial object detection[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 2785-2794.
[37]	YANG X, LIU Q Q, YAN J C, et al. R3Det: refined single-stage detector with feature refinement for rotating object[C]// The 35th Conference on Association for the Advancement of Artificial Intelligence. Palo Alto: AAAI, 2021: 3163-3171.

Group	EPSA	SA	E-ELAN	DCL	Params (M)	FLOPs (G)	mAP@0.5 (%)
Group 1	×	×	×	×	97.2	515.2	75.8
Group 2	√	×	×	×	98.6	535.4	77.2
Group 3	×	√	×	×	97.8	521.5	78.4
Group 4	√	√	×	×	99.2	541.7	79.9
Group 5	√	√	√	×	105.6	553.6	85.4
Group 6	×	×	×	√	104.5	527.4	89.5
Group 7	√	√	√	√	112.6	565.6	91.2
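To make the contribution of each module easier to read, the following sketch recomputes the gains of every ablation group over the Group 1 baseline. The numbers are copied from the table above; the `ABLATION` structure and the `gains_over_baseline` helper are hypothetical names introduced only for this illustration.

```python
# Ablation rows copied from the table: (group, enabled modules, Params (M), FLOPs (G), mAP@0.5 (%)).
# The data structure and helper name are illustrative assumptions.
ABLATION = [
    ("Group 1", (), 97.2, 515.2, 75.8),
    ("Group 2", ("EPSA",), 98.6, 535.4, 77.2),
    ("Group 3", ("SA",), 97.8, 521.5, 78.4),
    ("Group 4", ("EPSA", "SA"), 99.2, 541.7, 79.9),
    ("Group 5", ("EPSA", "SA", "E-ELAN"), 105.6, 553.6, 85.4),
    ("Group 6", ("DCL",), 104.5, 527.4, 89.5),
    ("Group 7", ("EPSA", "SA", "E-ELAN", "DCL"), 112.6, 565.6, 91.2),
]


def gains_over_baseline(rows):
    """Return (group, modules, mAP gain, extra Params (M), extra FLOPs (G)) relative to the first row."""
    _, _, base_params, base_flops, base_map = rows[0]
    result = []
    for name, modules, params, flops, map50 in rows[1:]:
        result.append((
            name,
            "+".join(modules),
            round(map50 - base_map, 1),      # gain in mAP@0.5, percentage points
            round(params - base_params, 1),  # additional parameters, millions
            round(flops - base_flops, 1),    # additional FLOPs, giga
        ))
    return result


if __name__ == "__main__":
    for row in gains_over_baseline(ABLATION):
        print(row)
```

Read this way, the table shows DCL alone (Group 6) adding 13.7 points of mAP@0.5, while the full model (Group 7) adds 15.4 points at a cost of 15.4 M parameters and 50.4 GFLOPs.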

Per-class AP@0.5 (%) in columns PO1 through NL, followed by the overall mAP@0.5 (%):
Type	Methods	PO1	PO2	WA	LA	MP	TA	CO	NL	mAP@0.5 (%)
H	RetinaNet^[34]	73.5	76.7	76.2	82.3	79.8	81.5	50.6	12.7	66.7
H	FCOS^[35]	77.3	79.3	84.6	85.6	81.5	86.5	53.3	13.9	70.3
H	YOLOv7^[16]	87.6	87.4	87.8	88.9	89.9	87.9	63.6	14.5	75.9
R	ReDet^[36]	92.6	93.5	88.8	90.9	90.7	89.8	67.5	19.7	79.2
R	SCRDet^[30]	94.9	93.2	89.3	91.7	90.8	90.2	74.8	22.4	80.9
R	R3Det^[37]	97.5	95.5	94.8	94.9	97.0	95.9	79.7	28.3	85.5
R	YOLOv7-E6R(Ours)	98.6	96.8	97.9	99.2	98.3	97.4	86.8	54.3	91.2

Type	Methods	OPIXray mAP@0.5 (%)	PIDray Easy mAP@0.5 (%)	PIDray Hard mAP@0.5 (%)	PIDray Hidden mAP@0.5 (%)	PIDray Average mAP@0.5 (%)
H	RetinaNet^[34]	75.6	61.3	51.3	37.6	50.1
H	FCOS^[35]	82.1	64.6	52.9	40.3	52.6
H	YOLOv7^[16]	83.7	72.3	57.4	42.8	57.5
R	ReDet^[36]	85.4	75.9	58.3	43.7	59.3
R	SCRDet^[30]	87.8	78.2	59.1	45.2	60.8
R	R3Det^[37]	88.0	82.5	64.8	48.9	65.4
R	YOLOv7-E6R(Ours)	92.6	84.5	65.4	49.2	66.4
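The improvement figures quoted in the abstract can be checked against the tables with a short calculation. The sketch below treats them as relative gains over the YOLOv7 rows and checks the PIDray average as the mean of the Easy, Hard, and Hidden subsets; the function names are hypothetical, and the values are copied from the tables.

```python
# mAP@0.5 (%) values copied from the comparison tables (YOLOv7 rows vs. YOLOv7-E6R rows).
YOLOV7 = {"HiXray": 75.9, "OPIXray": 83.7, "PIDray": 57.5}
OURS = {"HiXray": 91.2, "OPIXray": 92.6, "PIDray": 66.4}


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for dataset in YOLOV7:
        gain = relative_gain(OURS[dataset], YOLOV7[dataset])
        print(f"{dataset}: {gain:.1f}% relative gain")
    # PIDray average for YOLOv7-E6R from its Easy, Hard, and Hidden scores.
    print(f"PIDray average: {mean([84.5, 65.4, 49.2]):.1f}")
```

Under this reading the script reproduces the 20.2%, 10.6%, and 15.5% figures from the abstract; the corresponding absolute gains are 15.3, 8.9, and 8.9 percentage points.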