Journal of Graphics ›› 2024, Vol. 45 ›› Issue (5): 968-978.DOI: 10.11996/JG.j.2095-302X.2024050968
• Image Processing and Computer Vision •
ZHANG Dongping1, WEI Yangyue1, HE Shuji1, XU Yunchao1, HU Haimiao2, HUANG Wenjun3
Received: 2024-07-02
Revised: 2024-07-12
Online: 2024-10-31
Published: 2024-10-31
About author:
First author contact: ZHANG Dongping (1970-), professor, Ph.D. His main research interests are image processing and computer vision. E-mail: 06a0303103@cjlu.edu.cn
ZHANG Dongping, WEI Yangyue, HE Shuji, XU Yunchao, HU Haimiao, HUANG Wenjun. Feature fusion and inter-layer transmission: an improved object detection method based on Anchor DETR[J]. Journal of Graphics, 2024, 45(5): 968-978.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024050968
Model | Epoch | AP/% | AP50/% | AP75/% | APS/% | APM/% | APL/% | Params/M | GFLOPs |
---|---|---|---|---|---|---|---|---|---|
RetinaNet | 36 | 38.7 | 58.0 | 41.5 | 23.3 | 42.3 | 50.3 | 38 | 205 |
Faster RCNN | 36 | 40.2 | 61.0 | 43.8 | 24.2 | 43.5 | 52.0 | 42 | 180 |
DETR | 500 | 42.0 | 62.4 | 44.2 | 20.5 | 45.8 | 61.1 | 41 | 86 |
Conditional DETR | 50 | 40.9 | 61.8 | 43.3 | 20.8 | 44.6 | 59.2 | 43 | 90 |
DAB DETR | 50 | 42.2 | 63.1 | 44.7 | 21.5 | 45.7 | 60.3 | 43 | 94 |
Anchor DETR | 50 | 42.1 | 63.1 | 44.9 | 22.3 | 46.2 | 60.0 | 37 | 164 |
Improved algorithm | 50 | 42.3 | 63.0 | 45.4 | 24.5 | 46.5 | 58.6 | 39 | 169 |
Table 1 Comparison of detection results, parameter counts and GFLOPs of different algorithms on the COCO2017 dataset
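The cost columns of Table 1 can be read as a relative overhead of the improved algorithm (last row) over its Anchor DETR baseline. The following is a minimal sketch of that arithmetic; the dictionary layout, the key names and the helper `relative_overhead` are illustrative assumptions, and only the four numbers are taken from the table.

```python
# Parameter counts (M) and GFLOPs copied from the Anchor DETR row and the
# improved-algorithm row of Table 1 (COCO2017). Names are hypothetical.
COST = {
    "Anchor DETR": {"params_m": 37, "gflops": 164},
    "Improved": {"params_m": 39, "gflops": 169},
}


def relative_overhead(key):
    """Percentage increase of the improved model over Anchor DETR for one cost column."""
    base = COST["Anchor DETR"][key]
    new = COST["Improved"][key]
    return round(100.0 * (new - base) / base, 1)


if __name__ == "__main__":
    for column in ("params_m", "gflops"):
        print(column, relative_overhead(column), "%")
```

Under these numbers the improved algorithm adds roughly 5.4% parameters and 3.0% GFLOPs relative to Anchor DETR.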
Model | Epoch | AP/% | AP50/% | AP75/% | APS/% | APM/% | APL/% | Params/M | GFLOPs |
---|---|---|---|---|---|---|---|---|---|
Conditional DETR-R50 | 75 | 35.3 | 62.3 | 34.8 | 7.3 | 21.2 | 45.4 | 43 | 87 |
DAB DETR-R50 | 75 | 35.9 | 64.5 | 35.3 | 8.6 | 24.2 | 45.5 | 43 | 89 |
Sparse DETR-R50 | 75 | 38.3 | 64.8 | 39.4 | 10.7 | 27.6 | 47.1 | 40 | 171 |
Anchor DETR-R50 | 75 | 39.4 | 67.5 | 39.1 | 8.9 | 23.8 | 50.4 | 37 | 172 |
Improved algorithm-R50 | 75 | 40.8 | 69.6 | 42.2 | 11.3 | 28.8 | 51.7 | 39 | 177 |
Conditional DETR-R101 | 75 | 36.5 | 63.5 | 35.3 | 8.0 | 25.8 | 46.0 | 62 | 154 |
DAB DETR-R101 | 75 | 39.3 | 66.9 | 41.0 | 10.5 | 27.8 | 49.1 | 62 | 155 |
Sparse DETR-R101 | 75 | 42.1 | 68.2 | 43.9 | 11.5 | 30.3 | 52.2 | 59 | 238 |
Anchor DETR-R101 | 75 | 42.7 | 70.9 | 44.1 | 9.9 | 30.3 | 53.8 | 56 | 238 |
Improved algorithm-R101 | 75 | 43.1 | 71.1 | 45.3 | 12.8 | 30.7 | 55.6 | 58 | 243 |
Table 2 Comparison of detection results, parameter counts and GFLOPs of different Transformer-based algorithms on the VOC2007 dataset
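To make the Table 2 comparison concrete, the per-metric gain of the improved algorithm over Anchor DETR can be computed for each backbone. This is a small sketch under the assumption that simple row differences are the quantity of interest; the structure `TABLE2` and the helper `gains` are hypothetical names, and the values are copied from the Anchor DETR and improved-algorithm rows.

```python
# Metric order follows the Table 2 header (all values in %).
METRICS = ["AP", "AP50", "AP75", "APS", "APM", "APL"]

# Rows copied from Table 2 (VOC2007); only the baseline and improved rows.
TABLE2 = {
    "R50": {
        "Anchor DETR": [39.4, 67.5, 39.1, 8.9, 23.8, 50.4],
        "Improved": [40.8, 69.6, 42.2, 11.3, 28.8, 51.7],
    },
    "R101": {
        "Anchor DETR": [42.7, 70.9, 44.1, 9.9, 30.3, 53.8],
        "Improved": [43.1, 71.1, 45.3, 12.8, 30.7, 55.6],
    },
}


def gains(backbone):
    """Return improved-minus-baseline differences (percentage points) per metric."""
    rows = TABLE2[backbone]
    return {
        metric: round(new - old, 1)
        for metric, old, new in zip(METRICS, rows["Anchor DETR"], rows["Improved"])
    }


if __name__ == "__main__":
    for backbone in TABLE2:
        print(backbone, gains(backbone))
```

With these rows, the largest gains fall on the medium-object metric (APM) for R50 and on the small-object metric (APS) for R101.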
Feature fusion | Encoder inter-layer transfer optimization | Random jump retention | AP/% | AP50/% | AP75/% | APS/% | APM/% | APL/% |
---|---|---|---|---|---|---|---|---|
- | - | - | 39.4 | 67.5 | 39.1 | 8.9 | 23.8 | 50.4 |
√ | - | - | 39.9 | 68.4 | 40.7 | 10.6 | 26.5 | 50.7 |
- | √ | - | 40.1 | 69.1 | 41.1 | 11.1 | 26.2 | 50.7 |
- | - | √ | 40.3 | 69.2 | 41.8 | 10.2 | 27.0 | 51.1 |
√ | √ | - | 40.4 | 69.0 | 40.8 | 11.2 | 27.4 | 50.6 |
- | √ | √ | 40.4 | 68.7 | 41.6 | 11.0 | 27.6 | 51.1 |
√ | - | √ | 40.5 | 69.4 | 42.0 | 10.8 | 27.6 | 51.4 |
√ | √ | √ | 40.8 | 69.6 | 42.2 | 11.3 | 28.8 | 51.7 |
Table 3 Ablation results of the improved algorithm on the VOC2007 dataset
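One way to read the ablation in Table 3 is to compare the AP gain of each module used alone with the gain of all three modules combined. The sketch below does only that arithmetic; the tuple encoding of the three switches and the helper `ap_gain` are illustrative assumptions, and the AP values are copied from the table.

```python
# Key: (feature fusion, encoder inter-layer transfer optimization,
#       random jump retention) -> AP/% from Table 3 (VOC2007).
ABLATION_AP = {
    (False, False, False): 39.4,  # baseline row with no module enabled
    (True, False, False): 39.9,
    (False, True, False): 40.1,
    (False, False, True): 40.3,
    (True, True, False): 40.4,
    (False, True, True): 40.4,
    (True, False, True): 40.5,
    (True, True, True): 40.8,
}


def ap_gain(config):
    """AP gain (percentage points) of a configuration over the baseline row."""
    baseline = ABLATION_AP[(False, False, False)]
    return round(ABLATION_AP[config] - baseline, 1)


# Sum of the gains obtained when each module is enabled on its own.
single_sum = round(
    ap_gain((True, False, False))
    + ap_gain((False, True, False))
    + ap_gain((False, False, True)),
    1,
)
# Gain obtained when all three modules are enabled together.
combined = ap_gain((True, True, True))

if __name__ == "__main__":
    print("sum of single-module gains:", single_sum)
    print("gain with all modules:", combined)
```

Here the individual gains sum to 2.1 points while the full configuration gains 1.4 points, so the three modules' contributions overlap rather than add up.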
Inter-layer transfer method | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
Method 1 | 40.0 | 68.8 | 40.5 | 9.0 | 26.2 | 51.6 |
Method 2 | 39.8 | 69.0 | 40.2 | 10.7 | 26.5 | 51.4 |
Method 3 | 40.1 | 69.1 | 41.1 | 11.1 | 26.2 | 50.7 |
Table 4 Experimental results of different encoder inter-layer transfer methods (%)
Number of samples | AP | AP50 | AP75 | APS | APM | APL |
---|---|---|---|---|---|---|
50 | 39.3 | 68.0 | 39.4 | 10.3 | 25.2 | 49.9 |
100 | 40.3 | 69.2 | 41.8 | 10.2 | 27.0 | 51.1 |
200 | 39.2 | 67.9 | 39.0 | 10.0 | 24.8 | 50.0 |
300 | 39.0 | 68.1 | 38.8 | 10.5 | 24.3 | 49.8 |
Table 5 Results of the random jump retention method with different numbers of samples (%)