
Journal of Graphics ›› 2024, Vol. 45 ›› Issue (5): 968-978. DOI: 10.11996/JG.j.2095-302X.2024050968

• Image Processing and Computer Vision •


Feature fusion and inter-layer transmission: an improved object detection method based on Anchor DETR

ZHANG Dongping1, WEI Yangyue1, HE Shuji1, XU Yunchao1, HU Haimiao2, HUANG Wenjun3

  1. College of Information Engineering, China Jiliang University, Hangzhou, Zhejiang 310018, China
    2. Hangzhou Innovation Institute, Beihang University, Hangzhou, Zhejiang 310051, China
    3. Supcon Technology Co., Ltd., Hangzhou, Zhejiang 310053, China
  • Received: 2024-07-02  Revised: 2024-07-12  Published: 2024-10-31  Online: 2024-10-31
  • First author: ZHANG Dongping (1970-), professor, Ph.D. His main research interests cover image processing and computer vision. E-mail: 06a0303103@cjlu.edu.cn
  • Supported by:
    Key Research and Development Program of Zhejiang Province (2024C01028); Key Research and Development Program of Zhejiang Province (2024C01108); Key Research and Development Program of Zhejiang Province (2022C01082); Key Research and Development Program of Zhejiang Province (2023C01032)


Abstract:

Object detection is a crucial task in computer vision, aiming to accurately identify and locate objects of interest in images or videos. An improved object detection algorithm was proposed that incorporates feature fusion, optimizes the inter-layer transmission of the encoder, and introduces a random jump retention method, addressing the limitations of general Transformer models in object detection tasks. Specifically, to counter the insufficient perception of object information caused by computational constraints that restrict Transformer vision models to a single feature scale, a convolutional attention mechanism was employed to achieve effective multi-scale feature fusion, thereby enhancing object recognition and localization. By optimizing the inter-layer transmission of the encoder, each encoder layer transmitted and learned more information, reducing information loss between layers. Additionally, to address the problem that predictions from intermediate decoder stages outperformed those of the final stage, a random jump retention method was designed, improving the model's prediction accuracy and stability. Experimental results demonstrated that the improved method significantly enhanced object detection performance: on the COCO2017 dataset, the model reached an AP of 42.3%, with the AP for small objects improved by 2.2%; on the PASCAL VOC2007 dataset, the AP improved by 1.4%, and the AP for small objects improved by 2.4%.
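The abstract names a convolutional attention mechanism for multi-scale feature fusion but gives no implementation details. As a rough illustration only, the sketch below shows a squeeze-and-excitation-style channel gate fusing two feature scales in NumPy; the function names (`channel_attention`, `fuse_scales`), the nearest-neighbour upsampling, the additive fusion, and the random weights standing in for trained parameters are all assumptions, not the paper's actual design.

```python
import numpy as np

def channel_attention(feat, reduction=4):
    """Channel attention: reweight each channel by a gate computed
    from its globally average-pooled response (SE-style)."""
    c = feat.shape[0]
    # Global average pooling over the spatial dims -> (c,)
    pooled = feat.mean(axis=(1, 2))
    # Tiny two-layer bottleneck standing in for learned 1x1 convolutions;
    # fixed random weights are placeholders for trained parameters.
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w1 @ pooled, 0.0)          # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid -> (c,)
    return feat * gate[:, None, None]

def fuse_scales(fine, coarse):
    """Upsample the coarse map to the fine resolution (nearest
    neighbour), gate both maps, and fuse by element-wise addition."""
    scale = fine.shape[1] // coarse.shape[1]
    up = coarse.repeat(scale, axis=1).repeat(scale, axis=2)
    return channel_attention(fine) + channel_attention(up)

# Two feature maps with the same channel width at different scales,
# e.g. adjacent pyramid levels from a backbone.
p3 = np.ones((8, 32, 32), dtype=np.float32)   # fine scale
p4 = np.ones((8, 16, 16), dtype=np.float32)   # coarse scale
fused = fuse_scales(p3, p4)
print(fused.shape)  # (8, 32, 32)
```

The fused map keeps the fine scale's resolution while carrying gated contributions from both levels, which is the general intent of attention-weighted multi-scale fusion; the paper's actual operator may differ substantially.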

Key words: object detection, feature fusion, Transformer, attention mechanism, image processing

CLC number: