
Journal of Graphics ›› 2024, Vol. 45 ›› Issue (5): 968-978. DOI: 10.11996/JG.j.2095-302X.2024050968

• Image Processing and Computer Vision •

Feature fusion and inter-layer transmission: an improved object detection method based on Anchor DETR

ZHANG Dongping1, WEI Yangyue1, HE Shuji1, XU Yunchao1, HU Haimiao2, HUANG Wenjun3

  1. College of Information Engineering, China Jiliang University, Hangzhou, Zhejiang 310018, China
    2. Hangzhou Innovation Institute, Beihang University, Hangzhou, Zhejiang 310051, China
    3. Supcon Technology Co., Ltd., Hangzhou, Zhejiang 310053, China
  • Received: 2024-07-02 Revised: 2024-07-12 Online: 2024-10-31 Published: 2024-10-31
  • About author:

    ZHANG Dongping (1970-), professor, Ph.D. His main research interests cover image processing and computer vision. E-mail: 06a0303103@cjlu.edu.cn

  • Supported by:
    Key Research and Development Program of Zhejiang Province (2024C01028); Key Research and Development Program of Zhejiang Province (2024C01108); Key Research and Development Program of Zhejiang Province (2022C01082); Key Research and Development Program of Zhejiang Province (2023C01032)

Abstract:

Object detection is a crucial task in computer vision, aiming to accurately identify and locate objects of interest in images or videos. An improved object detection algorithm was proposed that incorporates feature fusion, optimizes the inter-layer transmission of the encoder, and introduces a random jump retention method, addressing the limitations of general Transformer models in object detection tasks. Specifically, to counteract the insufficient perception of object information caused by computational constraints that restrict Transformer vision models to a single level of features, a convolutional attention mechanism was employed to achieve effective multi-scale feature fusion, enhancing the capability of object recognition and localization. By optimizing the transmission mode between encoder layers, each encoder layer transmitted and learned more information, reducing information loss between layers. Additionally, to address the problem that predictions in the intermediate stages of the decoder outperformed those of the final stage, a random jump retention method was designed to improve the model's prediction accuracy and stability. Experimental results demonstrated that the improved method significantly enhanced performance in object detection tasks. On the COCO2017 dataset, the model's AP reached 42.3%, and the AP for small objects improved by 2.2%; on the PASCAL VOC2007 dataset, the model's AP improved by 1.4%, and the AP for small objects improved by 2.4%.
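The abstract does not give implementation details, but a minimal sketch of the kind of convolutional-attention multi-scale fusion it describes might look like the following (the module structure, channel width, and two-scale interface are illustrative assumptions, not the authors' code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAttentionFusion(nn.Module):
    """Illustrative multi-scale fusion with a convolutional attention gate.
    All names and hyperparameters here are assumptions for demonstration."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # 1x1 projections so both scales share a common channel width
        self.proj_low = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_high = nn.Conv2d(channels, channels, kernel_size=1)
        # Convolutional attention: per-pixel fusion weights from both maps
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample the coarser (high-level) map to the finer resolution
        high_up = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                                align_corners=False)
        low_p, high_p = self.proj_low(low), self.proj_high(high_up)
        # The attention map decides, per location, how much high-level
        # context to keep relative to the fine-grained low-level features
        w = self.attn(torch.cat([low_p, high_p], dim=1))
        return w * high_p + (1 - w) * low_p

if __name__ == "__main__":
    fuse = ConvAttentionFusion(channels=256)
    c4 = torch.randn(1, 256, 64, 64)   # finer-scale feature map
    c5 = torch.randn(1, 256, 32, 32)   # coarser-scale feature map
    print(fuse(c4, c5).shape)          # torch.Size([1, 256, 64, 64])
```

The fused map retains the finer scale's resolution, which is consistent with the reported gains on small objects, where single-scale Transformer detectors typically struggle.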

Key words: object detection, feature fusion, Transformer, attention mechanism, image processing
