
Journal of Graphics ›› 2024, Vol. 45 ›› Issue (3): 472-481. DOI: 10.11996/JG.j.2095-302X.2024030472


Orthogonal fusion image descriptor based on global attention

AI Liefu1, TAO Yong1,2, JIANG Changyu1

  1. School of Computer and Information, Anqing Normal University, Anqing, Anhui 246133, China
    2. School of Smart Transportation Modern Industry, Anhui Sanlian University, Hefei, Anhui 230601, China
  • Received: 2023-09-11 Accepted: 2023-12-29 Online: 2024-06-30 Published: 2024-06-11
  • About author:

    AI Liefu (1985-), associate professor, Ph.D. His main research interests cover content-based image retrieval and machine learning. E-mail: ailiefu@qq.com

  • Supported by:
    Natural Science Foundation of Anhui Province, China(1608085MF144);Natural Science Foundation of Anhui Province, China(1908085MF194);University Science Research Project of Anhui Province, China(KJ2020A0498)

Abstract:

Image descriptors are a key research object in computer vision and are widely used in image classification, segmentation, recognition, and retrieval. In existing deep image descriptors, the local feature extraction branch does not model the correlation between the high-dimensional feature space and the channel information, so the extracted local features carry insufficient information. To address this, an image descriptor combining local and global features was proposed. In the local branch, multi-scale feature maps were extracted with dilated convolutions; after the outputs were concatenated, a global attention mechanism built on a multilayer perceptron captured the correlated channel-spatial information and produced the final local features. The high-dimensional global branch generated a global feature vector through global pooling and fully connected layers. The component of the local features orthogonal to the global feature vector was extracted and concatenated with the global features to form the final descriptor. At the same time, the robustness of the model on large-scale datasets was enhanced by an angular-margin loss function with sub-class centers. Experimental results on the public ROxford5k and RParis6k datasets demonstrated that, in the Medium and Hard evaluation modes, the mean average retrieval precision of the proposed descriptor reached 81.87% and 59.74% on ROxford5k, and 91.61% and 79.12% on RParis6k, improvements of 1.70% and 1.56%, and 2.00% and 1.83%, respectively, over the deep orthogonal fusion descriptor. The proposed descriptor thus exhibited superior retrieval accuracy compared with other image descriptors.
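The abstract describes the local branch only at a high level. As a rough illustration, the following PyTorch sketch shows one plausible form of such a branch: parallel dilated convolutions for multi-scale features, followed by a GAM-style global attention that applies an MLP over the channel dimension and a convolutional spatial gate. All module names, channel sizes, dilation rates, and the reduction ratio are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiScaleLocalBranch(nn.Module):
    """Hypothetical local branch: parallel dilated convolutions for
    multi-scale features, then an MLP-based channel attention and a
    convolutional spatial attention over the concatenated maps."""

    def __init__(self, in_ch=1024, mid_ch=256, rates=(1, 2, 3), reduction=4):
        super().__init__()
        # One 3x3 convolution per dilation rate; padding=rate keeps the size
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, mid_ch, 3, padding=r, dilation=r) for r in rates
        ])
        c = mid_ch * len(rates)
        # Channel attention: per-position MLP over the channel dimension
        self.channel_mlp = nn.Sequential(
            nn.Linear(c, c // reduction), nn.ReLU(inplace=True),
            nn.Linear(c // reduction, c),
        )
        # Spatial attention: reduce then restore channels with 7x7 convolutions
        self.spatial = nn.Sequential(
            nn.Conv2d(c, c // reduction, 7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(c // reduction, c, 7, padding=3),
        )

    def forward(self, x):
        feat = torch.cat([b(x) for b in self.branches], dim=1)  # (B, c, H, W)
        # Channel gate computed on the (B, H, W, c) permutation
        attn = self.channel_mlp(feat.permute(0, 2, 3, 1))
        feat = feat * torch.sigmoid(attn.permute(0, 3, 1, 2))
        # Spatial gate
        feat = feat * torch.sigmoid(self.spatial(feat))
        return feat
```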
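The orthogonal fusion step can likewise be made concrete: each local feature is decomposed against the global vector, its parallel component is removed, and the remaining orthogonal component is pooled and concatenated with the global feature. This follows the general DOLG-style orthogonal fusion that the abstract compares against; the pooling choice and tensor shapes here are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OrthogonalFusion(nn.Module):
    """Keep only the component of the local feature map orthogonal to the
    global descriptor, then concatenate it with the global branch output."""

    def forward(self, local_feat, global_feat):
        # local_feat: (B, C, H, W); global_feat: (B, C)
        B, C, H, W = local_feat.shape
        flat = local_feat.view(B, C, -1)                 # (B, C, HW)
        # Projection coefficient of each local feature onto the global vector
        dot = torch.bmm(global_feat.unsqueeze(1), flat)  # (B, 1, HW)
        norm_sq = (global_feat * global_feat).sum(1, keepdim=True).unsqueeze(-1)
        proj = dot / norm_sq.clamp(min=1e-6) * global_feat.unsqueeze(-1)
        orth = (flat - proj).view(B, C, H, W)            # orthogonal component
        # Pool the orthogonal component and concatenate with the global vector
        pooled = F.adaptive_avg_pool2d(orth, 1).flatten(1)   # (B, C)
        return torch.cat([pooled, global_feat], dim=1)       # (B, 2C)
```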
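Finally, a minimal sketch of an angular-margin loss with sub-class centers (sub-center ArcFace), assuming k sub-centers per class; the scale s and margin m values are illustrative defaults, not the values used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubCenterArcFace(nn.Module):
    """Angular-margin head with k sub-centers per class: cosine similarity
    is taken against every sub-center, the per-class maximum is kept, and
    the margin m is applied to the target class before scaling by s."""

    def __init__(self, dim, num_classes, k=3, s=30.0, m=0.3):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes * k, dim))
        nn.init.xavier_uniform_(self.weight)
        self.num_classes, self.k, self.s, self.m = num_classes, k, s, m

    def forward(self, emb, labels):
        # Cosine similarity to every sub-center, then max over sub-centers
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))       # (B, C*k)
        cos = cos.view(-1, self.num_classes, self.k).max(dim=2).values   # (B, C)
        # Add the angular margin m to the target class only
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, self.num_classes).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos)
        return F.cross_entropy(self.s * logits, labels)
```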

Key words: image descriptor, dilated convolution, global attention, feature fusion, sub-center ArcFace
