Journal of Graphics, 2024, Vol. 45, Issue (3): 472-481. DOI: 10.11996/JG.j.2095-302X.2024030472
AI Liefu1, TAO Yong1,2, JIANG Changyu1
Received: 2023-09-11
Accepted: 2023-12-29
Online: 2024-06-30
Published: 2024-06-11
About author:
AI Liefu (1985-), associate professor, Ph.D. His main research interests cover content-based image retrieval and machine learning. E-mail: ailiefu@qq.com
AI Liefu, TAO Yong, JIANG Changyu. Orthogonal fusion image descriptor based on global attention[J]. Journal of Graphics, 2024, 45(3): 472-481.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024030472
Ablation setting | Roxf-medium | Roxf-hard | Rpar-medium | Rpar-hard |
---|---|---|---|---|
Atrous convolution (×) | 81.81 | 59.35 | 89.87 | 79.08 |
Global attention (×) | 81.44 | 59.18 | 91.13 | 78.87 |
Self-attention (×) | 80.84 | 58.67 | 90.77 | 78.34 |
GA-DOLG | 81.87 | 59.74 | 91.61 | 79.12 |
Table 1 Experimental results with the module marked (×) removed/%
Descriptor | Roxf-medium | Roxf-hard | Rpar-medium | Rpar-hard |
---|---|---|---|---|
DELF | 76.00 | 52.40 | 80.20 | 58.60 |
ASMK | 79.10 | 52.70 | 91.00 | 81.00 |
DELG | 79.08 | 58.40 | 88.78 | 76.20 |
How-ASMK | 79.40 | 56.90 | 81.60 | 62.40 |
Hot-Refresh | 67.34 | 53.28 | 81.63 | 68.96 |
DOLG | 80.50 | 58.82 | 89.81 | 77.70 |
GA-DOLG | 81.87 | 59.74 | 91.61 | 79.12 |
Table 2 Experimental results of different descriptor algorithms/%
Descriptor | Roxf-medium | Roxf-hard | Rpar-medium | Rpar-hard |
---|---|---|---|---|
DOLG | 92.57 | 71.14 | 98.43 | 93.71 |
GA-DOLG | 93.76 | 72.27 | 99.41 | 94.17 |
Table 3 mAP@10 experimental results/%
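For readers unfamiliar with the metric in Table 3, the following is a minimal sketch of how mAP@10 is commonly computed. It is not the authors' evaluation code. The function names are hypothetical, and the normalisation by min(k, number of relevant images) is an assumption, since several mAP@k variants exist and this page does not state which one was used. The "junk" image handling of the Revisited Oxford/Paris protocol is also omitted.

```python
from typing import Hashable, Sequence, Set


def average_precision_at_k(ranked: Sequence[Hashable],
                           relevant: Set[Hashable],
                           k: int = 10) -> float:
    """AP@k for a single query (assumed variant: normalised by min(k, |relevant|))."""
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    # Walk the top-k results; add precision at each rank where a relevant item appears.
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(k, len(relevant))


def mean_average_precision_at_k(rankings: Sequence[Sequence[Hashable]],
                                relevants: Sequence[Set[Hashable]],
                                k: int = 10) -> float:
    """Mean of AP@k over all queries, expressed as a percentage like the tables."""
    scores = [average_precision_at_k(r, rel, k)
              for r, rel in zip(rankings, relevants)]
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy data with two queries; the image identifiers are made up for illustration.
    rankings = [["a", "x", "b", "y"], ["x", "c"]]
    relevants = [{"a", "b"}, {"c"}]
    print(round(mean_average_precision_at_k(rankings, relevants, k=10), 2))
```

In the toy data, the first query finds its relevant images at ranks 1 and 3 and the second at rank 2, so the two per-query scores are averaged before being scaled to a percentage.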
描述符 | Roxf- medium | Roxf- hard | Rpar- medium | Rpar- hard |
---|---|---|---|---|
DOLG | 92.57 | 71.14 | 98.43 | 93.71 |
GA-DOLG | 93.76 | 72.27 | 99.41 | 94.17 |