Journal of Graphics, 2024, Vol. 45, Issue (3): 472-481. DOI: 10.11996/JG.j.2095-302X.2024030472
AI Liefu, TAO Yong, JIANG Changyu
Received: 2023-09-11
Accepted: 2023-12-29
Published: 2024-06-30
Online: 2024-06-11
First author: AI Liefu (1985-), associate professor, Ph.D. His main research interests are content-based image retrieval and machine learning. E-mail: ailiefu@qq.com
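Abstract:
Image descriptors are an important research subject in computer vision and are widely used in image classification, segmentation, recognition, and retrieval. In existing deep image descriptors, the local feature extraction branch does not model the correlation between the spatial and channel information of high-dimensional features, so the resulting local features carry insufficient information. To address this, an image descriptor that fuses local and global features is proposed. In the local branch, dilated convolutions extract multi-scale feature maps; the outputs are concatenated and passed through a global attention mechanism containing a multilayer perceptron, which captures correlated channel-spatial information, and are then processed further to produce the final local features. In the high-dimensional global branch, global pooling followed by a fully convolutional layer generates the global feature vector. The component of the local features that is orthogonal to the global feature vector is extracted, concatenated with the global feature, and aggregated to form the final descriptor. For feature constraints, an angular-margin loss function with sub-class centers is used to improve the robustness of the model on large-scale datasets. In experiments on the public datasets Roxford5k and Rparis6k, the proposed descriptor reaches a mean average precision of 81.87% and 59.74% on Roxford5k and 91.61% and 79.12% on Rparis6k under the medium and hard protocols, respectively. These are relative improvements of 1.70%, 1.56%, 2.00%, and 1.83% over the deep orthogonal fusion descriptor (DOLG), and the retrieval accuracy is higher than that of the other image descriptors compared.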
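The orthogonal fusion step described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `orthogonal_fusion`, the `(C, H, W)` layout, and the use of mean pooling as the aggregation step are all assumptions made for the example.

```python
import numpy as np

def orthogonal_fusion(local_feats, global_feat, eps=1e-12):
    """Fuse a local feature map with a global vector (illustrative sketch).

    local_feats: array of shape (C, H, W), one C-dim local feature per location.
    global_feat: array of shape (C,), the global feature vector.
    Returns a (2C,) descriptor. Mean pooling is an assumed aggregation choice.
    """
    c, h, w = local_feats.shape
    flat = local_feats.reshape(c, h * w)                     # (C, HW)

    # Scalar projection coefficient of each local vector onto the global vector.
    coeff = (global_feat @ flat) / (global_feat @ global_feat + eps)  # (HW,)

    # Projection of each local vector onto the global direction.
    proj = np.outer(global_feat, coeff)                      # (C, HW)

    # Orthogonal component: what the global vector does not already explain.
    orth = flat - proj                                       # (C, HW)

    # Concatenate the global vector with every orthogonal component.
    tiled = np.repeat(global_feat[:, None], h * w, axis=1)   # (C, HW)
    fused = np.concatenate([tiled, orth], axis=0)            # (2C, HW)

    # Aggregate over spatial locations into a single descriptor.
    return fused.mean(axis=1)                                # (2C,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    descriptor = orthogonal_fusion(rng.normal(size=(8, 4, 5)), rng.normal(size=8))
    print(descriptor.shape)
```

Removing the projection before concatenation keeps the second half of the descriptor from duplicating information that the global vector already carries, which is the motivation the abstract gives for using the orthogonal component.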
AI Liefu, TAO Yong, JIANG Changyu. Orthogonal fusion image descriptor based on global attention[J]. Journal of Graphics, 2024, 45(3): 472-481.
Table 1 Experimental results with individual modules removed/%
Ablation (module removed) | Roxf-medium | Roxf-hard | Rpar-medium | Rpar-hard |
---|---|---|---|---|
Dilated convolution (×) | 81.81 | 59.35 | 89.87 | 79.08 |
Global attention (×) | 81.44 | 59.18 | 91.13 | 78.87 |
Self-attention (×) | 80.84 | 58.67 | 90.77 | 78.34 |
GA-DOLG | 81.87 | 59.74 | 91.61 | 79.12 |
Table 2 Experimental results of different descriptor algorithms/%
Descriptor | Roxf-medium | Roxf-hard | Rpar-medium | Rpar-hard |
---|---|---|---|---|
DELF | 76.00 | 52.40 | 80.20 | 58.60 |
ASMK | 79.10 | 52.70 | 91.00 | 81.00 |
DELG | 79.08 | 58.40 | 88.78 | 76.20 |
How-ASMK | 79.40 | 56.90 | 81.60 | 62.40 |
Hot-Refresh | 67.34 | 53.28 | 81.63 | 68.96 |
DOLG | 80.50 | 58.82 | 89.81 | 77.70 |
GA-DOLG | 81.87 | 59.74 | 91.61 | 79.12 |
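The improvement figures quoted in the abstract (1.70%, 1.56%, 2.00%, and 1.83%) are larger than the absolute differences between the DOLG and GA-DOLG rows, so they appear to be relative gains. The short check below recomputes them from the Table 2 values only; the helper name `relative_gain` is introduced here for illustration.

```python
# DOLG and GA-DOLG rows copied from Table 2 (values in %).
dolg = {"Roxf-medium": 80.50, "Roxf-hard": 58.82,
        "Rpar-medium": 89.81, "Rpar-hard": 77.70}
ga_dolg = {"Roxf-medium": 81.87, "Roxf-hard": 59.74,
           "Rpar-medium": 91.61, "Rpar-hard": 79.12}

def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0

# Relative gain per benchmark split, rounded to two decimals.
gains = {k: round(relative_gain(ga_dolg[k], dolg[k]), 2) for k in dolg}

# Absolute differences in percentage points, for comparison.
diffs = {k: round(ga_dolg[k] - dolg[k], 2) for k in dolg}

if __name__ == "__main__":
    for k in dolg:
        print(f"{k}: +{diffs[k]} points, {gains[k]}% relative")
```

Read this way, the absolute gains over DOLG are 1.37, 0.92, 1.80, and 1.42 percentage points, and the relative gains match the figures in the abstract.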
Table 3 Experimental results for mAP@10/%
Descriptor | Roxf-medium | Roxf-hard | Rpar-medium | Rpar-hard |
---|---|---|---|---|
DOLG | 92.57 | 71.14 | 98.43 | 93.71 |
GA-DOLG | 93.76 | 72.27 | 99.41 | 94.17 |