
Journal of Graphics, 2026, Vol. 47, Issue (1): 90-98. DOI: 10.11996/JG.j.2095-302X.2026010090

• Image Processing and Computer Vision •

• Corresponding author: HUANG Zhiyong, E-mail: hzy@hzy.org.cn

An image matching method for large viewpoint variation scenarios

XIANG Mengli, HUANG Zhiyong(), SHE Yali, DING Tuojun   

  1. College of Computer and Information Technology, China Three Gorges University, Yichang Hubei 443000, China
  • Received:2025-06-24 Accepted:2025-08-27 Published:2026-02-28 Online:2026-03-16
  • Supported by:
    National Natural Science Foundation of China(62371271)


Abstract:

To address the significant decline in matching accuracy and the number of correspondences exhibited by existing image-matching methods under large viewpoint variations, an improved image-matching approach based on E-LoFTR was proposed. Firstly, following a strategy of viewpoint rectification followed by fine-grained matching, a novel two-stage SIFT-based viewpoint-rectification module was proposed, which leveraged the viewpoint invariance of the Scale-Invariant Feature Transform (SIFT) algorithm and the geometric alignment capability of homography to enhance matching accuracy under large viewpoint variations. Then, a direction-aware gated attention mechanism was designed that employed a cascaded structure of multi-directional convolutions and dynamic gating to extract queries (Q), keys (K), and values (V); the injected geometric priors significantly enhanced the model's robustness. Lastly, to mitigate information loss during the upsampling of fused features, the Fusion-DySample module was incorporated to further improve performance. Experimental results on the public MegaDepth dataset showed that the proposed method achieved relative pose estimation AUCs of 57.1%, 72.7%, and 83.9% under rotation error thresholds of 5°, 10°, and 20°, respectively, outperforming E-LoFTR by 0.7, 0.5, and 0.4 percentage points. On the newly constructed NewMega dataset derived from MegaDepth and on a private industrial dataset, the proposed method also demonstrated substantial improvements in both the number of matches and matching accuracy.
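The rectify-then-match strategy described above first aligns the two views with a homography estimated from sparse SIFT correspondences, then runs fine matching on the warped image. As a minimal, illustrative sketch (not the paper's implementation), the geometric-alignment step reduces to estimating a homography from point correspondences by the direct linear transform (DLT); SIFT detection and RANSAC outlier filtering, e.g. via OpenCV, are omitted here:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src via the DLT.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A (last right-singular vector).
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the projective scale

def apply_homography(H, pts):
    """Map (N, 2) points through H using homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

if __name__ == "__main__":
    # Synthetic check: recover a known homography from 6 correspondences.
    H_true = np.array([[1.2, 0.1, 5.0],
                       [-0.05, 0.9, -3.0],
                       [5e-4, 2e-4, 1.0]])
    src = np.array([[0, 0], [100, 0], [100, 100],
                    [0, 100], [50, 25], [30, 80]], dtype=float)
    dst = apply_homography(H_true, src)
    H_est = estimate_homography(src, dst)
    print(np.allclose(H_est, H_true, atol=1e-6))
```

In a full pipeline, one view would then be warped with the estimated homography (e.g. `cv2.warpPerspective`) before the detector-free fine-matching stage.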

Key words: image matching, E-LoFTR, large viewpoint variation, SIFT, attention mechanism
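The relative-pose AUC figures quoted in the abstract follow the cumulative-error-curve convention common in MegaDepth-style evaluations: per image pair a pose error is computed (here, rotation error in degrees), and the area under the cumulative error curve up to each threshold is reported. A small, self-contained sketch of that metric (illustrative only; the paper's exact evaluation code is not shown here):

```python
import bisect

def pose_auc(errors, threshold):
    """Area under the cumulative pose-error curve, clipped at `threshold`.

    errors: per-pair pose errors (e.g. rotation error in degrees).
    Returns a value in [0, 1]; 1.0 means every pair has zero error.
    """
    errs = sorted(errors)
    n = len(errs)
    recall = [(i + 1) / n for i in range(n)]
    # Prepend the origin so the curve starts at (0, 0).
    errs = [0.0] + errs
    recall = [0.0] + recall
    # Clip the curve at the error threshold.
    idx = bisect.bisect_left(errs, threshold)
    xs = errs[:idx] + [threshold]
    ys = recall[:idx] + [recall[idx - 1]]
    # Trapezoidal integration, normalized by the threshold.
    area = sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2
               for i in range(len(xs) - 1))
    return area / threshold

if __name__ == "__main__":
    errs = [1.0, 3.0, 8.0, 25.0]  # hypothetical per-pair rotation errors
    print([round(pose_auc(errs, t), 3) for t in (5, 10, 20)])
```

Larger thresholds forgive larger errors, which is why the reported AUC rises from 57.1% at 5° to 83.9% at 20°.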
