
Journal of Graphics, 2026, Vol. 47, Issue (1): 90-98. DOI: 10.11996/JG.j.2095-302X.2026010090

• Image Processing and Computer Vision •

• Corresponding author: HUANG Zhiyong, E-mail: hzy@hzy.org.cn

An image matching method for large viewpoint variation scenarios

XIANG Mengli, HUANG Zhiyong(), SHE Yali, DING Tuojun   

  1. College of Computer and Information Technology, China Three Gorges University, Yichang Hubei 443000, China
  • Received:2025-06-24 Accepted:2025-08-27 Published:2026-02-28 Online:2026-03-16
  • Supported by:
    National Natural Science Foundation of China(62371271)


Abstract:

To address the significant decline in matching accuracy and the number of correspondences exhibited by existing image-matching methods under large viewpoint variations, an improved image-matching approach based on E-LoFTR was proposed. Firstly, following a strategy of viewpoint rectification followed by fine-grained matching, a novel two-stage SIFT-based viewpoint-rectification module was proposed, which leveraged the viewpoint invariance of the Scale-Invariant Feature Transform (SIFT) algorithm and the geometric alignment capability of homography to enhance matching accuracy under large viewpoint variations. Then, a direction-aware gated attention mechanism was designed that employed a cascaded structure of multi-directional convolutions and dynamic gating to extract queries (Q), keys (K), and values (V); the injected geometric priors significantly enhanced the model's robustness. Lastly, to mitigate information loss during the upsampling of fused features, the Fusion-DySample module was incorporated to further improve performance. Experimental results on the public MegaDepth dataset showed that the proposed method achieved relative pose estimation AUCs of 57.1%, 72.7%, and 83.9% under rotation error thresholds of 5°, 10°, and 20°, respectively, outperforming E-LoFTR by 0.7, 0.5, and 0.4 percentage points. On the newly constructed NewMega dataset derived from MegaDepth and on a private industrial dataset, the proposed method also demonstrated substantial improvements in both the number of matches and matching accuracy.
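The rectify-then-match strategy described above first aligns the two views with a homography estimated from sparse SIFT correspondences, then runs fine matching on the warped image. As a minimal, illustrative sketch (not the paper's implementation), the geometric-alignment step reduces to estimating a homography from point correspondences by the direct linear transform (DLT); SIFT detection and RANSAC outlier filtering, e.g. via OpenCV, are omitted here:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src via the DLT.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A (last right-singular vector).
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the projective scale

def apply_homography(H, pts):
    """Map (N, 2) points through H using homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

if __name__ == "__main__":
    # Synthetic check: recover a known homography from 6 correspondences.
    H_true = np.array([[1.2, 0.1, 5.0],
                       [-0.05, 0.9, -3.0],
                       [5e-4, 2e-4, 1.0]])
    src = np.array([[0, 0], [100, 0], [100, 100],
                    [0, 100], [50, 25], [30, 80]], dtype=float)
    dst = apply_homography(H_true, src)
    H_est = estimate_homography(src, dst)
    print(np.allclose(H_est, H_true, atol=1e-6))
```

In a full pipeline, one view would then be warped with the estimated homography (e.g. `cv2.warpPerspective`) before the detector-free fine-matching stage.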

Key words: image matching, E-LoFTR, large viewpoint variation, SIFT, attention mechanism
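The relative-pose AUC figures quoted in the abstract follow the cumulative-error-curve convention common in MegaDepth-style evaluations: per image pair a pose error is computed (here, rotation error in degrees), and the area under the cumulative error curve up to each threshold is reported. A small, self-contained sketch of that metric (illustrative only; the paper's exact evaluation code is not shown here):

```python
import bisect

def pose_auc(errors, threshold):
    """Area under the cumulative pose-error curve, clipped at `threshold`.

    errors: per-pair pose errors (e.g. rotation error in degrees).
    Returns a value in [0, 1]; 1.0 means every pair has zero error.
    """
    errs = sorted(errors)
    n = len(errs)
    recall = [(i + 1) / n for i in range(n)]
    # Prepend the origin so the curve starts at (0, 0).
    errs = [0.0] + errs
    recall = [0.0] + recall
    # Clip the curve at the error threshold.
    idx = bisect.bisect_left(errs, threshold)
    xs = errs[:idx] + [threshold]
    ys = recall[:idx] + [recall[idx - 1]]
    # Trapezoidal integration, normalized by the threshold.
    area = sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2
               for i in range(len(xs) - 1))
    return area / threshold

if __name__ == "__main__":
    errs = [1.0, 3.0, 8.0, 25.0]  # hypothetical per-pair rotation errors
    print([round(pose_auc(errs, t), 3) for t in (5, 10, 20)])
```

Larger thresholds forgive larger errors, which is why the reported AUC rises from 57.1% at 5° to 83.9% at 20°.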
