
Journal of Graphics ›› 2026, Vol. 47 ›› Issue (1): 90-98.DOI: 10.11996/JG.j.2095-302X.2026010090

• Image Processing and Computer Vision •

An image matching method for large viewpoint variation scenarios

XIANG Mengli, HUANG Zhiyong, SHE Yali, DING Tuojun

  1. College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443000, China
  • Received:2025-06-24 Accepted:2025-08-27 Online:2026-02-28 Published:2026-03-16
  • Contact: HUANG Zhiyong
  • Supported by:
    National Natural Science Foundation of China(62371271)

Abstract:

To address the significant decline in matching accuracy and number of correspondences that existing image-matching methods exhibit under large viewpoint variations, an improved image-matching approach based on E-LoFTR was proposed. First, following a strategy of viewpoint rectification before fine-grained matching, a two-stage SIFT-based viewpoint-rectification module was designed, which leveraged the viewpoint invariance of the Scale-Invariant Feature Transform (SIFT) algorithm and the geometric alignment capability of the homography to improve matching accuracy under large viewpoint variations. Second, a directional-gated attention mechanism was designed that employed a cascaded structure of multi-directional convolutions and dynamic gating to extract the queries (Q), keys (K), and values (V); the injected geometric priors significantly enhanced the model's robustness. Finally, to mitigate information loss during the upsampling of fused features, the Fusion-DySample module was incorporated to further improve performance. Experimental results on the public MegaDepth dataset showed that the proposed method achieved relative pose estimation AUCs of 57.1%, 72.7%, and 83.9% under rotation error thresholds of 5°, 10°, and 20°, respectively, outperforming E-LoFTR by 0.7%, 0.5%, and 0.4%. On the newly constructed NewMega dataset derived from MegaDepth and on a private industrial dataset, the proposed method also demonstrated substantial improvements in both the number of matches and matching accuracy.
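The viewpoint-rectification stage described above relies on estimating a homography from SIFT correspondences and using it to align one view with the other before fine-grained matching. As a minimal sketch of the homography-estimation step only (not the authors' implementation; function names are illustrative), the standard Direct Linear Transform (DLT) can be written in NumPy as:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src via the DLT algorithm.

    src, dst: (N, 2) arrays of matched point coordinates, N >= 4.
    Each correspondence (x, y) -> (u, v) contributes two linear
    constraints on the 9 entries of H; the solution is the right
    singular vector of the constraint matrix with smallest singular value.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1,  0,  0,  0, u * x, u * y, u])
        A.append([ 0,  0,  0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale ambiguity so H[2, 2] == 1

def warp_points(H, pts):
    """Apply homography H to (N, 2) points (homogenize, transform, dehomogenize)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]
```

In a full rectify-then-match pipeline, the estimated H would typically be wrapped in RANSAC to reject SIFT outliers and then used to warp one image toward the other's viewpoint, after which the fine-grained matcher operates on the roughly aligned pair.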

Key words: image matching, E-LoFTR, large perspective variation, SIFT, attention mechanism
