Journal of Graphics ›› 2023, Vol. 44 ›› Issue (4): 739-746.DOI: 10.11996/JG.j.2095-302X.2023040739
GUO Yin-hong, WANG Li-chun, LI Shuang
Received: 2022-11-28
Accepted: 2023-04-06
Online: 2023-08-31
Published: 2023-08-16
Contact: WANG Li-chun (1975-), professor, Ph.D. Her main research interests cover computer vision and human-computer interaction.
About author: GUO Yin-hong (1997-), master student. His main research interest covers computer vision. E-mail: gyh20200216@163.com
GUO Yin-hong, WANG Li-chun, LI Shuang. Image feature matching based on repeatability and specificity constraints[J]. Journal of Graphics, 2023, 44(4): 739-746.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023040739
| Category | Method | AUC@3px | AUC@5px | AUC@10px |
|---|---|---|---|---|
| Detector-based | D2Net+NN | 23.2 | 35.9 | 53.6 |
| | R2D2+NN | 50.6 | 63.9 | 76.8 |
| | DISK+NN | 52.3 | 64.9 | 78.9 |
| | SP+SuperGlue | 53.9 | 68.3 | 81.7 |
| Detector-free | DRC-Net | 50.6 | 56.2 | 68.3 |
| | LoFTR | 65.9 | 75.6 | 84.6 |
| | Ours | 66.8 | 76.9 | 86.1 |
Table 1 Homography estimation on HPatches
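For readers reproducing these numbers: the AUC@t values above are, in the evaluation protocol popularized by SuperGlue and LoFTR, areas under the cumulative error curve up to each threshold, normalized by the threshold. Below is a minimal sketch of that computation; the function name `error_auc` and the interpolation details are our assumptions, not code from the paper.

```python
import numpy as np

def error_auc(errors, thresholds=(3, 5, 10)):
    """Area under the cumulative error curve, one value per threshold.

    errors: one error per image pair (pixels for homography corner
    error, degrees for relative pose). Results are normalized to [0, 1].
    """
    errors = np.sort(np.asarray(errors, dtype=float))
    recall = (np.arange(len(errors)) + 1) / len(errors)
    errors = np.concatenate(([0.0], errors))  # start the curve at the origin
    recall = np.concatenate(([0.0], recall))
    aucs = {}
    for t in thresholds:
        idx = np.searchsorted(errors, t)
        e = np.concatenate((errors[:idx], [t]))                # close the curve at t
        r = np.concatenate((recall[:idx], [recall[idx - 1]]))
        aucs[t] = np.trapz(r, x=e) / t                         # normalize by threshold
    return aucs
```

Called as `error_auc(corner_errors, (3, 5, 10))`, this yields the @3px/@5px/@10px columns of Table 1 (scaled by 100).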
| Category | Method | AUC@5° | AUC@10° | AUC@20° |
|---|---|---|---|---|
| Detector-based | SP+SuperGlue | 16.16 | 33.81 | 51.84 |
| Detector-free | DRC-Net | 7.69 | 17.93 | 30.49 |
| | LoFTR | 22.06 | 40.80 | 57.96 |
| | Ours | 22.87 | 41.75 | 59.10 |
Table 2 Relative pose estimation on indoor dataset ScanNet
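The pose-estimation AUC in Tables 2 and 3 starts from a per-pair pose error, conventionally the maximum of the angular rotation error and the angular translation-direction error (the convention used in the SuperGlue/LoFTR evaluations); that error then feeds the same AUC computation sketched above. A hedged sketch follows; the helper name `relative_pose_error` is ours.

```python
import numpy as np

def relative_pose_error(R_gt, t_gt, R_est, t_est):
    """Pose error in degrees: max of rotation and translation angular errors."""
    # Rotation error: angle of the residual rotation R_est^T @ R_gt.
    cos_r = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    err_r = np.degrees(np.arccos(np.clip(cos_r, -1.0, 1.0)))
    # Translation error: angle between direction vectors. Two-view scale is
    # unobservable, so only the (sign-invariant) direction is compared.
    t_gt = t_gt / np.linalg.norm(t_gt)
    t_est = t_est / np.linalg.norm(t_est)
    err_t = np.degrees(np.arccos(np.clip(abs(float(t_gt @ t_est)), 0.0, 1.0)))
    return max(err_r, err_t)
```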
| Category | Method | AUC@5° | AUC@10° | AUC@20° |
|---|---|---|---|---|
| Detector-based | SP+SuperGlue | 42.18 | 61.16 | 75.96 |
| Detector-free | DRC-Net | 27.01 | 42.96 | 58.31 |
| | LoFTR | 52.81 | 69.19 | 81.18 |
| | Ours | 53.63 | 70.20 | 83.56 |
Table 3 Relative pose estimation on outdoor dataset MegaDepth
| Method | Blur kernel | AUC@5° | AUC@10° | AUC@20° |
|---|---|---|---|---|
| LoFTR | 5×5 | 40.63 | 56.70 | 70.53 |
| Ours | 5×5 | 44.60 | 63.50 | 76.52 |
| LoFTR | 12×12 | 32.37 | 47.32 | 61.68 |
| Ours | 12×12 | 41.10 | 59.50 | 70.63 |
| LoFTR | 24×24 | 18.86 | 31.86 | 47.18 |
| Ours | 24×24 | 32.68 | 45.20 | 57.24 |
Table 4 Comparison of pose estimation using images with different blurriness
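Table 4 evaluates robustness on blurred variants of the test images. The table specifies only the kernel size, not the kernel type; the sketch below assumes a simple OpenCV box blur (a Gaussian of matching support would be an equally plausible reading), and the image path is hypothetical.

```python
import cv2

def degrade(img, k):
    """Blur an image with a k×k averaging kernel (assumed box blur;
    Table 4 only states the kernel sizes: 5, 12, and 24)."""
    return cv2.blur(img, (k, k))

img = cv2.imread("pair_0_left.jpg")  # hypothetical test image
blurred = {k: degrade(img, k) for k in (5, 12, 24)}
```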
| Dataset | Image reconstruction | Repeatability and specificity constraints | AUC@5° | AUC@10° | AUC@20° |
|---|---|---|---|---|---|
| MegaDepth | √ | √ | 53.63 | 70.20 | 83.56 |
| | √ | - | 52.88 | 69.30 | 81.18 |
| | - | √ | 53.58 | 70.16 | 83.47 |
| | - | - | 52.81 | 69.19 | 81.18 |
| MegaDepth-B (blur kernel 12×12) | √ | √ | 41.10 | 59.50 | 70.63 |
| | √ | - | 40.56 | 58.40 | 69.03 |
| | - | √ | 33.96 | 49.12 | 63.98 |
| | - | - | 32.37 | 47.32 | 61.68 |
Table 5 Ablation experiments
Fig. 4 Visualization results on the MegaDepth dataset ((a) Clear images in MegaDepth; (b) Moderate blurring (blur kernel 12×12); (c) Poor image quality and heavy blurring (blur kernel 24×24))
[1] WU F, ZONG Y T, TANG X Q. Research status and prospect of vision SLAM[J]. Application Research of Computers, 2020, 37(8): 2248-2254 (in Chinese).
[2] SUN J M, SHEN Z H, WANG Y A, et al. LoFTR: detector-free local feature matching with transformers[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 8922-8931.
[3] KATHAROPOULOS A, VYAS A, PAPPAS N, et al. Transformers are RNNs: fast autoregressive transformers with linear attention[EB/OL]. [2022-05-11]. https://arxiv.org/abs/2006.16236.
[4] LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
[5] RUBLEE E, RABAUD V, KONOLIGE K, et al. ORB: an efficient alternative to SIFT or SURF[C]// 2011 International Conference on Computer Vision. New York: IEEE Press, 2011: 2564-2571.
[6] ROSTEN E, DRUMMOND T. Machine learning for high-speed corner detection[C]// The 9th European Conference on Computer Vision - Volume Part I. New York: ACM, 2006: 430-443.
[7] CALONDER M, LEPETIT V, STRECHA C, et al. BRIEF: binary robust independent elementary features[C]// European Conference on Computer Vision. Heidelberg: Springer, 2010: 778-792.
[8] MUR-ARTAL R, TARDÓS J D. ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras[J]. IEEE Transactions on Robotics, 2017, 33(5): 1255-1262.
[9] YI K M, TRULLS E, LEPETIT V, et al. LIFT: learned invariant feature transform[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2016: 467-483.
[10] DETONE D, MALISIEWICZ T, RABINOVICH A. Toward geometric deep SLAM[EB/OL]. [2022-05-16]. https://arxiv.org/abs/1707.07410.
[11] DETONE D, MALISIEWICZ T, RABINOVICH A. SuperPoint: self-supervised interest point detection and description[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2018: 337:1-337:12.
[12] SARLIN P E, DETONE D, MALISIEWICZ T, et al. SuperGlue: learning feature matching with graph neural networks[C]// 2020 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 4938-4947.
[13] BAI B, LIU Y T, MA C C, et al. Graph neural network[J]. Science in China: Mathematics, 2020, 3: 367-384 (in Chinese).
[14] ROCCO I, ARANDJELOVIĆ R, SIVIC J. Efficient neighbourhood consensus networks via submanifold sparse convolutions[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 605-621.
[15] YANG G, RAMANAN D. Volumetric correspondence networks for optical flow[C]// Advances in Neural Information Processing Systems. Cambridge: MIT Press, 2019: 794-805.
[16] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2117-2125.
[17] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[18] ROCCO I, CIMPOI M, ARANDJELOVIĆ R, et al. Neighbourhood consensus networks[EB/OL]. [2022-05-16]. https://arxiv.org/abs/1810.10510.
[19] TYSZKIEWICZ M J, FUA P, TRULLS E. DISK: learning local features with policy gradient[C]// The 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 14254-14265.
[20] BALNTAS V, LENC K, VEDALDI A, et al. HPatches: a benchmark and evaluation of handcrafted and learned local descriptors[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5173-5182.
[21] LI Z Q, SNAVELY N. MegaDepth: learning single-view depth prediction from Internet photos[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 2041-2050.
[22] REVAUD J, WEINZAEPFEL P, DE SOUZA C, et al. R2D2: repeatable and reliable detector and descriptor[EB/OL]. [2022-05-16]. https://arxiv.org/abs/1906.06195.
[23] DUSMANU M, ROCCO I, PAJDLA T, et al. D2-Net: a trainable CNN for joint detection and description of local features[EB/OL]. [2022-05-16]. https://arxiv.org/abs/1905.03561.
[24] TYSZKIEWICZ M J, FUA P, TRULLS E. DISK: learning local features with policy gradient[C]// The 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 14254-14265.
[25] LI X H, HAN K, LI S D, et al. Dual-resolution correspondence networks[C]// The 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 17346-17357.
[26] SCHÖNBERGER J L, FRAHM J M. Structure-from-motion revisited[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4104-4113.