
Journal of Graphics ›› 2023, Vol. 44 ›› Issue (4): 739-746. DOI: 10.11996/JG.j.2095-302X.2023040739

• Image Processing and Computer Vision •


Image feature matching based on repeatability and specificity constraints

GUO Yin-hong, WANG Li-chun, LI Shuang

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2022-11-28  Accepted: 2023-04-06  Online: 2023-08-31  Published: 2023-08-16
  • Contact: WANG Li-chun (1975-), professor, Ph.D. Her main research interests include computer vision and human-computer interaction. E-mail: wanglc@bjut.edu.cn
  • About author:

    GUO Yin-hong (1997-), master's student. His main research interest is computer vision. E-mail: gyh20200216@163.com

  • Supported by:
    Science and Technology Innovation 2030 - “New Generation of Artificial Intelligence” Major Project (2021ZD0111902); National Natural Science Foundation of China (U21B2038); National Natural Science Foundation of China (61876012); National Natural Science Foundation of China (62172022); Foundation for China University Industry-University Research Innovation (2021JQR023)


Abstract:

Image feature matching determines whether a pair of pixels can be matched by comparing their distance in the feature space, so learning robust pixel features is one of the key problems in deep-learning-based image feature matching. In addition, the learned pixel feature representation is also affected by the quality of the source image. To learn more robust pixel feature representations, the proposed method improves the image feature matching network LoFTR. For the coarse-grained feature reconstruction branch, a specificity constraint is defined to push apart the features of pixels within the same image, making different pixels strongly distinguishable; a repeatability constraint is defined to pull together the features of matched pixels from different images, making matched pixels across images strongly similar and thus enhancing matching accuracy. In addition, an image reconstruction layer is added to the decoding stage of the backbone, and an image reconstruction loss is defined to constrain the encoder to learn more robust feature representations. Experimental results on the indoor dataset ScanNet and the outdoor dataset MegaDepth demonstrate the effectiveness of the proposed method. Furthermore, image data of different qualities are constructed to verify that the method adapts better to feature matching between images of varying quality.
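
The abstract names three auxiliary training signals but does not give their exact formulations. The PyTorch-style sketch below is only one plausible reading of them, written for illustration: the function names, the cosine-similarity form of the two constraints, the L1 reconstruction loss, and the loss weights are assumptions of this note, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def specificity_loss(feat: torch.Tensor) -> torch.Tensor:
    """Specificity constraint (assumed form): push the coarse-level features of
    different pixels sampled from the SAME image apart, so pixels within an
    image stay strongly distinguishable.  feat: (N, C) features of N pixels."""
    feat = F.normalize(feat, dim=-1)
    sim = feat @ feat.t()                                # (N, N) cosine similarities
    off_diag = ~torch.eye(feat.size(0), dtype=torch.bool, device=feat.device)
    return sim[off_diag].mean()                          # minimize similarity between distinct pixels

def repeatability_loss(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Repeatability constraint (assumed form): pull the features of ground-truth
    matched pixel pairs from the TWO images together, so matched pixels across
    images stay strongly similar.  feat_a, feat_b: (M, C) features of M matches."""
    feat_a = F.normalize(feat_a, dim=-1)
    feat_b = F.normalize(feat_b, dim=-1)
    return (1.0 - (feat_a * feat_b).sum(dim=-1)).mean()  # 1 - cosine similarity per pair

def reconstruction_loss(recon_img: torch.Tensor, src_img: torch.Tensor) -> torch.Tensor:
    """Image reconstruction loss (assumed L1): recon_img is the output of the extra
    reconstruction layer appended to the backbone decoder, src_img the input image."""
    return F.l1_loss(recon_img, src_img)

# Hypothetical total objective: LoFTR's original matching losses plus the three
# auxiliary terms, with weights chosen on a validation set (values illustrative).
# total = loftr_loss + 0.1 * spec + 0.1 * rep + 1.0 * rec
```

In the paper these terms act on the coarse-grained feature reconstruction branch of LoFTR; the sketch only shows the shape of the losses, not where they attach in the network.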

Key words: deep learning, image feature matching, repeatability, specificity, image reconstruction loss

CLC Number: