Journal of Graphics ›› 2023, Vol. 44 ›› Issue (1): 166-176.DOI: 10.11996/JG.j.2095-302X.2023010166
• Computer Graphics and Virtual Reality •
					
FAN Zhen, LIU Xiao-jing, LI Xiao-bo, CUI Ya-chao
Received: 2022-06-16
Revised: 2022-07-20
Online: 2023-10-31
Published: 2023-02-16
Contact: LIU Xiao-jing
About author: FAN Zhen (1998-), master student. His main research interests cover computer vision and artificial intelligence. E-mail: 772591989@qq.com
FAN Zhen, LIU Xiao-jing, LI Xiao-bo, CUI Ya-chao. A homography estimation method robust to illumination and occlusion[J]. Journal of Graphics, 2023, 44(1): 166-176.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023010166
Fig. 3 Examples of different convolutions ((a) Ordinary convolution; (b) General deformable convolution; (c) Dilated convolution; (d) Special deformable convolution)
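The convolution variants in Fig. 3 differ only in where each output position samples its 3×3 neighborhood. The following numpy sketch is purely illustrative (it is not the paper's implementation, and the "deformable" offsets are random stand-ins for learned ones); it prints the sampling offsets each variant would use:

```python
import numpy as np

def sampling_offsets(kind, dilation=2, rng=None):
    """Return the 9 (dy, dx) sampling offsets of a 3x3 kernel tap.

    kind: 'ordinary'   -> unit grid,
          'dilated'    -> unit grid scaled by `dilation`,
          'deformable' -> unit grid plus per-tap offsets (random stand-ins
                          here; in a real network they are predicted).
    """
    base = np.array([(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)], float)
    if kind == "ordinary":
        return base
    if kind == "dilated":
        return base * dilation
    if kind == "deformable":
        if rng is None:
            rng = np.random.default_rng(0)
        return base + rng.uniform(-0.5, 0.5, size=base.shape)
    raise ValueError(kind)

for kind in ("ordinary", "dilated", "deformable"):
    print(kind, sampling_offsets(kind)[:3])
```

Dilated convolution enlarges the receptive field without extra parameters, while deformable convolution lets the sampling grid adapt to local geometry, which is what makes it attractive under parallax and occlusion.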
Fig. 4 S-COCO dataset generation algorithm ((a) Randomly obtain a square image block named Patch A from an image; (b) Randomly perturb the 4 corners of the square; (c) Calculate HAB from the (Δxi, Δyi) of step 2; (d) Calculate the inverse of HAB, apply it to the whole image, and extract a square image block of the same size at the same location)
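The four steps of Fig. 4 reduce to: pick a square patch, jitter its four corners, solve HAB from the four correspondences, and warp the full image with the inverse of HAB. A hedged numpy sketch of the corner-perturbation and direct-linear-transform (DLT) step follows; the patch size and perturbation range are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve the 3x3 homography H (src -> dst) from 4 point pairs via DLT."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)      # fix h33 = 1

rng = np.random.default_rng(42)
patch = 128                                     # patch size (illustrative)
x0, y0 = 32, 32                                 # top-left of Patch A
corners_a = np.array([(x0, y0), (x0 + patch, y0),
                      (x0 + patch, y0 + patch), (x0, y0 + patch)], float)
corners_b = corners_a + rng.uniform(-32, 32, size=(4, 2))  # step (b): (dxi, dyi)
H_ab = homography_from_points(corners_a, corners_b)        # step (c)
H_ba = np.linalg.inv(H_ab)                                 # step (d): warp with inverse
```

Warping the whole image with H_ba and re-cropping at (x0, y0) then yields the second image block of the training pair.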
Fig. 6 Schematic diagram of the occlusion shape insertion strategy ((a) Image pairs generated by the original dataset generation algorithm; (b) Image pairs generated with the random occlusion insertion strategy; (c) Specific process of the occlusion insertion strategy)
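The insertion strategy of Fig. 6 (c) can be approximated by pasting a randomly sized, randomly placed occluder into one image of the training pair. This minimal sketch uses a flat rectangular occluder; the shape, the 10-30% size range, and the flat fill are assumptions for illustration only:

```python
import numpy as np

def insert_occlusion(img, rng, min_frac=0.1, max_frac=0.3):
    """Paste a random rectangular occluder (flat random gray) into a copy of img."""
    out = img.copy()
    h, w = img.shape[:2]
    oh = rng.integers(int(h * min_frac), int(h * max_frac) + 1)  # occluder height
    ow = rng.integers(int(w * min_frac), int(w * max_frac) + 1)  # occluder width
    top = rng.integers(0, h - oh + 1)
    left = rng.integers(0, w - ow + 1)
    out[top:top + oh, left:left + ow] = rng.integers(0, 256)     # flat intensity
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(240, 320), dtype=np.uint8)
occluded = insert_occlusion(img, rng)
```

Because the occluder appears in only one image, the photometric loss in the occluded region is irreducible, which is exactly what forces the network to learn occlusion robustness.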
Table 1 Comparison of PDSO-COCO with other synthetic datasets

| Factor | S-COCO | PDS-COCO | PDSO-COCO |
|---|---|---|---|
| Illumination | × | √ | √ |
| Noise | × | √ | √ |
| Displacement | √ | √ | √ |
| Parallax | × | √ | √ |
| Occlusion | × | × | √ |
Table 2 RMSE of each model on the WarpedMS-COCO dataset

| Rank | SIFT+RANSAC | PFNet | HomographyNet | CAUDHEN | UDHEN | Ours |
|---|---|---|---|---|---|---|
| Top 0~30% | 0.533 | 2.013 | 3.277 | 14.867 | 2.227 | 2.243 |
| 31%~60% | 1.174 | 3.768 | 4.919 | 18.066 | 3.361 | 2.671 |
| 61%~100% | 19.017 | 5.437 | 7.688 | 23.421 | 6.374 | 3.095 |
| Average | 9.738 | 3.857 | 5.673 | 18.798 | 4.176 | 2.781 |
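The RMSE reported in Table 2 is the root-mean-square error between the ground-truth and predicted positions of the four patch corners. A minimal sketch of that metric (the exact averaging convention, RMS over per-corner Euclidean distances, is an assumption about the evaluation protocol):

```python
import numpy as np

def corner_rmse(corners_true, corners_pred):
    """RMS of the Euclidean errors over the four (x, y) corners."""
    d = np.asarray(corners_true, float) - np.asarray(corners_pred, float)
    return float(np.sqrt(np.mean(np.sum(d ** 2, axis=1))))

# Example: a single corner off by the (3, 4, 5) triangle gives an error of 5.
err = corner_rmse([[0, 0]], [[3, 4]])
```

Per-row "Top 0~30% / 31%~60% / 61%~100%" groupings then rank the test pairs by difficulty before averaging this metric within each group.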
Fig. 8 Description of the overlap rate of the images used for stitching ((a) Image pairs with very low overlap; (b) Image pairs with relatively high overlap)
Table 3 Laplacian of images stitched by each model on the real dataset

| Rank | SIFT+RANSAC | PFNet | HomographyNet | CAUDHEN | UDHEN | Ours |
|---|---|---|---|---|---|---|
| Top 0~30% | 1133.175 | 962.593 | 898.766 | - | 933.278 | 1074.563 |
| 31%~60% | 721.158 | 654.946 | 578.645 | - | 664.295 | 698.279 |
| 61%~100% | 425.337 | 392.551 | 367.527 | - | 381.527 | 474.325 |
| Average | 724.475 | 645.341 | 590.234 | - | 632.784 | 715.443 |
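Tables 3, 5 and 7 score stitched results by a Laplacian sharpness measure: a blurry, ghosted stitch flattens the image's second derivatives, so a higher response indicates better alignment. A common form of this metric is the variance of the Laplacian response, sketched here with a plain 3×3 Laplacian in numpy (the paper's exact operator is not specified in this excerpt, so treat this as an assumed stand-in):

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the 3x3 Laplacian response; higher = sharper image."""
    g = np.asarray(gray, float)
    # 'valid' convolution with the [[0,1,0],[1,-4,1],[0,1,0]] kernel,
    # written as shifted sums so no SciPy dependency is needed
    resp = (g[1:-1, :-2] + g[1:-1, 2:] + g[:-2, 1:-1] + g[2:, 1:-1]
            - 4.0 * g[1:-1, 1:-1])
    return float(resp.var())
```

A perfectly flat image scores 0; any edge structure that survives stitching raises the score.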
Table 4 RMSE of the model on the WarpedMS-COCO dataset when different loss functions are backpropagated

| Rank | Backpropagating the mean photometric loss of bidirectional homography estimation | Backpropagating the ordinary photometric loss |
|---|---|---|
| Top 0~30% | 2.243 | 2.218 |
| 31%~60% | 2.671 | 3.074 |
| 61%~100% | 3.095 | 5.983 |
| Average | 2.781 | 4.056 |
Table 5 Laplacian of images stitched using the homography estimated by the model on the real dataset when different loss functions are backpropagated

| Rank | Backpropagating the mean photometric loss of bidirectional homography estimation | Backpropagating the ordinary photometric loss |
|---|---|---|
| Top 0~30% | 1074.563 | 974.263 |
| 31%~60% | 698.279 | 652.379 |
| 61%~100% | 474.325 | 399.281 |
| Average | 715.443 | 649.883 |
Table 6 RMSE of the model on the WarpedMS-COCO dataset with and without STN and deformable convolution

| Rank | With STN and deformable convolution | Without STN and deformable convolution |
|---|---|---|
| Top 0~30% | 2.243 | 2.219 |
| 31%~60% | 2.671 | 3.278 |
| 61%~100% | 3.095 | 6.221 |
| Average | 2.781 | 4.109 |
Table 7 Laplacian of images stitched using the homography estimated by the model on the real dataset with and without STN and deformable convolution

| Rank | With STN and deformable convolution | Without STN and deformable convolution |
|---|---|---|
| Top 0~30% | 1074.563 | 946.674 |
| 31%~60% | 698.279 | 668.151 |
| 61%~100% | 474.325 | 392.463 |
| Average | 715.443 | 647.386 |
| [1] | LE H, LIU F, ZHANG S, et al. Deep homography estimation for dynamic scenes[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 7652-7661. |
| [2] | BAKER S, MATTHEWS I. Lucas-Kanade 20 years on: a unifying framework[J]. International Journal of Computer Vision, 2004, 56(3): 221-255. |
| [3] | LOWE D G. Object recognition from local scale-invariant features[C]//The 7th IEEE International Conference on Computer Vision. New York: IEEE Press, 1999: 1150-1157. |
| [4] | RUBLEE E, RABAUD V, KONOLIGE K, et al. ORB: an efficient alternative to SIFT or SURF[C]//2011 IEEE International Conference on Computer Vision. New York: IEEE Press, 2011: 2564-2571. |
| [5] | BAY H, TUYTELAARS T, GOOL L. SURF: speeded up robust features[C]//European Conference on Computer Vision. Cham: Springer International Publishing, 2006: 404-417. |
| [6] | BARONE F, MARRAZZO M, OTON C J. Camera calibration with weighted direct linear transformation and anisotropic uncertainties of image control points[J]. Sensors, 2020, 20(4): E1175. |
| [7] | DETONE D, MALISIEWICZ T, RABINOVICH A. Deep image homography estimation[EB/OL]. [2022-01-20]. https://doi.org/10.48550/arXiv.1606.03798. |
| [8] | ZENG R, DENMAN S, SRIDHARAN S, et al. Rethinking planar homography estimation using perspective fields[C]//Asian Conference on Computer Vision. Cham: Springer International Publishing, 2018: 571-586. |
| [9] | NGUYEN T, CHEN S, SHIVAKUMAR S. Unsupervised deep homography: a fast and robust homography estimation model[J]. IEEE Robotics and Automation Letters, 2018, 3(3): 2346-2353. |
| [10] | ZHANG J, WANG C, LIU S, et al. Content-aware unsupervised deep homography estimation[C]//European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 653-669. |
| [11] | ZHANG S, NG W, ZHANG J, et al. Human activity recognition using radial basis function neural network trained via a minimization of localized generalization error[C]//International Conference on Ubiquitous Computing and Ambient Intelligence. Cham: Springer International Publishing, 2017: 498-507. |
| [12] | EVANGELIDIS G D, PSARAKIS E Z. Parametric image alignment using enhanced correlation coefficient maximization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(10): 1858-1865. |
| [13] | FISCHLER M, BOLLES R. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography[J]. Communications of the ACM, 1981, 24(6): 381-395. |
| [14] | BARATH D, MATAS J, NOSKOVA J. MAGSAC: marginalizing sample consensus[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 10189-10197. |
| [15] | JADERBERG M, SIMONYAN K, ZISSERMAN A. Spatial transformer networks[C]//Advances in Neural Information Processing Systems. Cambridge: MIT Press, 2015: 2017-2025. |
| [16] | DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]//2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 764-773. |
| [17] | HE K M, SUN J. Convolutional neural networks at constrained time cost[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 5353-5360. |
| [18] | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778. |
| [19] | GODARD C, AODHA O M, BROSTOW G J. Unsupervised monocular depth estimation with left-right consistency[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 6602-6611. |
| [20] | AMIRI A J, YAN LOO S, ZHANG H. Semi-supervised monocular depth estimation with left-right consistency using deep neural network[C]//2019 IEEE International Conference on Robotics and Biomimetics. New York: IEEE Press, 2019: 602-607. |
| [21] | KOGUCIUK D, ARANI E, ZONOOZ B. Perceptual loss for robust unsupervised homography estimation[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). New York: IEEE Press, 2021: 4269-4278. |
| [22] | NIE L, LIN C Y, LIAO K, et al. Unsupervised deep image stitching: reconstructing stitched features to images[J]. IEEE Transactions on Image Processing, 2021, 30: 6184-6197. |
| [23] | CHEN Y, GAO Y. Image denoising via steerable directional Laplacian regularizer[J]. Circuits, Systems, and Signal Processing, 2021, 40(12): 6265-6283. |
| [24] | LI Z G, SHU H Y, ZHENG C B. Multi-scale single image dehazing using Laplacian and Gaussian pyramids[J]. IEEE Transactions on Image Processing, 2021, 30: 9270-9279. |