Journal of Graphics ›› 2023, Vol. 44 ›› Issue (5): 955-965.DOI: 10.11996/JG.j.2095-302X.2023050955
• Image Processing and Computer Vision •
A local optimization generation model for image inpainting
YANG Hong-ju1,2, GAO Min1, ZHANG Chang-you3, BO Wen3, WU Wen-jia3, CAO Fu-yuan1,2
Received: 2023-02-24 | Accepted: 2023-05-06 | Online: 2023-10-31 | Published: 2023-10-31
About author:
YANG Hong-ju (1975-), associate professor, Ph.D. Her main research interests cover computer vision, machine learning, etc. E-mail: yhju@sxu.edu.cn
YANG Hong-ju, GAO Min, ZHANG Chang-you, BO Wen, WU Wen-jia, CAO Fu-yuan. A local optimization generation model for image inpainting[J]. Journal of Graphics, 2023, 44(5): 955-965.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023050955
Fig. 3 Comparison of inpainting results of different models on Places2 dataset ((a) Masked image; (b) CA; (c) EdgeConnect; (d) GatedConv; (e) MADF; (f) AOT-GAN; (g) Ours)
Fig. 4 Comparison of inpainting results of different models on CelebA-HQ dataset ((a) Masked image; (b) CA; (c) EdgeConnect; (d) GatedConv; (e) MADF; (f) AOT-GAN; (g) Ours)
Fig. 5 Comparison of inpainting results of different models on Paris StreetView dataset ((a) Masked image; (b) CA; (c) EdgeConnect; (d) GatedConv; (e) MADF; (f) AOT-GAN; (g) Ours)
Places2:

| Metric | Model | 10%~20% | 21%~30% | 31%~40% | 41%~50% | 51%~60% |
|---|---|---|---|---|---|---|
| L1 (%) ↓ | CA | 2.89 | 4.82 | 6.69 | 8.36 | 11.43 |
| | EdgeConnect | 2.21 | 3.13 | 3.95 | 5.02 | 8.17 |
| | GatedConv | 2.46 | 3.39 | 4.28 | 6.21 | 8.43 |
| | MADF | 2.39 | 3.04 | 3.87 | 4.89 | 7.11 |
| | AOT-GAN | 2.02 | 2.72 | 3.71 | 4.85 | 7.31 |
| | LesT-GAN (Ours) | 1.03 | 1.85 | 2.84 | 4.03 | 6.50 |
| PSNR ↑ | CA | 24.43 | 21.19 | 19.84 | 17.79 | 16.21 |
| | EdgeConnect | 27.97 | 25.04 | 22.32 | 21.21 | 18.97 |
| | GatedConv | 27.34 | 24.27 | 21.45 | 19.76 | 17.80 |
| | MADF | 28.71 | 26.28 | 24.20 | 22.43 | 19.78 |
| | AOT-GAN | 29.47 | 26.51 | 24.15 | 22.21 | 19.44 |
| | LesT-GAN (Ours) | 31.06 | 27.35 | 24.75 | 22.72 | 19.81 |
| SSIM ↑ | CA | 0.853 | 0.766 | 0.702 | 0.644 | 0.541 |
| | EdgeConnect | 0.921 | 0.871 | 0.812 | 0.750 | 0.652 |
| | GatedConv | 0.909 | 0.857 | 0.790 | 0.721 | 0.644 |
| | MADF | 0.923 | 0.881 | 0.831 | 0.774 | 0.678 |
| | AOT-GAN | 0.932 | 0.887 | 0.833 | 0.771 | 0.669 |
| | LesT-GAN (Ours) | 0.950 | 0.904 | 0.851 | 0.788 | 0.684 |
| FID ↓ | CA | 7.43 | 17.25 | 31.40 | 53.47 | 66.23 |
| | EdgeConnect | 2.84 | 4.11 | 7.72 | 15.51 | 40.32 |
| | GatedConv | 2.70 | 3.83 | 6.58 | 13.72 | 36.43 |
| | MADF | 1.65 | 2.98 | 5.14 | 8.46 | 21.18 |
| | AOT-GAN | 2.08 | 3.37 | 5.83 | 10.41 | 26.61 |
| | LesT-GAN (Ours) | 0.18 | 0.76 | 1.43 | 2.45 | 6.65 |

CelebA-HQ:

| Metric | Model | 10%~20% | 21%~30% | 31%~40% | 41%~50% | 51%~60% |
|---|---|---|---|---|---|---|
| L1 (%) ↓ | CA | 2.13 | 2.32 | 3.47 | 5.44 | 7.29 |
| | EdgeConnect | 1.64 | 2.02 | 2.66 | 3.39 | 5.17 |
| | GatedConv | 1.89 | 2.10 | 2.87 | 3.81 | 5.23 |
| | MADF | 1.52 | 1.93 | 2.50 | 3.20 | 4.89 |
| | AOT-GAN | 1.26 | 1.74 | 2.39 | 3.17 | 5.12 |
| | LesT-GAN (Ours) | 0.75 | 1.25 | 1.91 | 2.67 | 4.41 |
| PSNR ↑ | CA | 29.69 | 26.92 | 24.80 | 22.49 | 18.23 |
| | EdgeConnect | 32.08 | 29.59 | 27.13 | 25.18 | 22.07 |
| | GatedConv | 31.83 | 29.47 | 26.89 | 24.87 | 21.66 |
| | MADF | 32.27 | 29.85 | 27.55 | 25.60 | 22.46 |
| | AOT-GAN | 33.17 | 30.20 | 27.63 | 25.51 | 22.12 |
| | LesT-GAN (Ours) | 34.53 | 30.98 | 28.29 | 26.16 | 22.85 |
| SSIM ↑ | CA | 0.902 | 0.866 | 0.803 | 0.741 | 0.667 |
| | EdgeConnect | 0.941 | 0.911 | 0.872 | 0.826 | 0.752 |
| | GatedConv | 0.934 | 0.906 | 0.867 | 0.815 | 0.749 |
| | MADF | 0.947 | 0.918 | 0.880 | 0.838 | 0.758 |
| | AOT-GAN | 0.953 | 0.923 | 0.884 | 0.839 | 0.756 |
| | LesT-GAN (Ours) | 0.962 | 0.929 | 0.889 | 0.844 | 0.762 |
| FID ↓ | CA | 5.83 | 7.72 | 9.79 | 13.07 | 21.61 |
| | EdgeConnect | 4.07 | 5.14 | 7.23 | 9.13 | 15.39 |
| | GatedConv | 3.82 | 4.96 | 6.76 | 8.53 | 14.26 |
| | MADF | 2.60 | 3.43 | 4.69 | 6.21 | 10.88 |
| | AOT-GAN | 3.48 | 4.49 | 5.88 | 7.49 | 12.17 |
| | LesT-GAN (Ours) | 0.73 | 1.48 | 2.46 | 3.67 | 6.38 |
Table 1 Quantitative comparison of inpainting results of each algorithm on the Places2 and CelebA-HQ datasets
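For reference, the two simplest metrics reported in Table 1 can be computed directly; the sketch below uses NumPy and assumes images are float arrays scaled to [0, 1]. SSIM and FID need dedicated implementations (e.g. scikit-image and a pretrained Inception network) and are omitted here.

```python
import numpy as np

def l1_percent(pred, gt):
    """Mean absolute pixel error, reported as a percentage (lower is better)."""
    return float(np.mean(np.abs(pred - gt))) * 100.0

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = float(np.mean((pred - gt) ** 2))
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy check: a uniform error of 0.1 per pixel.
gt = np.zeros((8, 8))
pred = gt + 0.1
print(round(l1_percent(pred, gt), 1))  # 10.0
print(round(psnr(pred, gt), 1))        # 20.0
```

In the table, these metrics are averaged over the test images of each dataset for each mask-size range.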
| Block | L1 (%) ↓ | PSNR ↑ | SSIM ↑ | FID ↓ |
|---|---|---|---|---|
| GatedConv block | 3.19 | 26.87 | 0.853 | 7.28 |
| AOT block | 2.27 | 29.11 | 0.880 | 3.22 |
| Swin Transformer block | 1.85 | 30.72 | 0.898 | 2.10 |
| LesT block | 1.82 | 30.96 | 0.906 | 2.09 |
Table 2 Results of ablation experiments on LesT blocks on the CelebA-HQ dataset
| Block | L1 (%) ↓ | PSNR ↑ | SSIM ↑ | FID ↓ |
|---|---|---|---|---|
| FFN block | 1.83 | 30.81 | 0.902 | 2.07 |
| LeFF block | 1.80 | 31.23 | 0.909 | 2.04 |
Table 3 Results of ablation experiments on LeFF blocks on the CelebA-HQ dataset
| Discriminator | L1 (%) ↓ | PSNR ↑ | SSIM ↑ | FID ↓ |
|---|---|---|---|---|
| PatchGAN | 1.89 | 30.57 | 0.887 | 2.16 |
| SM-PatchGAN | 1.87 | 30.71 | 0.895 | 2.12 |
| MRA-PatchGAN | 1.84 | 30.91 | 0.903 | 2.07 |
Table 4 Results of ablation experiments on MRA-PatchGAN on the CelebA-HQ dataset
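The discriminators compared above all build on the PatchGAN idea: rather than emitting a single real/fake score per image, the network outputs a grid of scores, one per local patch, which pushes the generator toward locally realistic texture. The PyTorch sketch below illustrates only this patch-level output; it is not the paper's MRA-PatchGAN, and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyPatchDiscriminator(nn.Module):
    """Minimal PatchGAN-style discriminator: outputs a score map, not a scalar."""

    def __init__(self, in_ch=3, base=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),       # 64 -> 32
            nn.LeakyReLU(0.2),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),    # 32 -> 16
            nn.LeakyReLU(0.2),
            nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),       # one score per patch
        )

    def forward(self, x):
        return self.net(x)

disc = TinyPatchDiscriminator()
scores = disc(torch.randn(1, 3, 64, 64))
print(scores.shape)  # torch.Size([1, 1, 15, 15]): a 15x15 grid of patch scores
```

Each spatial position in the output corresponds to a receptive-field patch of the input, so the adversarial loss is applied per patch; mask-aware variants such as SM-PatchGAN weight these patch scores by the inpainting mask.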
[1] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144.
[2] YU F, KOLTUN V, FUNKHOUSER T. Dilated residual networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 636-644.
[3] LIU G L, REDA F A, SHIH K J, et al. Image inpainting for irregular holes using partial convolutions[C]// Computer Vision - ECCV 2018: 15th European Conference. New York: ACM, 2018: 89-105.
[4] YU J H, LIN Z, YANG J M, et al. Free-form image inpainting with gated convolution[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 4470-4479.
[5] LIU H Y, JIANG B, SONG Y B, et al. Rethinking image inpainting via a mutual encoder-decoder with feature equalizations[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 725-741.
[6] YU J H, LIN Z, YANG J M, et al. Generative image inpainting with contextual attention[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 5505-5514.
[7] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2023-05-02]. https://arxiv.org/abs/2010.11929.
[8] RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation[M]// Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015: 234-241.
[9] PATHAK D, KRÄHENBÜHL P, DONAHUE J, et al. Context encoders: feature learning by inpainting[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 2536-2544.
[10] YAN Z Y, LI X M, LI M, et al. Shift-net: image inpainting via deep feature rearrangement[C]// European Conference on Computer Vision. Cham: Springer, 2018: 3-19.
[11] REN Y R, YU X M, ZHANG R N, et al. StructureFlow: image inpainting via structure-aware appearance flow[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 181-190.
[12] SONG Y H, YANG C, SHEN Y J, et al. SPG-net: segmentation prediction and guidance network for image inpainting[EB/OL]. [2023-05-02]. https://arxiv.org/abs/1805.03356.
[13] NAZERI K, NG E, JOSEPH T, et al. EdgeConnect: structure guided image inpainting using edge prediction[C]// 2019 IEEE/CVF International Conference on Computer Vision Workshop. New York: IEEE Press, 2020: 3265-3274.
[14] IIZUKA S, SIMO-SERRA E, ISHIKAWA H. Globally and locally consistent image completion[J]. ACM Transactions on Graphics, 2017, 36(4): 1-14.
[15] DEMIR U, UNAL G. Patch-based image inpainting with generative adversarial networks[EB/OL]. [2023-05-02]. https://arxiv.org/abs/1803.07422.
[16] JOLICOEUR-MARTINEAU A. The relativistic discriminator: a key element missing from standard GAN[EB/OL]. [2023-05-02]. https://arxiv.org/abs/1807.00734.
[17] ZHU M Y, HE D L, LI X, et al. Image inpainting by end-to-end cascaded refinement with mask awareness[J]. IEEE Transactions on Image Processing, 2021, 30: 4855-4866.
[18] ZENG Y H, FU J L, CHAO H Y, et al. Aggregated contextual transformations for high-resolution image inpainting[J]. IEEE Transactions on Visualization and Computer Graphics, 2022, 29(7): 3266-3280.
[19] LIU H Y, JIANG B, XIAO Y, et al. Coherent semantic attention for image inpainting[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 4169-4178.
[20] ZHENG C X, CHAM T J, CAI J F. Pluralistic image completion[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 1438-1447.
[21] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[22] LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 9992-10002.
[23] WU H P, XIAO B, CODELLA N, et al. CvT: introducing convolutions to vision transformers[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 22-31.
[24] LI Y W, ZHANG K, CAO J Z, et al. LocalViT: bringing locality to vision transformers[EB/OL]. [2023-05-02]. https://arxiv.org/abs/2104.05707.
[25] ZHOU B L, LAPEDRIZA A, KHOSLA A, et al. Places: a 10 million image database for scene recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(6): 1452-1464.
[26] KARRAS T, AILA T M, LAINE S, et al. Progressive growing of GANs for improved quality, stability, and variation[EB/OL]. [2023-05-02]. https://arxiv.org/abs/1710.10196.
[27] DOERSCH C, SINGH S, GUPTA A, et al. What makes Paris look like Paris?[J]. ACM Transactions on Graphics, 2012, 31(4): 101:1-101:9.