Journal of Graphics ›› 2023, Vol. 44 ›› Issue (5): 955-965. DOI: 10.11996/JG.j.2095-302X.2023050955
YANG Hong-ju1,2, GAO Min1, ZHANG Chang-you3, BO Wen3, WU Wen-jia3, CAO Fu-yuan1,2
Received: 2023-02-24
Accepted: 2023-05-06
Online: 2023-10-31
Published: 2023-10-31
About author: YANG Hong-ju (1975-), associate professor, Ph.D. Her main research interests include computer vision and machine learning. E-mail: yhju@sxu.edu.cn
Abstract: Image inpainting is widely used in photo editing, object removal, and related applications. To address the structural distortion and texture blurring that existing deep-learning inpainting models suffer due to the limited receptive field of convolution operators, a locally optimized generative model, LesT-GAN, is proposed. The model consists of a generator and a discriminator. The generator is built from locally enhanced sliding-window Transformer (LesT) modules, which combine the translation invariance and locality of depth-wise convolution with the Transformer's ability to model global information, covering a large receptive field while still refining local details. The discriminator is a mask-guided, patch-based relativistic average discriminator: by estimating the average probability that a given real image is more realistic than a generated one, it simulates pixel propagation around the boundary of the missing region, allowing the generator to draw directly on real images during training and produce sharper local textures. In comparative experiments with other state-of-the-art inpainting methods on three datasets (Places2, CelebA-HQ, and Paris StreetView), LesT-GAN improved the L1 and FID metrics by 10.8% and 41.36%, respectively. The results show that LesT-GAN achieves better inpainting across multiple scenes and generalizes well to images of higher resolution than those seen during training.
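The relativistic average objective underlying the discriminator follows reference [16]: each real sample is judged against the average score of generated samples, and vice versa, so real images contribute gradients to the generator directly. Below is a minimal PyTorch sketch of that standard objective only; function names are ours, and the paper's mask guidance and patch weighting are not shown.

```python
import torch
import torch.nn.functional as F

def ra_d_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Discriminator: real patches should score above the mean fake score,
    # fake patches below the mean real score (relativistic average GAN [16]).
    real_vs_fake = d_real - d_fake.mean()
    fake_vs_real = d_fake - d_real.mean()
    return 0.5 * (
        F.binary_cross_entropy_with_logits(real_vs_fake, torch.ones_like(real_vs_fake))
        + F.binary_cross_entropy_with_logits(fake_vs_real, torch.zeros_like(fake_vs_real)))

def ra_g_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Generator: the symmetric objective, so gradients also flow from real images.
    real_vs_fake = d_real - d_fake.mean()
    fake_vs_real = d_fake - d_real.mean()
    return 0.5 * (
        F.binary_cross_entropy_with_logits(real_vs_fake, torch.zeros_like(real_vs_fake))
        + F.binary_cross_entropy_with_logits(fake_vs_real, torch.ones_like(fake_vs_real)))
```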
YANG Hong-ju, GAO Min, ZHANG Chang-you, BO Wen, WU Wen-jia, CAO Fu-yuan. A local optimization generation model for image inpainting[J]. Journal of Graphics, 2023, 44(5): 955-965.
Fig. 1 Model framework ((a) The overall framework of LesT-GAN model; (b) LesT module; (c) LeFF module)
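The LeFF block in Fig. 1(c) is a locally enhanced feed-forward layer in the spirit of LocalViT [24]: the hidden layer of the token MLP is routed through a depth-wise convolution on the restored 2D feature map, so each token also mixes with its spatial neighbors. A minimal sketch under assumed dimensions (the paper's channel sizes and kernel choices may differ):

```python
import torch
import torch.nn as nn

class LeFF(nn.Module):
    # Locally enhanced feed-forward: Linear -> depth-wise 3x3 conv -> Linear.
    def __init__(self, dim: int = 64, hidden: int = 256):
        super().__init__()
        self.proj_in = nn.Sequential(nn.Linear(dim, hidden), nn.GELU())
        self.dwconv = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden), nn.GELU())
        self.proj_out = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, h*w, dim) tokens of an h x w window/feature map
        b, n, _ = x.shape
        x = self.proj_in(x)                         # (B, N, hidden)
        x = x.transpose(1, 2).reshape(b, -1, h, w)  # back to 2D layout
        x = self.dwconv(x)                          # local spatial mixing
        x = x.reshape(b, -1, n).transpose(1, 2)     # (B, N, hidden)
        return self.proj_out(x)                     # (B, N, dim)

# e.g. LeFF()(torch.randn(2, 16 * 16, 64), 16, 16) -> (2, 256, 64)
```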
Fig. 3 Comparison of inpainting results of different models on Places2 dataset ((a) Masked image; (b) CA; (c) EdgeConnect; (d) GatedConv; (e) MADF; (f) AOT-GAN; (g) Ours)
Fig. 4 Comparison of inpainting results of different models on CelebA-HQ dataset ((a) Masked image; (b) CA; (c) EdgeConnect; (d) GatedConv; (e) MADF; (f) AOT-GAN; (g) Ours)
Fig. 5 Comparison of inpainting results of different models on Paris StreetView dataset ((a) Masked image; (b) CA; (c) EdgeConnect; (d) GatedConv; (e) MADF; (f) AOT-GAN; (g) Ours)
Places2:

| Metric | Method | 10%~20% | 21%~30% | 31%~40% | 41%~50% | 51%~60% |
|---|---|---|---|---|---|---|
| L1 (%) ↓ | CA[6] | 2.89 | 4.82 | 6.69 | 8.36 | 11.43 |
| | EdgeConnect[13] | 2.21 | 3.13 | 3.95 | 5.02 | 8.17 |
| | GatedConv[4] | 2.46 | 3.39 | 4.28 | 6.21 | 8.43 |
| | MADF[17] | 2.39 | 3.04 | 3.87 | 4.89 | 7.11 |
| | AOT-GAN[18] | 2.02 | 2.72 | 3.71 | 4.85 | 7.31 |
| | LesT-GAN (Ours) | 1.03 | 1.85 | 2.84 | 4.03 | 6.50 |
| PSNR ↑ | CA[6] | 24.43 | 21.19 | 19.84 | 17.79 | 16.21 |
| | EdgeConnect[13] | 27.97 | 25.04 | 22.32 | 21.21 | 18.97 |
| | GatedConv[4] | 27.34 | 24.27 | 21.45 | 19.76 | 17.80 |
| | MADF[17] | 28.71 | 26.28 | 24.20 | 22.43 | 19.78 |
| | AOT-GAN[18] | 29.47 | 26.51 | 24.15 | 22.21 | 19.44 |
| | LesT-GAN (Ours) | 31.06 | 27.35 | 24.75 | 22.72 | 19.81 |
| SSIM ↑ | CA[6] | 0.853 | 0.766 | 0.702 | 0.644 | 0.541 |
| | EdgeConnect[13] | 0.921 | 0.871 | 0.812 | 0.750 | 0.652 |
| | GatedConv[4] | 0.909 | 0.857 | 0.790 | 0.721 | 0.644 |
| | MADF[17] | 0.923 | 0.881 | 0.831 | 0.774 | 0.678 |
| | AOT-GAN[18] | 0.932 | 0.887 | 0.833 | 0.771 | 0.669 |
| | LesT-GAN (Ours) | 0.950 | 0.904 | 0.851 | 0.788 | 0.684 |
| FID ↓ | CA[6] | 7.43 | 17.25 | 31.40 | 53.47 | 66.23 |
| | EdgeConnect[13] | 2.84 | 4.11 | 7.72 | 15.51 | 40.32 |
| | GatedConv[4] | 2.70 | 3.83 | 6.58 | 13.72 | 36.43 |
| | MADF[17] | 1.65 | 2.98 | 5.14 | 8.46 | 21.18 |
| | AOT-GAN[18] | 2.08 | 3.37 | 5.83 | 10.41 | 26.61 |
| | LesT-GAN (Ours) | 0.18 | 0.76 | 1.43 | 2.45 | 6.65 |

CelebA-HQ:

| Metric | Method | 10%~20% | 21%~30% | 31%~40% | 41%~50% | 51%~60% |
|---|---|---|---|---|---|---|
| L1 (%) ↓ | CA[6] | 2.13 | 2.32 | 3.47 | 5.44 | 7.29 |
| | EdgeConnect[13] | 1.64 | 2.02 | 2.66 | 3.39 | 5.17 |
| | GatedConv[4] | 1.89 | 2.10 | 2.87 | 3.81 | 5.23 |
| | MADF[17] | 1.52 | 1.93 | 2.50 | 3.20 | 4.89 |
| | AOT-GAN[18] | 1.26 | 1.74 | 2.39 | 3.17 | 5.12 |
| | LesT-GAN (Ours) | 0.75 | 1.25 | 1.91 | 2.67 | 4.41 |
| PSNR ↑ | CA[6] | 29.69 | 26.92 | 24.80 | 22.49 | 18.23 |
| | EdgeConnect[13] | 32.08 | 29.59 | 27.13 | 25.18 | 22.07 |
| | GatedConv[4] | 31.83 | 29.47 | 26.89 | 24.87 | 21.66 |
| | MADF[17] | 32.27 | 29.85 | 27.55 | 25.60 | 22.46 |
| | AOT-GAN[18] | 33.17 | 30.20 | 27.63 | 25.51 | 22.12 |
| | LesT-GAN (Ours) | 34.53 | 30.98 | 28.29 | 26.16 | 22.85 |
| SSIM ↑ | CA[6] | 0.902 | 0.866 | 0.803 | 0.741 | 0.667 |
| | EdgeConnect[13] | 0.941 | 0.911 | 0.872 | 0.826 | 0.752 |
| | GatedConv[4] | 0.934 | 0.906 | 0.867 | 0.815 | 0.749 |
| | MADF[17] | 0.947 | 0.918 | 0.880 | 0.838 | 0.758 |
| | AOT-GAN[18] | 0.953 | 0.923 | 0.884 | 0.839 | 0.756 |
| | LesT-GAN (Ours) | 0.962 | 0.929 | 0.889 | 0.844 | 0.762 |
| FID ↓ | CA[6] | 5.83 | 7.72 | 9.79 | 13.07 | 21.61 |
| | EdgeConnect[13] | 4.07 | 5.14 | 7.23 | 9.13 | 15.39 |
| | GatedConv[4] | 3.82 | 4.96 | 6.76 | 8.53 | 14.26 |
| | MADF[17] | 2.60 | 3.43 | 4.69 | 6.21 | 10.88 |
| | AOT-GAN[18] | 3.48 | 4.49 | 5.88 | 7.49 | 12.17 |
| | LesT-GAN (Ours) | 0.73 | 1.48 | 2.46 | 3.67 | 6.38 |
Table 1 Quantitative comparison of inpainting results of each algorithm on the Places2 and CelebA-HQ datasets
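For reference, the L1 (%) and PSNR columns follow their standard definitions, sketched below in NumPy. Whether errors are averaged over the whole image or only the masked region is an evaluation detail not restated here, so treat this as the conventional formulation rather than the paper's exact protocol.

```python
import numpy as np

def l1_percent(pred: np.ndarray, gt: np.ndarray) -> float:
    # Mean absolute error as a percentage of the 8-bit dynamic range.
    diff = np.abs(pred.astype(np.float64) - gt.astype(np.float64))
    return float(diff.mean() / 255.0 * 100.0)

def psnr(pred: np.ndarray, gt: np.ndarray, peak: float = 255.0) -> float:
    # Peak signal-to-noise ratio in dB; higher is better.
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```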
| Block | L1 (%) ↓ | PSNR ↑ | SSIM ↑ | FID ↓ |
|---|---|---|---|---|
| GatedConv block | 3.19 | 26.87 | 0.853 | 7.28 |
| AOT block | 2.27 | 29.11 | 0.880 | 3.22 |
| Swin Transformer block | 1.85 | 30.72 | 0.898 | 2.10 |
| LesT block | 1.82 | 30.96 | 0.906 | 2.09 |
Table 2 Results of ablation experiments on LesT blocks on the CelebA-HQ dataset
| Block | L1 (%) ↓ | PSNR ↑ | SSIM ↑ | FID ↓ |
|---|---|---|---|---|
| FFN block | 1.83 | 30.81 | 0.902 | 2.07 |
| LeFF block | 1.80 | 31.23 | 0.909 | 2.04 |
Table 3 Results of ablation experiments on LeFF blocks on the CelebA-HQ dataset
| Discriminator | L1 (%) ↓ | PSNR ↑ | SSIM ↑ | FID ↓ |
|---|---|---|---|---|
| PatchGAN | 1.89 | 30.57 | 0.887 | 2.16 |
| SM-PatchGAN | 1.87 | 30.71 | 0.895 | 2.12 |
| MRA-PatchGAN | 1.84 | 30.91 | 0.903 | 2.07 |
Table 4 Results of ablation experiments on MRA-PatchGAN on the CelebA-HQ dataset
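The three discriminators in Table 4 differ in how patch-level real/fake targets are assigned; SM-PatchGAN and MRA-PatchGAN both derive their targets from the hole mask. The sketch below is our own illustration of that mask-guidance idea, not the paper's implementation; MRA-PatchGAN additionally applies the relativistic average objective sketched after the abstract.

```python
import torch
import torch.nn.functional as F

def mask_guided_targets(mask: torch.Tensor, patch_hw: tuple) -> torch.Tensor:
    # mask: (B, 1, H, W) float, 1 = missing pixel. Downsampling to the
    # discriminator's patch grid yields soft per-patch labels: patches in
    # known regions -> 1 (real), patches inside the hole -> 0 (fake).
    soft_hole = F.interpolate(mask, size=patch_hw, mode='bilinear',
                              align_corners=False)
    return 1.0 - soft_hole

# Hypothetical usage with patch logits d_fake of shape (B, 1, H', W'):
# targets = mask_guided_targets(mask, d_fake.shape[-2:])
# loss = F.binary_cross_entropy_with_logits(d_fake, targets)
```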
[1] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144.
[2] YU F, KOLTUN V, FUNKHOUSER T. Dilated residual networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 636-644.
[3] LIU G L, REDA F A, SHIH K J, et al. Image inpainting for irregular holes using partial convolutions[C]// Computer Vision - ECCV 2018: 15th European Conference. New York: ACM, 2018: 89-105.
[4] YU J H, LIN Z, YANG J M, et al. Free-form image inpainting with gated convolution[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 4470-4479.
[5] LIU H Y, JIANG B, SONG Y B, et al. Rethinking image inpainting via a mutual encoder-decoder with feature equalizations[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 725-741.
[6] YU J H, LIN Z, YANG J M, et al. Generative image inpainting with contextual attention[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 5505-5514.
[7] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2023-05-02]. https://arxiv.org/abs/2010.11929.
[8] RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation[M]// Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015: 234-241.
[9] PATHAK D, KRÄHENBÜHL P, DONAHUE J, et al. Context encoders: feature learning by inpainting[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 2536-2544.
[10] YAN Z Y, LI X M, LI M, et al. Shift-net: image inpainting via deep feature rearrangement[C]// European Conference on Computer Vision. Cham: Springer, 2018: 3-19.
[11] REN Y R, YU X M, ZHANG R N, et al. StructureFlow: image inpainting via structure-aware appearance flow[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 181-190.
[12] SONG Y H, YANG C, SHEN Y J, et al. SPG-net: segmentation prediction and guidance network for image inpainting[EB/OL]. [2023-05-02]. https://arxiv.org/abs/1805.03356.
[13] NAZERI K, NG E, JOSEPH T, et al. EdgeConnect: structure guided image inpainting using edge prediction[C]// 2019 IEEE/CVF International Conference on Computer Vision Workshop. New York: IEEE Press, 2020: 3265-3274.
[14] IIZUKA S, SIMO-SERRA E, ISHIKAWA H. Globally and locally consistent image completion[J]. ACM Transactions on Graphics, 2017, 36(4): 1-14.
[15] DEMIR U, UNAL G. Patch-based image inpainting with generative adversarial networks[EB/OL]. [2023-05-02]. https://arxiv.org/abs/1803.07422.
[16] JOLICOEUR-MARTINEAU A. The relativistic discriminator: a key element missing from standard GAN[EB/OL]. [2023-05-02]. https://arxiv.org/abs/1807.00734.
[17] ZHU M Y, HE D L, LI X, et al. Image inpainting by end-to-end cascaded refinement with mask awareness[J]. IEEE Transactions on Image Processing, 2021, 30: 4855-4866.
[18] ZENG Y H, FU J L, CHAO H Y, et al. Aggregated contextual transformations for high-resolution image inpainting[J]. IEEE Transactions on Visualization and Computer Graphics, 2022, 29(7): 3266-3280.
[19] LIU H Y, JIANG B, XIAO Y, et al. Coherent semantic attention for image inpainting[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 4169-4178.
[20] ZHENG C X, CHAM T J, CAI J F. Pluralistic image completion[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 1438-1447.
[21] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[22] LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 9992-10002.
[23] WU H P, XIAO B, CODELLA N, et al. CvT: introducing convolutions to vision transformers[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 22-31.
[24] LI Y W, ZHANG K, CAO J Z, et al. LocalViT: bringing locality to vision transformers[EB/OL]. [2023-05-02]. https://arxiv.org/abs/2104.05707.
[25] ZHOU B L, LAPEDRIZA A, KHOSLA A, et al. Places: a 10 million image database for scene recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(6): 1452-1464.
[26] KARRAS T, AILA T M, LAINE S, et al. Progressive growing of GANs for improved quality, stability, and variation[EB/OL]. [2023-05-02]. https://arxiv.org/abs/1710.10196.
[27] DOERSCH C, SINGH S, GUPTA A, et al. What makes Paris look like Paris?[J]. ACM Transactions on Graphics, 2012, 31(4): 101:1-101:9.