Journal of Graphics ›› 2023, Vol. 44 ›› Issue (5): 955-965.DOI: 10.11996/JG.j.2095-302X.2023050955
• Image Processing and Computer Vision •
A local optimization generation model for image inpainting
YANG Hong-ju1,2, GAO Min1, ZHANG Chang-you3, BO Wen3, WU Wen-jia3, CAO Fu-yuan1,2
Received: 2023-02-24 | Accepted: 2023-05-06 | Online: 2023-10-31 | Published: 2023-10-31
About author:
YANG Hong-ju (1975-), associate professor, Ph.D. Her main research interests cover computer vision, machine learning, etc. E-mail: yhju@sxu.edu.cn
YANG Hong-ju, GAO Min, ZHANG Chang-you, BO Wen, WU Wen-jia, CAO Fu-yuan. A local optimization generation model for image inpainting[J]. Journal of Graphics, 2023, 44(5): 955-965.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023050955
Fig. 3 Comparison of inpainting results of different models on Places2 dataset ((a) Masked image; (b) CA; (c) EdgeConnect; (d) GatedConv; (e) MADF; (f) AOT-GAN; (g) Ours)
Fig. 4 Comparison of inpainting results of different models on CelebA-HQ dataset ((a) Masked image; (b) CA; (c) EdgeConnect; (d) GatedConv; (e) MADF; (f) AOT-GAN; (g) Ours)
Fig. 5 Comparison of inpainting results of different models on Paris StreetView dataset ((a) Masked image; (b) CA; (c) EdgeConnect; (d) GatedConv; (e) MADF; (f) AOT-GAN; (g) Ours)
Places2:

| Metric | Model | 10%~20% | 21%~30% | 31%~40% | 41%~50% | 51%~60% |
|---|---|---|---|---|---|---|
| L1 (%) ↓ | CA | 2.89 | 4.82 | 6.69 | 8.36 | 11.43 |
| | EdgeConnect | 2.21 | 3.13 | 3.95 | 5.02 | 8.17 |
| | GatedConv | 2.46 | 3.39 | 4.28 | 6.21 | 8.43 |
| | MADF | 2.39 | 3.04 | 3.87 | 4.89 | 7.11 |
| | AOT-GAN | 2.02 | 2.72 | 3.71 | 4.85 | 7.31 |
| | LesT-GAN (Ours) | 1.03 | 1.85 | 2.84 | 4.03 | 6.50 |
| PSNR ↑ | CA | 24.43 | 21.19 | 19.84 | 17.79 | 16.21 |
| | EdgeConnect | 27.97 | 25.04 | 22.32 | 21.21 | 18.97 |
| | GatedConv | 27.34 | 24.27 | 21.45 | 19.76 | 17.80 |
| | MADF | 28.71 | 26.28 | 24.20 | 22.43 | 19.78 |
| | AOT-GAN | 29.47 | 26.51 | 24.15 | 22.21 | 19.44 |
| | LesT-GAN (Ours) | 31.06 | 27.35 | 24.75 | 22.72 | 19.81 |
| SSIM ↑ | CA | 0.853 | 0.766 | 0.702 | 0.644 | 0.541 |
| | EdgeConnect | 0.921 | 0.871 | 0.812 | 0.750 | 0.652 |
| | GatedConv | 0.909 | 0.857 | 0.790 | 0.721 | 0.644 |
| | MADF | 0.923 | 0.881 | 0.831 | 0.774 | 0.678 |
| | AOT-GAN | 0.932 | 0.887 | 0.833 | 0.771 | 0.669 |
| | LesT-GAN (Ours) | 0.950 | 0.904 | 0.851 | 0.788 | 0.684 |
| FID ↓ | CA | 7.43 | 17.25 | 31.40 | 53.47 | 66.23 |
| | EdgeConnect | 2.84 | 4.11 | 7.72 | 15.51 | 40.32 |
| | GatedConv | 2.70 | 3.83 | 6.58 | 13.72 | 36.43 |
| | MADF | 1.65 | 2.98 | 5.14 | 8.46 | 21.18 |
| | AOT-GAN | 2.08 | 3.37 | 5.83 | 10.41 | 26.61 |
| | LesT-GAN (Ours) | 0.18 | 0.76 | 1.43 | 2.45 | 6.65 |

CelebA-HQ:

| Metric | Model | 10%~20% | 21%~30% | 31%~40% | 41%~50% | 51%~60% |
|---|---|---|---|---|---|---|
| L1 (%) ↓ | CA | 2.13 | 2.32 | 3.47 | 5.44 | 7.29 |
| | EdgeConnect | 1.64 | 2.02 | 2.66 | 3.39 | 5.17 |
| | GatedConv | 1.89 | 2.10 | 2.87 | 3.81 | 5.23 |
| | MADF | 1.52 | 1.93 | 2.50 | 3.20 | 4.89 |
| | AOT-GAN | 1.26 | 1.74 | 2.39 | 3.17 | 5.12 |
| | LesT-GAN (Ours) | 0.75 | 1.25 | 1.91 | 2.67 | 4.41 |
| PSNR ↑ | CA | 29.69 | 26.92 | 24.80 | 22.49 | 18.23 |
| | EdgeConnect | 32.08 | 29.59 | 27.13 | 25.18 | 22.07 |
| | GatedConv | 31.83 | 29.47 | 26.89 | 24.87 | 21.66 |
| | MADF | 32.27 | 29.85 | 27.55 | 25.60 | 22.46 |
| | AOT-GAN | 33.17 | 30.20 | 27.63 | 25.51 | 22.12 |
| | LesT-GAN (Ours) | 34.53 | 30.98 | 28.29 | 26.16 | 22.85 |
| SSIM ↑ | CA | 0.902 | 0.866 | 0.803 | 0.741 | 0.667 |
| | EdgeConnect | 0.941 | 0.911 | 0.872 | 0.826 | 0.752 |
| | GatedConv | 0.934 | 0.906 | 0.867 | 0.815 | 0.749 |
| | MADF | 0.947 | 0.918 | 0.880 | 0.838 | 0.758 |
| | AOT-GAN | 0.953 | 0.923 | 0.884 | 0.839 | 0.756 |
| | LesT-GAN (Ours) | 0.962 | 0.929 | 0.889 | 0.844 | 0.762 |
| FID ↓ | CA | 5.83 | 7.72 | 9.79 | 13.07 | 21.61 |
| | EdgeConnect | 4.07 | 5.14 | 7.23 | 9.13 | 15.39 |
| | GatedConv | 3.82 | 4.96 | 6.76 | 8.53 | 14.26 |
| | MADF | 2.60 | 3.43 | 4.69 | 6.21 | 10.88 |
| | AOT-GAN | 3.48 | 4.49 | 5.88 | 7.49 | 12.17 |
| | LesT-GAN (Ours) | 0.73 | 1.48 | 2.46 | 3.67 | 6.38 |
Table 1 Quantitative comparison of inpainting results of each algorithm on the Places2 and CelebA-HQ datasets
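For reference, the two simplest metrics reported in Table 1 can be computed directly; the sketch below uses NumPy and assumes images are float arrays scaled to [0, 1]. SSIM and FID need dedicated implementations (e.g. scikit-image and a pretrained Inception network) and are omitted here.

```python
import numpy as np

def l1_percent(pred, gt):
    """Mean absolute pixel error, reported as a percentage (lower is better)."""
    return float(np.mean(np.abs(pred - gt))) * 100.0

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = float(np.mean((pred - gt) ** 2))
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy check: a uniform error of 0.1 per pixel.
gt = np.zeros((8, 8))
pred = gt + 0.1
print(round(l1_percent(pred, gt), 1))  # 10.0
print(round(psnr(pred, gt), 1))        # 20.0
```

In the table, these metrics are averaged over the test images of each dataset for each mask-size range.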
| Block | L1 (%) ↓ | PSNR ↑ | SSIM ↑ | FID ↓ |
|---|---|---|---|---|
| GatedConv block | 3.19 | 26.87 | 0.853 | 7.28 |
| AOT block | 2.27 | 29.11 | 0.880 | 3.22 |
| Swin Transformer block | 1.85 | 30.72 | 0.898 | 2.10 |
| LesT block | 1.82 | 30.96 | 0.906 | 2.09 |
Table 2 Results of ablation experiments on LesT blocks on the CelebA-HQ dataset
| Block | L1 (%) ↓ | PSNR ↑ | SSIM ↑ | FID ↓ |
|---|---|---|---|---|
| FFN block | 1.83 | 30.81 | 0.902 | 2.07 |
| LeFF block | 1.80 | 31.23 | 0.909 | 2.04 |
Table 3 Results of ablation experiments on LeFF blocks on the CelebA-HQ dataset
| Discriminator | L1 (%) ↓ | PSNR ↑ | SSIM ↑ | FID ↓ |
|---|---|---|---|---|
| PatchGAN | 1.89 | 30.57 | 0.887 | 2.16 |
| SM-PatchGAN | 1.87 | 30.71 | 0.895 | 2.12 |
| MRA-PatchGAN | 1.84 | 30.91 | 0.903 | 2.07 |
Table 4 Results of ablation experiments on MRA-PatchGAN on the CelebA-HQ dataset
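The discriminators compared above all build on the PatchGAN idea: rather than emitting a single real/fake score per image, the network outputs a grid of scores, one per local patch, which pushes the generator toward locally realistic texture. The PyTorch sketch below illustrates only this patch-level output; it is not the paper's MRA-PatchGAN, and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyPatchDiscriminator(nn.Module):
    """Minimal PatchGAN-style discriminator: outputs a score map, not a scalar."""

    def __init__(self, in_ch=3, base=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),       # 64 -> 32
            nn.LeakyReLU(0.2),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),    # 32 -> 16
            nn.LeakyReLU(0.2),
            nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),       # one score per patch
        )

    def forward(self, x):
        return self.net(x)

disc = TinyPatchDiscriminator()
scores = disc(torch.randn(1, 3, 64, 64))
print(scores.shape)  # torch.Size([1, 1, 15, 15]): a 15x15 grid of patch scores
```

Each spatial position in the output corresponds to a receptive-field patch of the input, so the adversarial loss is applied per patch; mask-aware variants such as SM-PatchGAN weight these patch scores by the inpainting mask.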
[1] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144.
[2] YU F, KOLTUN V, FUNKHOUSER T. Dilated residual networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 636-644.
[3] LIU G L, REDA F A, SHIH K J, et al. Image inpainting for irregular holes using partial convolutions[C]// Computer Vision - ECCV 2018: 15th European Conference. New York: ACM, 2018: 89-105.
[4] YU J H, LIN Z, YANG J M, et al. Free-form image inpainting with gated convolution[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 4470-4479.
[5] LIU H Y, JIANG B, SONG Y B, et al. Rethinking image inpainting via a mutual encoder-decoder with feature equalizations[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2020: 725-741.
[6] YU J H, LIN Z, YANG J M, et al. Generative image inpainting with contextual attention[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 5505-5514.
[7] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2023-05-02]. https://arxiv.org/abs/2010.11929.
[8] RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation[M]// Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015: 234-241.
[9] PATHAK D, KRÄHENBÜHL P, DONAHUE J, et al. Context encoders: feature learning by inpainting[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 2536-2544.
[10] YAN Z Y, LI X M, LI M, et al. Shift-net: image inpainting via deep feature rearrangement[C]// European Conference on Computer Vision. Cham: Springer, 2018: 3-19.
[11] REN Y R, YU X M, ZHANG R N, et al. StructureFlow: image inpainting via structure-aware appearance flow[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 181-190.
[12] SONG Y H, YANG C, SHEN Y J, et al. SPG-net: segmentation prediction and guidance network for image inpainting[EB/OL]. [2023-05-02]. https://arxiv.org/abs/1805.03356.
[13] NAZERI K, NG E, JOSEPH T, et al. EdgeConnect: structure guided image inpainting using edge prediction[C]// 2019 IEEE/CVF International Conference on Computer Vision Workshop. New York: IEEE Press, 2020: 3265-3274.
[14] IIZUKA S, SIMO-SERRA E, ISHIKAWA H. Globally and locally consistent image completion[J]. ACM Transactions on Graphics, 2017, 36(4): 1-14.
[15] DEMIR U, UNAL G. Patch-based image inpainting with generative adversarial networks[EB/OL]. [2023-05-02]. https://arxiv.org/abs/1803.07422.
[16] JOLICOEUR-MARTINEAU A. The relativistic discriminator: a key element missing from standard GAN[EB/OL]. [2023-05-02]. https://arxiv.org/abs/1807.00734.
[17] ZHU M Y, HE D L, LI X, et al. Image inpainting by end-to-end cascaded refinement with mask awareness[J]. IEEE Transactions on Image Processing, 2021, 30: 4855-4866.
[18] ZENG Y H, FU J L, CHAO H Y, et al. Aggregated contextual transformations for high-resolution image inpainting[J]. IEEE Transactions on Visualization and Computer Graphics, 2022, 29(7): 3266-3280.
[19] LIU H Y, JIANG B, XIAO Y, et al. Coherent semantic attention for image inpainting[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2020: 4169-4178.
[20] ZHENG C X, CHAM T J, CAI J F. Pluralistic image completion[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 1438-1447.
[21] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[22] LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 9992-10002.
[23] WU H P, XIAO B, CODELLA N, et al. CvT: introducing convolutions to vision transformers[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 22-31.
[24] LI Y W, ZHANG K, CAO J Z, et al. LocalViT: bringing locality to vision transformers[EB/OL]. [2023-05-02]. https://arxiv.org/abs/2104.05707.
[25] ZHOU B L, LAPEDRIZA A, KHOSLA A, et al. Places: a 10 million image database for scene recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(6): 1452-1464.
[26] KARRAS T, AILA T M, LAINE S, et al. Progressive growing of GANs for improved quality, stability, and variation[EB/OL]. [2023-05-02]. https://arxiv.org/abs/1710.10196.
[27] DOERSCH C, SINGH S, GUPTA A, et al. What makes Paris look like Paris?[J]. ACM Transactions on Graphics, 2012, 31(4): 101:1-101:9.