
Journal of Graphics ›› 2023, Vol. 44 ›› Issue (5): 955-965. DOI: 10.11996/JG.j.2095-302X.2023050955

• Image Processing and Computer Vision •

A local optimization generation model for image inpainting

YANG Hong-ju1,2, GAO Min1, ZHANG Chang-you3, BO Wen3, WU Wen-jia3, CAO Fu-yuan1,2

  1. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China
    2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan Shanxi 030006, China
    3. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2023-02-24 Accepted: 2023-05-06 Online: 2023-10-31 Published: 2023-10-31
  • About author: YANG Hong-ju (1975-), female, associate professor, Ph.D. Her main research interests cover computer vision, machine learning, etc. E-mail: yhju@sxu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61976128); Shanxi Scholarship Council of China (2022-008)

A local optimization generation model for image inpainting

YANG Hong-ju1,2, GAO Min1, ZHANG Chang-you3, BO Wen3, WU Wen-jia3, CAO Fu-yuan1,2

  1. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China
    2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan Shanxi 030006, China
    3. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2023-02-24 Accepted: 2023-05-06 Online: 2023-10-31 Published: 2023-10-31
  • About author: YANG Hong-ju (1975-), associate professor, Ph.D. Her main research interests cover computer vision, machine learning, etc. E-mail: yhju@sxu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61976128); Shanxi Scholarship Council of China (2022-008)

Abstract:

Image inpainting is widely used in photo editing and object removal. To address the problem that existing deep-learning inpainting models, limited by the receptive field of convolution operators, produce distorted structures or blurred textures, a locally optimized generation model, LesT-GAN, is proposed, composed of a generator and a discriminator. The generator is built from locally enhanced sliding-window Transformer modules, which combine the translation invariance and locality of depthwise convolution with the Transformer's capacity for global modeling, covering a large receptive field while refining local details. The discriminator is a relativistic average discriminator based on mask guidance and patches: by estimating the average probability that a given real image is more realistic than a generated one, it simulates pixel propagation around the boundary of the missing region, allowing the generator to draw directly on real images during training to produce sharper local textures. In comparison experiments with other state-of-the-art inpainting methods on the Places2, CelebA-HQ, and Paris StreetView datasets, LesT-GAN improves the L1 and FID metrics by 10.8% and 41.36%, respectively. Experimental results show that LesT-GAN achieves better inpainting quality in multiple scenes and generalizes well to images of higher resolution than those used in training.

Keywords: deep learning, image inpainting, generation model, Transformer, local optimization
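The locally enhanced sliding-window Transformer block can be illustrated roughly as follows. This is a minimal single-head NumPy sketch, not the authors' implementation: the function names, the 3×3 depthwise kernel, the window size, and the residual arrangement are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def depthwise_conv3x3(x, kernels):
    # x: (H, W, C); kernels: (3, 3, C), one filter per channel; zero padding
    H, W, C = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = (xp[i:i + 3, j:j + 3] * kernels).sum(axis=(0, 1))
    return out

def window_attention(x, win=4):
    # single-head self-attention inside non-overlapping win x win windows;
    # assumes H and W are divisible by win
    H, W, C = x.shape
    out = np.zeros_like(x)
    for i in range(0, H, win):
        for j in range(0, W, win):
            patch = x[i:i + win, j:j + win].reshape(-1, C)   # (win*win, C)
            attn = softmax(patch @ patch.T / np.sqrt(C))     # token-token weights
            out[i:i + win, j:j + win] = (attn @ patch).reshape(win, win, C)
    return out

def lest_block(x, kernels, win=4):
    # hypothetical block: depthwise conv supplies locality and translation
    # invariance, windowed attention supplies (window-level) global context;
    # each branch is added back through a residual connection
    x = x + depthwise_conv3x3(x, kernels)
    return x + window_attention(x, win)
```

The depthwise branch only mixes information spatially within each channel, while the windowed attention mixes all tokens inside a window; stacking such blocks with shifted windows is one common way to grow the effective receptive field beyond a single window.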

Abstract:

Image inpainting has extensive applications in photo editing and object removal. Existing deep learning-based image inpainting models, constrained by the limited receptive field of convolution operators, tend to produce distorted structures or blurred textures. To address this problem, a locally optimized generation model, LesT-GAN, was proposed, comprising a generator and a discriminator. The generator consisted of locally enhanced sliding-window Transformer modules. Each module combined the translation invariance and locality advantages of depthwise convolution with the Transformer's ability to model global information, so it could cover a wide receptive field while optimizing local details. The discriminator was a relativistic average discriminator based on mask guidance and patches. It simulated pixel propagation around the boundary of the missing region by estimating the average probability that a given real image was more realistic than a generated one, so that during training the generator could draw directly on real images to generate clearer local textures. In comparison experiments with other advanced image inpainting methods on the Places2, CelebA-HQ, and Paris StreetView datasets, LesT-GAN improved the L1 and FID metrics by 10.8% and 41.36%, respectively. Experimental results demonstrated that LesT-GAN exhibited superior restoration performance across multiple scenes and generalized well to images of higher resolution than those used during training.
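The relativistic average discriminator described in the abstract scores each sample relative to the average score of the opposite class. A minimal NumPy sketch of the relativistic average (RaGAN-style) loss terms, assuming per-patch logits and ignoring the mask guidance for brevity (function names are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ra_discriminator_loss(real_logits, fake_logits, eps=1e-12):
    # D is trained to judge real patches as "more realistic than the
    # average fake patch" (label 1) and fakes as the opposite (label 0)
    d_real = sigmoid(real_logits - fake_logits.mean())
    d_fake = sigmoid(fake_logits - real_logits.mean())
    return -(np.log(d_real + eps).mean() + np.log(1.0 - d_fake + eps).mean())

def ra_generator_loss(real_logits, fake_logits, eps=1e-12):
    # G is trained to flip the relativistic judgement, so the gradient
    # also flows through the real-image term during generator updates
    d_real = sigmoid(real_logits - fake_logits.mean())
    d_fake = sigmoid(fake_logits - real_logits.mean())
    return -(np.log(1.0 - d_real + eps).mean() + np.log(d_fake + eps).mean())
```

Because the real logits appear in the generator loss as well, the generator receives a training signal computed directly against real images, which is the property the abstract credits for sharper local textures.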

Key words: deep learning, image inpainting, generation model, Transformer, local optimization
