Journal of Graphics

Journal of Graphics, 2025, Vol. 46, Issue (4): 727-738. DOI: 10.11996/JG.j.2095-302X.2025040727

• Image Processing and Computer Vision •

Zero-shot style transfer based on decoupled diffusion models

LEI Songlin1, ZHAO Zhengpeng1, YANG Qiuxia1, PU Yuanyuan1,2, GU Jinjing1, XU Dan1

  1. School of Information Science and Engineering, Yunnan University, Kunming Yunnan 650504, China
  2. Internet of Things Technology and Application Key Laboratory of Universities in Yunnan, Kunming Yunnan 650504, China
  • Received: 2024-10-05 Accepted: 2025-01-15 Published: 2025-08-30 Online: 2025-08-11
  • Corresponding author: ZHAO Zhengpeng (1974-), associate professor, master's degree. His main research interests include image denoising and image generation. E-mail: zhpzhao@ynu.edu.cn
  • First author: LEI Songlin (2000-), master's student. His main research interest is image style transfer. E-mail: leisonglin@stu.ynu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61761046; 52102382; 62362070); Key Project of Applied Basic Research Programme of Yunnan Provincial Department of Science and Technology (202001BB050043; 202401AS070149); Yunnan Provincial Science and Technology Major Project (202302AF080006); Postgraduate Science Foundation of Yunnan University (ZC-23235984)

Abstract:

Zero-shot style transfer aims to transfer a given source image into the style domain described by a target text prompt, without guidance from a style image. Most existing zero-shot methods spend considerable time on fine-tuning or optimization, while methods that avoid these steps fail to align content and style well. Exploiting the properties of the U-Net denoising network in diffusion models, a training-free and optimization-free dual-branch framework was proposed to achieve zero-shot style transfer with aligned content and style. First, the content branch denoised the noisy image, and the content features extracted during its sampling process were used to preserve the content structure of the source domain. Then, the style branch used gradient guidance to obtain style information from the target text prompt and passed it into the denoised image; the skip connection features of the U-Net during style-branch sampling were extracted as style features to carry the target style. This dual-branch design decoupled the content and style features during style transfer, avoiding the entanglement of content and style features that arises within a single style transfer network. Finally, a feature modulation module (FMM) was designed to modulate and fuse the content and style features from the two branches, aligning them so that the style was transferred with minimal impact on the content. Experimental results demonstrate that the proposed method achieves high-quality style transfer on arbitrary content images without training or optimization.

Key words: style transfer, diffusion models, backbone features, skip connection features, feature modulation module
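The abstract does not state the exact guidance rule used in the style branch, so the following is only a minimal sketch of one common form of gradient guidance: a gradient step on the predicted clean image x̂0 inside a deterministic DDIM update (eta = 0). All names (`ddim_step_x0_guided`, `toy_style_branch`, `scale`) are hypothetical. The quadratic loss is an assumed stand-in for a text-image loss such as a CLIP similarity to the prompt, and the "oracle" noise predictor is an assumed stand-in for the U-Net.

```python
import numpy as np


def ddim_step_x0_guided(x_t, eps, abar_t, abar_prev, grad_fn, scale):
    """One deterministic DDIM step (eta = 0) with gradient guidance on x0_hat.

    abar_t / abar_prev: cumulative noise-schedule products (alpha-bar).
    grad_fn(x): gradient of a style loss at x (hypothetical stand-in here).
    """
    # Estimate the clean image implied by x_t and the predicted noise.
    x0_hat = (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)
    # Move the estimate one gradient step down the style loss.
    x0_hat = x0_hat - scale * grad_fn(x0_hat)
    # Map back to the previous (less noisy) level, reusing the same noise.
    return np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev) * eps


def toy_style_branch(x_T, content, style, scale=0.3, steps=5):
    """Run the guided sampler with toy stand-ins for the U-Net and text loss."""
    # abar[0] = 1.0 is the clean end; abar[steps] is the noisiest level.
    abar = np.linspace(1.0, 0.05, steps + 1)

    def grad_fn(x):
        # Gradient of the toy loss 0.5 * ||x - style||^2.
        return x - style

    x = x_T
    for t in range(steps, 0, -1):
        # Oracle noise predictor: the noise that maps `content` to x_t.
        eps = (x - np.sqrt(abar[t]) * content) / np.sqrt(1.0 - abar[t])
        x = ddim_step_x0_guided(x, eps, abar[t], abar[t - 1], grad_fn, scale)
    return x


rng = np.random.default_rng(0)
content = np.zeros((3, 4, 4))  # toy "content image"
style = np.ones((3, 4, 4))     # toy "style target"
out = toy_style_branch(rng.normal(size=(3, 4, 4)), content, style, scale=0.3)
print(np.allclose(out, 0.3))  # True
```

Because the oracle always returns the noise that maps back to `content`, every guided estimate equals the blend `(1 - scale) * content + scale * style`, so in this toy `scale` behaves as a style-strength knob. In a real sampler the noise would come from the U-Net and the gradient from automatic differentiation through the text-image loss, so the output would not reduce to a simple blend.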
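The abstract does not describe the internals of the feature modulation module (FMM). As a hedged illustration only, the sketch below swaps in adaptive instance normalization (AdaIN), a standard channel-statistics matching technique. It modulates the content-branch backbone features with the statistics of the style-branch skip connection features, then stacks the two on the channel axis as a U-Net decoder block would consume them. `adain`, `fmm_fuse`, and the blend weight `alpha` are hypothetical names, not the paper's.

```python
import numpy as np


def adain(content, style, eps=1e-5):
    """AdaIN: give each channel of `content` the mean/std of `style`.

    Both arrays have shape (C, H, W); statistics are taken over H and W.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean


def fmm_fuse(backbone_content, skip_style, alpha=1.0):
    """Hypothetical FMM stand-in for one U-Net decoder block.

    backbone_content: backbone features from the content branch, (C, H, W).
    skip_style: skip connection features from the style branch, (C, H, W).
    alpha: style strength in [0, 1]; 0 keeps the content features unchanged.
    """
    modulated = adain(backbone_content, skip_style)
    blended = alpha * modulated + (1.0 - alpha) * backbone_content
    # The decoder block consumes [backbone ; skip] stacked on the channel axis.
    return np.concatenate([blended, skip_style], axis=0)


rng = np.random.default_rng(0)
content_feat = rng.normal(size=(4, 8, 8))
style_feat = rng.normal(loc=2.0, scale=3.0, size=(4, 8, 8))
fused = fmm_fuse(content_feat, style_feat, alpha=0.8)
print(fused.shape)  # (8, 8, 8)
```

Keeping the backbone and skip inputs from separate branches mirrors the decoupling idea in the abstract: spatial layout enters through the content features, while the style features only set per-channel statistics and supply the skip path.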
