
Journal of Graphics ›› 2025, Vol. 46 ›› Issue (4): 727-738. DOI: 10.11996/JG.j.2095-302X.2025040727

• Image Processing and Computer Vision •

Zero-shot style transfer based on decoupled diffusion models

LEI Songlin1, ZHAO Zhengpeng1, YANG Qiuxia1, PU Yuanyuan1,2, GU Jinjing1, XU Dan1

  1. School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650504, China
    2. Internet of Things Technology and Application Key Laboratory of Universities in Yunnan, Kunming, Yunnan 650504, China
  • Received: 2024-10-05 Accepted: 2025-01-15 Online: 2025-08-30 Published: 2025-08-11
  • Contact: ZHAO Zhengpeng
  • About author (first author contact):

    LEI Songlin (2000-), master's student. His main research interests cover image style transfer. E-mail: leisonglin@stu.ynu.edu.cn

  • Supported by:
    National Natural Science Foundation of China (61761046, 52102382, 62362070); Key Project of Applied Basic Research Programme of Yunnan Provincial Department of Science and Technology (202001BB050043, 202401AS070149); Yunnan Provincial Science and Technology Major Project (202302AF080006); Postgraduate Science Foundation of Yunnan University (ZC-23235984)

Abstract:

Zero-shot style transfer aims to apply the style of a target domain, described by a text prompt, to a given source image without relying on a style image. Existing methods typically require time-consuming fine-tuning or optimization, while those avoiding such steps often fail to achieve satisfactory alignment between content and style. A two-branch framework was proposed that enabled zero-shot style transfer with content-style alignment, without the need for training or optimization. Leveraging the diffusion model's U-Net denoising network, the content branch first denoised the input image and extracted content features, preserving the source domain's content structure. The style branch then employed a gradient-guided method to extract style information from the text prompt, which was transferred to the denoised image. Additionally, the style features were derived from the U-Net's skip connections during the style branch's sampling process, ensuring a clear separation between content and style. This decoupling of content and style allowed for effective style transfer while mitigating their entanglement within a single network. Finally, a feature modulation module (FMM) was introduced to fuse the content and style features from the two branches, ensuring alignment and minimizing the impact on the content during style transfer. Experimental results demonstrate that the proposed method achieved high-quality style transfer on any content image without the need for training or optimization.
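The abstract does not specify how the FMM fuses the two branches' features. As a minimal illustrative sketch only, the fusion can be pictured as an AdaIN-style modulation, in which channel-wise style statistics rescale normalized content features; the function name `feature_modulation` and the choice of mean/standard-deviation statistics are assumptions for illustration, not the paper's exact FMM design:

```python
import numpy as np

def feature_modulation(content_feat, style_feat, eps=1e-5):
    """Fuse content and style feature maps of shape (C, H, W) by
    re-normalizing the content features with the style branch's
    per-channel statistics (AdaIN-style stand-in, not the paper's FMM)."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True) + eps
    normalized = (content_feat - c_mean) / c_std   # strip content statistics
    return normalized * s_std + s_mean             # inject style statistics

# Toy example: 4-channel 8x8 feature maps standing in for U-Net features.
rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(4, 8, 8))
style = rng.normal(2.0, 0.5, size=(4, 8, 8))
fused = feature_modulation(content, style)
```

After fusion, each channel of `fused` carries the style features' mean and spread while retaining the content features' spatial layout, which mirrors the stated goal of applying style with minimal impact on content.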

Key words: style transfer, diffusion models, backbone features, skip connection features, feature modulation module

CLC Number: