图学学报 ›› 2025, Vol. 46 ›› Issue (4): 727-738.DOI: 10.11996/JG.j.2095-302X.2025040727
雷松林1, 赵征鹏1, 阳秋霞1, 普园媛1,2, 谷金晶1, 徐丹1
收稿日期: 2024-10-05
接受日期: 2025-01-15
出版日期: 2025-08-30
发布日期: 2025-08-11
通讯作者: 赵征鹏(1974-),男,副教授,硕士。主要研究方向为图像去噪、图像生成等。E-mail: zhpzhao@ynu.edu.cn
第一作者: 雷松林(2000-),男,硕士研究生。主要研究方向为图像风格迁移。E-mail: leisonglin@stu.ynu.edu.cn
基金资助:
LEI Songlin1, ZHAO Zhengpeng1, YANG Qiuxia1, PU Yuanyuan1,2, GU Jinjing1, XU Dan1
Received: 2024-10-05
Accepted: 2025-01-15
Published: 2025-08-30
Online: 2025-08-11
First author: LEI Songlin (2000-), master's student. His main research interests include image style transfer. E-mail: leisonglin@stu.ynu.edu.cn
Supported by:
摘要:
零样本风格迁移旨在在无需风格图像指导的情况下,将给定源图像的风格转换至目标文本所描述的风格域。现有的零样本风格迁移方法大多依赖耗时的微调和优化过程,而无需微调和优化的方法又难以实现内容与风格的对齐。借助扩散模型Unet去噪网络的特性,提出了一种无需训练和优化的双支路框架,可实现内容与风格对齐的零样本风格迁移。首先,在内容支路上对噪声图像进行去噪,并提取该支路采样过程中的内容特征,以保持源域的内容结构;然后,在风格支路上以梯度引导的方式从目标文本提示中获取风格信息并将其注入去噪图像,同时提取该支路采样过程中Unet网络的跳跃连接特征作为风格特征,以传递目标风格信息。这种双支路设计实现了风格迁移过程中内容特征与风格特征的解耦,避免了单一风格迁移网络中内容和风格特征的纠缠。最后,设计了一个特征调制模块(FMM)来调制并融合来自内容支路和风格支路的内容与风格特征,实现二者的对齐,从而在传递风格的同时尽可能减小对内容的影响。实验结果表明,该方法在无需训练和优化的前提下,可以在任意内容图像上实现高质量的风格迁移。
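下面给出一个极简的示意性草图(PyTorch,非论文官方实现),仅用于说明摘要中"双支路采样 + 特征调制模块(FMM)"的基本思路:内容支路直接去噪以保留结构,风格支路经(假设的)文本梯度引导后提供跳跃连接特征,FMM 以 AdaIN 式统计量调制的方式融合两路特征。其中 ToyUNet、FeatureModulation、dual_branch_step 等名称均为本示例假设,并非论文中的模块实现;alpha 可粗略理解为类似图9消融中控制风格强度的系数。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyUNet(nn.Module):
    """示意性去噪网络(假设的简化结构),仅用于演示双支路特征提取。"""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Conv2d(3, ch, 3, padding=1)
        self.mid = nn.Conv2d(ch, ch, 3, padding=1)
        self.dec = nn.Conv2d(ch * 2, 3, 3, padding=1)  # 拼接跳跃连接特征后解码

    def forward(self, x):
        h = F.relu(self.enc(x))   # 编码特征,可视为内容特征
        skip = h                  # 跳跃连接特征,可视为风格特征的载体
        h = F.relu(self.mid(h))
        out = self.dec(torch.cat([h, skip], dim=1))  # 预测噪声
        return out, h, skip

class FeatureModulation(nn.Module):
    """示意性特征调制模块(FMM):用风格特征的均值/方差调制内容特征(AdaIN 式假设)。"""
    def forward(self, content_feat, style_feat, alpha=1.0):
        c_mean = content_feat.mean((2, 3), keepdim=True)
        c_std = content_feat.std((2, 3), keepdim=True) + 1e-6
        s_mean = style_feat.mean((2, 3), keepdim=True)
        s_std = style_feat.std((2, 3), keepdim=True)
        modulated = (content_feat - c_mean) / c_std * s_std + s_mean
        return alpha * modulated + (1 - alpha) * content_feat  # alpha 控制风格强度

def dual_branch_step(x_t, unet, fmm, style_grad_fn, alpha=1.0):
    """单个去噪步骤的双支路示意:内容支路保持结构,风格支路经梯度引导注入文本风格。"""
    eps_c, feat_c, _ = unet(x_t)              # 内容支路:直接去噪并取内容特征
    x_t_style = x_t - style_grad_fn(x_t)      # 风格支路:用(假设的)CLIP 文本梯度引导
    _, _, skip_s = unet(x_t_style)            # 提取风格支路的跳跃连接特征
    fused = fmm(feat_c, skip_s, alpha)        # FMM 融合内容与风格特征
    return eps_c, fused

if __name__ == "__main__":
    unet, fmm = ToyUNet(), FeatureModulation()
    x_t = torch.randn(1, 3, 64, 64)           # 某一时间步的噪声图像
    # 这里用零梯度占位真实的文本引导梯度,仅演示数据流
    eps, fused = dual_branch_step(x_t, unet, fmm, style_grad_fn=lambda x: torch.zeros_like(x))
    print(eps.shape, fused.shape)
```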
中图分类号:
雷松林, 赵征鹏, 阳秋霞, 普园媛, 谷金晶, 徐丹. 基于可解耦扩散模型的零样本风格迁移[J]. 图学学报, 2025, 46(4): 727-738.
LEI Songlin, ZHAO Zhengpeng, YANG Qiuxia, PU Yuanyuan, GU Jinjing, XU Dan. Zero-shot style transfer based on decoupled diffusion models[J]. Journal of Graphics, 2025, 46(4): 727-738.
图2 双支路方法的总体结构((a) 模型的总体框架图;(b) 特征调制模块;(c) 风格引导模块;(d) 内容引导模块)
Fig. 2 The overall structure of the dual branch method ((a) Overall framework diagram of the model; (b) Feature modulation module; (c) Style guidance module; (d) Content guidance module)
图3 在ImageNet数据集上的对比实验((a) 源图像;(b) 风格提示文本;(c) ZeCon;(d) DiffusionCLIP;(e) DiffuseIT;(f) InST;(g) FreeStyle;(h) 本文方法)
Fig. 3 Comparative experiments on the ImageNet dataset ((a) Source image; (b) Style prompt text; (c) ZeCon; (d) DiffusionCLIP; (e) DiffuseIT; (f) InST; (g) FreeStyle; (h) Ours)
图4 在FFHQ数据集上的对比实验((a) 源图像;(b) 风格提示文本;(c) ZeCon;(d) DiffusionCLIP;(e) DiffuseIT;(f) InST;(g) StyleGAN-NADA;(h) 本文方法)
Fig. 4 Comparative experiments on the FFHQ dataset ((a) Source image; (b) Style prompt text; (c) ZeCon; (d) DiffusionCLIP; (e) DiffuseIT; (f) InST; (g) StyleGAN-NADA; (h) Ours)
表1 不同风格迁移方法的定量结果比较
Table 1 Quantitative comparison with other style transfer methods

| 方法 | SSIM↑ | LPIPS↓ | CLIPscore↑ | FID↓ | Preference↑/% | Time/s |
|---|---|---|---|---|---|---|
| ZeCon | 0.696 | 0.467 | 26.94 | 262.20 | 26 | 38 |
| DiffuseIT | 0.602 | 0.507 | 24.40 | 180.32 | 2 | 42 |
| DiffCLIP | 0.668 | 0.536 | 28.64 | 256.19 | 6 | 462 |
| InST | 0.557 | 0.489 | 26.34 | 220.13 | 22 | 816 |
| Ours | 0.750 | 0.401 | 27.54 | 204.89 | 44 | 46 |
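表1中 SSIM、LPIPS 与 CLIPscore 三项指标常见的计算方式可参考下面的示意性草图(假设已安装 scikit-image、lpips 与 transformers;FID 需在整个数据集上统计、Preference 来自用户调研,此处从略)。这只是按通用做法给出的参考实现,并非论文实际使用的评测脚本:

```python
import torch
import lpips                                        # pip install lpips
from skimage.metrics import structural_similarity   # scikit-image >= 0.19
from transformers import CLIPModel, CLIPProcessor

lpips_fn = lpips.LPIPS(net="alex")
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def ssim_score(src_np, out_np):
    """SSIM:衡量结果与源图像的结构相似度,输入为 [0,1] 的 HxWx3 数组。"""
    return structural_similarity(src_np, out_np, channel_axis=2, data_range=1.0)

def lpips_score(src_t, out_t):
    """LPIPS:感知距离,输入为 [-1,1] 的 (1,3,H,W) 张量。"""
    return lpips_fn(src_t, out_t).item()

def clip_score(pil_image, text):
    """CLIPscore:结果图像与目标文本的匹配程度(logits 已含 100 倍缩放)。"""
    inputs = clip_proc(text=[text], images=pil_image, return_tensors="pt", padding=True)
    with torch.no_grad():
        return clip_model(**inputs).logits_per_image.item()
```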
图5 内容损失和FMM模块的消融实验((a) 源图像;(b) No FMM;(c) No content loss;(d) Vgg loss;(e) Mse loss;(f) Cut loss;(g) Vgg+Mse;(h) Vgg+Mse+Cut)
Fig. 5 Ablation experiment of FMM module and content loss ((a) Source image; (b) No FMM; (c) No content loss; (d) Vgg loss; (e) Mse loss; (f) Cut loss; (g) Vgg+Mse; (h) Vgg+Mse+Cut)
表2 内容损失和FMM模块的消融实验结果
Table 2 Ablation results of the content loss and the FMM module

| 模型 | SSIM↑ | LPIPS↓ | CLIP score↑ |
|---|---|---|---|
| Baseline | 0.707 | 0.431 | 27.37 |
| w/o FMM (双支路) | 0.773 | 0.353 | 25.73 |
| w/o vgg | 0.771 | 0.397 | 27.66 |
| w/o mse | 0.754 | 0.409 | 27.21 |
| w/o cut | 0.737 | 0.454 | 27.42 |
| w/o content loss | 0.704 | 0.483 | 27.94 |
| Ours | 0.819 | 0.321 | 27.34 |
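表2中 vgg 与 mse 两项内容损失的常见形式可如下示意(假设性草图,基于 torchvision 预训练 VGG19,且为简洁起见省略了 ImageNet 归一化;cut 为块级对比损失,实现较长,此处从略),并非论文官方实现:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# 取 VGG19 前若干层作为特征提取器(层数选择为本示例假设)
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def vgg_loss(x, y):
    """VGG 感知损失:比较中间层特征,输入为 [0,1] 的 (N,3,H,W) 图像。"""
    return F.mse_loss(vgg(x), vgg(y))

def mse_loss(x, y):
    """像素级 MSE 损失。"""
    return F.mse_loss(x, y)

if __name__ == "__main__":
    a, b = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
    print(vgg_loss(a, b).item(), mse_loss(a, b).item())
```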
图6 风格损失的消融实验((a) 源图像;(b) 风格文本提示;(c) No style loss;(d) Only dir loss;(e) Only global loss;(f) Fullset)
Fig. 6 Ablation experiment of style loss ((a) Source image; (b) Style text prompt; (c) No style loss; (d) Only dir loss; (e) Only global loss; (f) Fullset)
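图6中的 dir loss 与 global loss 通常对应 CLIP 方向损失与 CLIP 全局损失(类似 StyleGAN-NADA[10] 与 CLIPstyler[11] 中常用的形式)。下面给出一个示意性草图,用随机向量占位 CLIP 嵌入,函数命名均为本示例假设:

```python
import torch
import torch.nn.functional as F

def clip_global_loss(out_img_emb, tgt_txt_emb):
    """全局损失:生成图像嵌入与目标文本嵌入的余弦距离。"""
    return 1 - F.cosine_similarity(out_img_emb, tgt_txt_emb, dim=-1).mean()

def clip_directional_loss(src_img_emb, out_img_emb, src_txt_emb, tgt_txt_emb):
    """方向损失:图像嵌入的变化方向与文本嵌入的变化方向对齐。"""
    d_img = F.normalize(out_img_emb - src_img_emb, dim=-1)
    d_txt = F.normalize(tgt_txt_emb - src_txt_emb, dim=-1)
    return 1 - F.cosine_similarity(d_img, d_txt, dim=-1).mean()

if __name__ == "__main__":
    # 用随机单位向量占位 CLIP 图像/文本编码器的输出,仅演示计算过程
    e = lambda: F.normalize(torch.randn(1, 512), dim=-1)
    print(clip_global_loss(e(), e()).item(),
          clip_directional_loss(e(), e(), e(), e()).item())
```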
图7 超参数n的消融实验((a) 源图像;(b) n=0;(c) n=3;(d) n=6;(e) n=9;(f) n=12;(g) n=15;(h) n=18)
Fig. 7 The ablation experiment of hyper-parameter n ((a) Source image; (b) n=0;(c) n=3; (d) n=6; (e) n=9; (f) n=12; (g) n=15; (h) n=18)
表3 超参数n的定量消融实验结果
Table 3 Quantitative ablation results of hyper-parameter n

| 参数 | SSIM↑ | LPIPS↓ | CLIP score↑ |
|---|---|---|---|
| n=0 | 0.785 | 0.347 | 26.59 |
| n=3 | 0.802 | 0.277 | 26.83 |
| n=6 | 0.782 | 0.298 | 26.16 |
| n=9 | 0.728 | 0.334 | 25.30 |
| n=12 | 0.672 | 0.372 | 25.30 |
| n=15 | 0.681 | 0.356 | 25.55 |
| n=18 | 0.689 | 0.347 | 25.48 |
图8 超参数s和b的消融实验((a) 源图像;(b) 文本提示;(c) s=1.0,b=1.0;(d) s=1.0,b=1.5;(e) s=1.0,b=2.0;(f) s=1.0,b=2.5;(g) s=1.0,b=3.0;(h) s=0.5,b=2.0;(i) s=0.8,b=2.0;(j) s=1.0,b=2.0;(k) s=1.2,b=2.0;(l) s=1.5,b=2.0)
Fig. 8 The ablation experiment of hyper-parameters s and b ((a) Source image; (b) Text prompt; (c) s=1.0, b=1.0; (d) s=1.0, b=1.5; (e) s=1.0, b=2.0; (f) s=1.0, b=2.5; (g) s=1.0, b=3.0; (h) s=0.5, b=2.0; (i) s=0.8, b=2.0; (j) s=1.0, b=2.0; (k) s=1.2, b=2.0; (l) s=1.5, b=2.0)
图9 解耦实验((a) 源图像;(b) 文本提示;(c) α=0.4;(d) α=0.6;(e) α=0.8;(f) α=1.0;(g) α=1.2;(h) α=1.4;(i) α=1.6)
Fig. 9 Experiment on disentanglement of content and style ((a) Source image; (b) Text prompt; (c) α=0.4; (d) α=0.6; (e) α=0.8;(f) α=1.0; (g) α=1.2; (h) α=1.4; (i) α=1.6)
[1] | CHENG B, LIU Z H, PENG Y B, et al. General image-to- image translation with one-shot image guidance[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 22736-22746. |
[2] | GATYS L A, ECKER A S, BETHGE M. Image style transfer using convolutional neural networks[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 2414-2423. |
[3] | 王晨琛, 王业琳, 葛中芹, 等. 基于卷积神经网络的中国水墨画风格提取[J]. 图学学报, 2017, 38(5): 754-759. WANG C C, WANG Y L, GE Z Q, et al. Convolutional neural network-based Chinese ink-painting artistic style extraction[J]. Journal of Graphics, 2017, 38(5): 754-759 (in Chinese). |
[4] | 李鑫, 普园媛, 赵征鹏, 等. 内容语义和风格特征匹配一致的艺术风格迁移[J]. 图学学报, 2023, 44(4): 699-709. LI X, PU Y Y, ZHAO Z P, et al. Content semantics and style features match consistent artistic style transfer[J]. Journal of Graphics, 2023, 44(4): 699-709 (in Chinese). |
[5] | HUANG X, BELONGIE S. Arbitrary style transfer in real-time with adaptive instance normalization[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 1501-1510. |
[6] | JING Y C, LIU X, DING Y K, et al. Dynamic instance normalization for arbitrary style transfer[C]// The 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 4369-4376. |
[7] | GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// The 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2672-2680. |
[8] | PARK T, EFROS A A, ZHANG R, et al. Contrastive learning for unpaired image-to-image translation[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 319-345. |
[9] | ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2223-2232. |
[10] | GAL R, PATASHNIK O, MARON H, et al. StyleGAN-NADA: CLIP-guided domain adaptation of image generators[J]. ACM Transactions on Graphics (TOG), 2022, 41(4): 141. |
[11] | KWON G, YE J C. CLIPstyler: image style transfer with a single text condition[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 18062-18071. |
[12] | SAHARIA C, HO J, CHAN W, et al. Image super-resolution via iterative refinement[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(4): 4713-4726. |
[13] | KIM G, KWON T, YE J C. DiffusionCLIP: text-guided diffusion models for robust image manipulation[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 2426-2435. |
[14] | KWON G, YE J C. Diffusion-based image translation using disentangled style and content representation[EB/OL]. [2024-05-04]. https://dblp.uni-trier.de/db/conf/iclr/iclr2023.html#KwonY23. |
[15] | YANG S, HWANG H, YE J C. Zero-shot contrastive loss for text-guided diffusion image style transfer[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 22873-22882. |
[16] | ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 10684-10695. |
[17] | RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL]. [2024-05-04]. https://dblp.uni-trier.de/db/conf/icml/icml2021.html#RadfordKHRGASAM21. |
[18] | MOKADY R, HERTZ A, ABERMAN K, et al. Null-text inversion for editing real images using guided diffusion models[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 6038-6047. |
[19] | EVERAERT M N, BOCCHIO M, ARPA S, et al. Diffusion in style[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 2251-2261. |
[20] | CHUNG J, HYUN S, HEO J P. Style injection in diffusion: a training-free approach for adapting large-scale diffusion models for style transfer[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 8795-8805. |
[21] | HERTZ A, MOKADY R, TENENBAUM J, et al. Prompt-to- prompt image editing with cross-attention control[EB/OL]. [2024-05-04]. https://dblp.uni-trier.de/db/conf/iclr/iclr2023.html#HertzMTAPC23. |
[22] | SI C Y, HUANG Z Q, JIANG Y M, et al. FreeU: free lunch in diffusion U-net[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 4733-4743. |
[23] | JEONG J, KWON M, UH Y. Training-free content injection using h-space in diffusion models[C]// 2024 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2024: 5151-5161. |
[24] | 张海嵩, 尹小勤, 于金辉. 实时绘制3D中国画效果[J]. 计算机辅助设计与图形学学报, 2004, 16(11): 1485-1489. ZHANG H S, YIN X Q, YU J H. Real-time rendering of 3D Chinese painting effects[J]. Journal of Computer-Aided Design & Computer Graphics, 2004, 16(11): 1485-1489 (in Chinese). |
[25] | 钱小燕, 肖亮, 吴慧中. 快速风格迁移[J]. 计算机工程, 2006, 32(21): 15-17, 46. QIAN X Y, XIAO L, WU H Z. Fast style transfer[J]. Computer Engineering, 2006, 32(21): 15-17, 46 (in Chinese). |
[26] | LI X T, LIU S F, KAUTZ J, et al. Learning linear transformations for fast image and video style transfer[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 3809-3817. |
[27] | PARK D Y, LEE K H. Arbitrary style transfer with style-attentional networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 5880-5888. |
[28] | SONG C J, WU Z J, ZHOU Y, et al. ETNet: error transition network for arbitrary style transfer[C]// The 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 61. |
[29] | WU Z J, SONG C J, ZHOU Y, et al. EFANet: exchangeable feature alignment network for arbitrary style transfer[C]// The 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 12305-12312. |
[30] | LIU S H, LIN T W, HE D L, et al. AdaAttN: revisit attention mechanism in arbitrary neural style transfer[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 6649-6658. |
[31] | KARRAS T, LAINE S, AILA T. A style-based generator architecture for generative adversarial networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4401-4410. |
[32] | KARRAS T, AITTALA M, HELLSTEN J, et al. Training generative adversarial networks with limited data[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 1015. |
[33] | ZHOU Y, CHEN Z C, HUANG H. Deformable one-shot face stylization via DINO semantic guidance[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 7787-7796. |
[34] | ZHANG Y X, DONG W M, TANG F, et al. ProSpect: prompt spectrum for attribute-aware personalization of diffusion models[EB/OL]. [2024-05-04]. https://arxiv.org/abs/2305.16225. |
[35] | ZHANG Y X, HUANG N S, TANG F, et al. Inversion-based style transfer with diffusion models[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 10146-10156. |
[36] | DENG Y Y, HE X Y, TANG F, et al. Z*: zero-shot style transfer via attention reweighting[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 6934-6944. |
[37] | SOHN K, RUIZ N, LEE K, et al. StyleDrop: text-to-image generation in any style[EB/OL]. [2024-05-04]. https://arxiv.org/abs/2306.00983. |
[38] | AHN N, LEE J, LEE C, et al. DreamStyler: paint by style inversion with text-to-image diffusion models[C]// The 38th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2024: 674-681. |
[39] | QI T H, FANG S C, WU Y Z, et al. DEADiff: an efficient stylization diffusion model with disentangled representations[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 8693-8702. |
[40] | HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 574. |
[41] | DHARIWAL P, NICHOL A. Diffusion models beat GANs on image synthesis[C]// The 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 672. |
[42] | NICHOL A Q, DHARIWAL P. Improved denoising diffusion probabilistic models[EB/OL]. [2024-05-04]. https://dblp.uni-trier.de/db/conf/icml/icml2021.html#NicholD21. |
[43] | SONG J M, MENG C L, ERMON S. Denoising diffusion implicit models[EB/OL]. [2024-05-04]. https://dblp.uni-trier.de/db/conf/iclr/iclr2021.html#SongME21. |
[44] | PAN Z H, ZHOU X, TIAN H. Arbitrary style guidance for enhanced diffusion-based text-to-image generation[C]// 2023 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2023: 4461-4471. |
[45] | HE F H, LI G, SUN F H, et al. FreeStyle: free lunch for text-guided style transfer using diffusion models[EB/OL]. [2024-05-04]. https://arxiv.org/abs/2401.15636. |
[46] | TOV O, ALALUF Y, NITZAN Y, et al. Designing an encoder for StyleGAN image manipulation[J]. ACM Transactions on Graphics (TOG), 2021, 40(4): 133. |
[47] | WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612. |
[48] | ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// 2018 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 586-595. |