Journal of Graphics ›› 2025, Vol. 46 ›› Issue (4): 727-738.DOI: 10.11996/JG.j.2095-302X.2025040727
• Image Processing and Computer Vision •
LEI Songlin1, ZHAO Zhengpeng1, YANG Qiuxia1, PU Yuanyuan1,2, GU Jinjing1, XU Dan1
Received: 2024-10-05
Accepted: 2025-01-15
Online: 2025-08-30
Published: 2025-08-11
Contact: ZHAO Zhengpeng
About author: LEI Songlin (2000-), master student. His main research interests cover image style transfer. E-mail: leisonglin@stu.ynu.edu.cn
LEI Songlin, ZHAO Zhengpeng, YANG Qiuxia, PU Yuanyuan, GU Jinjing, XU Dan. Zero-shot style transfer based on decoupled diffusion models[J]. Journal of Graphics, 2025, 46(4): 727-738.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2025040727
Fig. 2 The overall structure of the dual-branch method ((a) Overall framework diagram of the model; (b) Feature modulation module; (c) Style guidance module; (d) Content guidance module)
Fig. 3 Comparative experiments on the ImageNet dataset ((a) Source image; (b) Style prompt text; (c) ZeCon; (d) DiffusionCLIP; (e) DiffuseIT; (f) InST; (g) FreeStyle; (h) Ours)
Fig. 4 Comparative experiments on the FFHQ dataset ((a) Source image; (b) Style prompt text; (c) ZeCon; (d) DiffusionCLIP; (e) DiffuseIT; (f) InST; (g) StyleGAN-NADA; (h) Ours)
| Method | SSIM↑ | LPIPS↓ | CLIP score↑ | FID↓ | Preference↑/% | Time/s |
|---|---|---|---|---|---|---|
| ZeCon | 0.696 | 0.467 | 26.94 | 262.20 | 26 | 38 |
| DiffuseIT | 0.602 | 0.507 | 24.40 | 180.32 | 2 | 42 |
| DiffusionCLIP | 0.668 | 0.536 | 28.64 | 256.19 | 6 | 462 |
| InST | 0.557 | 0.489 | 26.34 | 220.13 | 22 | 816 |
| Ours | 0.750 | 0.401 | 27.54 | 204.89 | 44 | 46 |

Table 1 Quantitative comparison with other style transfer methods
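For reference, the SSIM column in Table 1 measures structural similarity between the stylized output and the source image by comparing luminance, contrast, and structure statistics. A minimal single-window (global) SSIM sketch in NumPy is shown below; practical evaluations use the sliding-window variant, so this is illustrative only:

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over whole images (illustrative; real SSIM is windowed)."""
    c1 = (0.01 * data_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizer for the contrast/structure term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

np.random.seed(0)
img = np.random.rand(64, 64)
print(global_ssim(img, img))        # identical images -> 1.0
print(global_ssim(img, 1.0 - img))  # inverted image -> much lower (negative covariance)
```

Higher is better, which is why the table marks the column SSIM↑.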
Fig. 5 Ablation experiment of the FMM module and content losses ((a) Source image; (b) No FMM; (c) No content loss; (d) VGG loss; (e) MSE loss; (f) CUT loss; (g) VGG+MSE; (h) VGG+MSE+CUT)
| Model | SSIM↑ | LPIPS↓ | CLIP score↑ |
|---|---|---|---|
| Baseline | 0.707 | 0.431 | 27.37 |
| w/o FMM (dual-branch) | 0.773 | 0.353 | 25.73 |
| w/o VGG | 0.771 | 0.397 | 27.66 |
| w/o MSE | 0.754 | 0.409 | 27.21 |
| w/o CUT | 0.737 | 0.454 | 27.42 |
| w/o content loss | 0.704 | 0.483 | 27.94 |
| Ours | 0.819 | 0.321 | 27.34 |

Table 2 Ablation results for the content losses and the FMM module
Fig. 6 Ablation experiment of the style losses ((a) Source image; (b) Style text prompt; (c) No style loss; (d) Directional loss only; (e) Global loss only; (f) Full set)
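The directional ("dir") loss ablated in Fig. 6 is the directional CLIP loss introduced by StyleGAN-NADA [10]: it aligns the CLIP-embedding direction from the source image to the stylized image with the direction from the source text to the style text, rather than matching the style text globally. A hedged sketch with placeholder embedding vectors (in real use all four embeddings come from a CLIP encoder):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def directional_loss(e_src_img, e_out_img, e_src_txt, e_sty_txt):
    """1 - cosine similarity between the image-space and text-space edit directions."""
    d_img = normalize(e_out_img - e_src_img)
    d_txt = normalize(e_sty_txt - e_src_txt)
    return 1.0 - float(d_img @ d_txt)

rng = np.random.default_rng(0)
e_src_img = rng.normal(size=512)
e_src_txt = rng.normal(size=512)
shift = rng.normal(size=512)
# If the image moves along the same CLIP direction the text prescribes, the loss -> 0.
print(directional_loss(e_src_img, e_src_img + shift, e_src_txt, e_src_txt + shift))
```

The global loss, by contrast, would simply maximize the cosine similarity between the stylized image embedding and the style text embedding; Fig. 6 shows the effect of each term in isolation.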
| Parameter | SSIM↑ | LPIPS↓ | CLIP score↑ |
|---|---|---|---|
| n=0 | 0.785 | 0.347 | 26.59 |
| n=3 | 0.802 | 0.277 | 26.83 |
| n=6 | 0.782 | 0.298 | 26.16 |
| n=9 | 0.728 | 0.334 | 25.30 |
| n=12 | 0.672 | 0.372 | 25.30 |
| n=15 | 0.681 | 0.356 | 25.55 |
| n=18 | 0.689 | 0.347 | 25.48 |

Table 3 Quantitative ablation results for hyper-parameter n
Fig. 8 The ablation experiment of hyper-parameters s and b ((a) Source image; (b) Text prompt; (c) s=1.0, b=1.0; (d) s=1.0, b=1.5; (e) s=1.0, b=2.0; (f) s=1.0, b=2.5; (g) s=1.0, b=3.0; (h) s=0.5, b=2.0; (i) s=0.8, b=2.0; (j) s=1.0, b=2.0; (k) s=1.2, b=2.0; (l) s=1.5, b=2.0)
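Fig. 8 sweeps two scaling factors s and b. In FreeU [22], which this setup resembles, b rescales backbone feature maps and s rescales skip-connection features where they are fused in the diffusion U-Net decoder. The toy sketch below shows only that scalar-scaling step; shapes and names are illustrative, and FreeU itself additionally restricts b to a subset of channels and damps skips in the Fourier domain:

```python
import numpy as np

def fuse(backbone, skip, b=1.0, s=1.0):
    """FreeU-style fusion: amplify backbone features by b, rescale skips by s,
    then concatenate along the channel axis as a U-Net decoder block would."""
    return np.concatenate([b * backbone, s * skip], axis=0)

backbone = np.ones((4, 8, 8))  # (channels, H, W) toy feature maps
skip = np.ones((4, 8, 8))
out = fuse(backbone, skip, b=2.0, s=0.5)
print(out.shape)                   # (8, 8, 8)
print(out[0, 0, 0], out[4, 0, 0])  # 2.0 0.5
```

Increasing b strengthens the denoising backbone's (often style-dominated) contribution, while s controls how much high-frequency source detail the skips reinject, which is why the two are swept jointly in Fig. 8.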
Fig. 9 Experiment on disentanglement of content and style ((a) Source image; (b) Text prompt; (c) α=0.4; (d) α=0.6; (e) α=0.8; (f) α=1.0; (g) α=1.2; (h) α=1.4; (i) α=1.6)
[1] CHENG B, LIU Z H, PENG Y B, et al. General image-to-image translation with one-shot image guidance[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 22736-22746.
[2] GATYS L A, ECKER A S, BETHGE M. Image style transfer using convolutional neural networks[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 2414-2423.
[3] WANG C C, WANG Y L, GE Z Q, et al. Convolutional neural network-based Chinese ink-painting artistic style extraction[J]. Journal of Graphics, 2017, 38(5): 754-759 (in Chinese).
[4] LI X, PU Y Y, ZHAO Z P, et al. Content semantics and style features match consistent artistic style transfer[J]. Journal of Graphics, 2023, 44(4): 699-709 (in Chinese).
[5] HUANG X, BELONGIE S. Arbitrary style transfer in real-time with adaptive instance normalization[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 1501-1510.
[6] JING Y C, LIU X, DING Y K, et al. Dynamic instance normalization for arbitrary style transfer[C]// The 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 4369-4376.
[7] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// The 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2672-2680.
[8] PARK T, EFROS A A, ZHANG R, et al. Contrastive learning for unpaired image-to-image translation[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 319-345.
[9] ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2223-2232.
[10] GAL R, PATASHNIK O, MARON H, et al. StyleGAN-NADA: CLIP-guided domain adaptation of image generators[J]. ACM Transactions on Graphics (TOG), 2022, 41(4): 141.
[11] KWON G, YE J C. CLIPstyler: image style transfer with a single text condition[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 18062-18071.
[12] SAHARIA C, HO J, CHAN W, et al. Image super-resolution via iterative refinement[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(4): 4713-4726.
[13] KIM G, KWON T, YE J C. DiffusionCLIP: text-guided diffusion models for robust image manipulation[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 2426-2435.
[14] KWON G, YE J C. Diffusion-based image translation using disentangled style and content representation[EB/OL]. [2024-05-04]. https://dblp.uni-trier.de/db/conf/iclr/iclr2023.html#KwonY23.
[15] YANG S, HWANG H, YE J C. Zero-shot contrastive loss for text-guided diffusion image style transfer[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 22873-22882.
[16] ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 10684-10695.
[17] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL]. [2024-05-04]. https://dblp.uni-trier.de/db/conf/icml/icml2021.html#RadfordKHRGASAM21.
[18] MOKADY R, HERTZ A, ABERMAN K, et al. Null-text inversion for editing real images using guided diffusion models[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 6038-6047.
[19] EVERAERT M N, BOCCHIO M, ARPA S, et al. Diffusion in style[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 2251-2261.
[20] CHUNG J, HYUN S, HEO J P. Style injection in diffusion: a training-free approach for adapting large-scale diffusion models for style transfer[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 8795-8805.
[21] HERTZ A, MOKADY R, TENENBAUM J, et al. Prompt-to-prompt image editing with cross-attention control[EB/OL]. [2024-05-04]. https://dblp.uni-trier.de/db/conf/iclr/iclr2023.html#HertzMTAPC23.
[22] SI C Y, HUANG Z Q, JIANG Y M, et al. FreeU: free lunch in diffusion U-net[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 4733-4743.
[23] JEONG J, KWON M, UH Y. Training-free content injection using h-space in diffusion models[C]// 2024 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2024: 5151-5161.
[24] ZHANG H S, YIN X Q, YU J H. Real-time rendering of 3D Chinese painting effects[J]. Journal of Computer-Aided Design & Computer Graphics, 2004, 16(11): 1485-1489 (in Chinese).
[25] QIAN X Y, XIAO L, WU H Z. Fast style transfer[J]. Computer Engineering, 2006, 32(21): 15-17, 46 (in Chinese).
[26] LI X T, LIU S F, KAUTZ J, et al. Learning linear transformations for fast image and video style transfer[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 3809-3817.
[27] PARK D Y, LEE K H. Arbitrary style transfer with style-attentional networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 5880-5888.
[28] SONG C J, WU Z J, ZHOU Y, et al. ETNet: error transition network for arbitrary style transfer[C]// The 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2019: 61.
[29] WU Z J, SONG C J, ZHOU Y, et al. EFANet: exchangeable feature alignment network for arbitrary style transfer[C]// The 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 12305-12312.
[30] LIU S H, LIN T W, HE D L, et al. AdaAttN: revisit attention mechanism in arbitrary neural style transfer[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 6649-6658.
[31] KARRAS T, LAINE S, AILA T. A style-based generator architecture for generative adversarial networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4401-4410.
[32] KARRAS T, AITTALA M, HELLSTEN J, et al. Training generative adversarial networks with limited data[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 1015.
[33] ZHOU Y, CHEN Z C, HUANG H. Deformable one-shot face stylization via DINO semantic guidance[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 7787-7796.
[34] ZHANG Y X, DONG W M, TANG F, et al. ProSpect: prompt spectrum for attribute-aware personalization of diffusion models[EB/OL]. [2024-05-04]. https://arxiv.org/abs/2305.16225.
[35] ZHANG Y X, HUANG N S, TANG F, et al. Inversion-based style transfer with diffusion models[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 10146-10156.
[36] DENG Y Y, HE X Y, TANG F, et al. Z*: zero-shot style transfer via attention reweighting[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 6934-6944.
[37] SOHN K, RUIZ N, LEE K, et al. StyleDrop: text-to-image generation in any style[EB/OL]. [2024-05-04]. https://arxiv.org/abs/2306.00983.
[38] AHN N, LEE J, LEE C, et al. DreamStyler: paint by style inversion with text-to-image diffusion models[C]// The 38th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2024: 674-681.
[39] QI T H, FANG S C, WU Y Z, et al. DEADiff: an efficient stylization diffusion model with disentangled representations[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 8693-8702.
[40] HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 574.
[41] DHARIWAL P, NICHOL A. Diffusion models beat GANs on image synthesis[C]// The 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 672.
[42] NICHOL A Q, DHARIWAL P. Improved denoising diffusion probabilistic models[EB/OL]. [2024-05-04]. https://dblp.uni-trier.de/db/conf/icml/icml2021.html#NicholD21.
[43] SONG J M, MENG C L, ERMON S. Denoising diffusion implicit models[EB/OL]. [2024-05-04]. https://dblp.uni-trier.de/db/conf/iclr/iclr2021.html#SongME21.
[44] PAN Z H, ZHOU X, TIAN H. Arbitrary style guidance for enhanced diffusion-based text-to-image generation[C]// 2023 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2023: 4461-4471.
[45] HE F H, LI G, SUN F H, et al. FreeStyle: free lunch for text-guided style transfer using diffusion models[EB/OL]. [2024-05-04]. https://arxiv.org/abs/2401.15636.
[46] TOV O, ALALUF Y, NITZAN Y, et al. Designing an encoder for StyleGAN image manipulation[J]. ACM Transactions on Graphics (TOG), 2021, 40(4): 133.
[47] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[48] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// 2018 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 586-595.