Content semantics and style features match consistent artistic style transfer

Journal of Graphics, 2023, Vol. 44, Issue 4: 699-709. DOI: 10.11996/JG.j.2095-302X.2023040699

Section: Image Processing and Computer Vision

LI Xin¹, PU Yuan-yuan¹,², ZHAO Zheng-peng¹, XU Dan¹, QIAN Wen-hua¹

1. School of Information Science and Engineering, Yunnan University, Kunming, Yunnan 650500, China
2. University Key Laboratory of Internet of Things Technology and Application, Yunnan Province, Kunming, Yunnan 650500, China

Received: 2022-12-06  Accepted: 2023-03-06  Online: 2023-08-16  Published: 2023-08-31
Corresponding author: PU Yuan-yuan (1972-), professor, Ph.D. Her main research interests cover digital image processing, non-photorealistic rendering, and the scientific understanding of visual art. E-mail：yuanyuanpu@ynu.edu.cn
About the author:
LI Xin (1997-), master's student. His main research interest is image style transfer. E-mail：3323163785@qq.com
Supported by:
National Natural Science Foundation of China (61163019, 61271361, 61761046, U1802271, 61662087, 62061049); Project of the Department of Science and Technology of Yunnan Province (2014FA021, 2018FB100); Key Project of the Applied Basic Research Program of the Yunnan Provincial Department of Science and Technology (202001BB050043, 2019FA044); Major Science and Technology Special Program of Yunnan Province (202002AD080001); Reserve Talents Program for Young and Middle-Aged Academic and Technical Leaders of Yunnan Province (2019HB121)

Abstract:

The development of computer vision has made image style transfer a challenging research topic of considerable value. Existing methods, however, fail to preserve the object contours of the content image effectively, and they often transfer several different style features onto regions that share the same content semantics. To address these problems, an artistic style transfer network with consistent matching between content semantics and style features was proposed. First, a two-branch feature processing module was employed to enhance the style and content features and to retain the object contours of the content image. Next, feature distributions were aligned and fused in the attention feature space. Finally, an interpolation module with spatial awareness was used to make the style consistent across regions with the same content semantics. The network was trained on 82 783 real photographs and 80 095 artistic paintings, and tested on a further 1 000 photographs and 1 000 paintings. Comparisons with four state-of-the-art style transfer methods, together with ablation experiments, verified the effectiveness of the proposed framework and of the added loss functions. The experimental results showed that the network takes an average of 9.42 ms to generate a 256×256 image and 10.23 ms to generate a 512×512 image, avoids distortion of the content structure, and matches content semantics and style features consistently, producing better artistic visual effects.
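The abstract describes the alignment and interpolation steps only in words. As a rough illustration, the sketch below assumes an AdaAttN-style formulation [14], not the paper's own module: each content position attends over all style positions, takes an attention-weighted mean and standard deviation of the style features, and re-normalizes the content feature with them; a per-position weight map then blends the stylized and original content features. The names `attention_align`, `spatial_interpolate` and the hand-set `alpha_map` are hypothetical; in a trained network such a weight map would be predicted rather than fixed.

```python
import numpy as np


def _normalize(x, eps=1e-5):
    # Channel-wise mean/std normalization over spatial positions; x has shape (C, N).
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True) + eps
    return (x - mean) / std


def attention_align(fc, fs, eps=1e-5):
    """Re-normalize content features fc (C, Nc) with attention-weighted
    statistics of style features fs (C, Ns), in the spirit of AdaAttN."""
    q = _normalize(fc)                           # queries: normalized content
    k = _normalize(fs)                           # keys: normalized style
    logits = q.T @ k / np.sqrt(fc.shape[0])      # (Nc, Ns) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)      # each row sums to 1
    v = fs.T                                     # (Ns, C) values: raw style features
    mean = attn @ v                              # (Nc, C) attended mean
    var = attn @ (v ** 2) - mean ** 2            # (Nc, C) attended variance
    std = np.sqrt(np.clip(var, 0.0, None) + eps)
    return (std * q.T + mean).T                  # back to (C, Nc)


def spatial_interpolate(fc, fcs, alpha_map):
    """Blend content fc and stylized fcs (both (C, N)) with a per-position
    weight alpha_map of shape (N,): 1 keeps fcs, 0 keeps fc."""
    alpha = np.clip(alpha_map, 0.0, 1.0)[None, :]
    return alpha * fcs + (1.0 - alpha) * fc


# Usage on random stand-in features (8 channels, 4x4 content, 4x5 style).
rng = np.random.default_rng(0)
fc = rng.normal(size=(8, 16))
fs = rng.normal(loc=2.0, scale=3.0, size=(8, 20))
out = spatial_interpolate(fc, attention_align(fc, fs), np.full(16, 0.5))
print(out.shape)  # (8, 16)
```

Features are flattened to (channels, H×W) so that attention is a plain matrix product; a uniform `alpha_map` reduces the interpolation to a global content/style trade-off, whereas a spatially varying one lets semantically similar regions receive the same degree of stylization.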

Key words: convolutional neural network, image style transfer, attention mechanism, style consistency, feature fusion

CLC number: TP391

LI Xin, PU Yuan-yuan, ZHAO Zheng-peng, XU Dan, QIAN Wen-hua. Content semantics and style features match consistent artistic style transfer[J]. Journal of Graphics, 2023, 44(4): 699-709.

Figures and tables (9)

Fig. 1 General framework of the network

Fig. 2 CSMCNet

Fig. 3 Stylization results of the proposed method ((a) Content; (b) Style)

Fig. 4 Comparison with existing methods ((a) Content; (b) Style; (c) Ours; (d) PAMA; (e) AdaAttN; (f) MANet; (g) SANet)

Fig. 5 Loss ablation experiment ((a) Content; (b) Style; (c) w/o Lc; (d) w/o Lrec; (e) w/o Lr; (f) w/o Lh; (g) All loss)
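Fig. 5 ablates four loss terms (Lc, Lrec, Lr, Lh), but this page does not define them. For illustration only, the sketch below implements the relaxed earth mover's distance (REMD) of Kolkin et al. [28], a style loss from the reference list, using cosine distance between feature vectors; whether any term in Fig. 5 takes exactly this form is an assumption. Inputs are hypothetical feature sets flattened to (positions, channels).

```python
import numpy as np


def cosine_cost(x, y, eps=1e-8):
    """Pairwise cosine distance between rows of x (N, C) and rows of y (M, C)."""
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    yn = y / (np.linalg.norm(y, axis=1, keepdims=True) + eps)
    return 1.0 - xn @ yn.T  # values lie (up to rounding) in [0, 2]


def relaxed_emd(x, y):
    """Relaxed EMD: match every vector to its nearest neighbour in the other
    set, average each direction, and keep the larger of the two averages."""
    cost = cosine_cost(x, y)
    return float(max(cost.min(axis=1).mean(), cost.min(axis=0).mean()))


# Usage: stylized-image features vs. style-image features (random stand-ins).
rng = np.random.default_rng(1)
stylized = rng.normal(size=(30, 5))
style = rng.normal(size=(40, 5))
print(relaxed_emd(stylized, style))
```

Taking the maximum of the two one-sided averages penalizes both stylized features with no close style counterpart and style features that are never used, which is what makes the relaxation a useful proxy for the full transport cost.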

Fig. 6 Structure ablation experiment ((a) Content; (b) Style; (c) w/o whitening; (d) w/o spatial interpolation; (e) w/o fusion module; (f) Ours)
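Fig. 6 (c) removes a whitening step, whose exact form this page does not specify. A common choice, used in the whitening and coloring transform of Li et al. [25], is ZCA whitening: subtract the channel means of the content features and decorrelate their channels, so that what remains mainly encodes structure rather than style statistics. A minimal NumPy sketch, assuming features flattened to shape (C, N) and a hypothetical regularizer `eps`:

```python
import numpy as np


def zca_whiten(f, eps=1e-5):
    """ZCA-whiten features f of shape (C, N): subtract the channel means and
    decorrelate the channels so the covariance becomes (nearly) the identity."""
    centered = f - f.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / (f.shape[1] - 1) + eps * np.eye(f.shape[0])
    evals, evecs = np.linalg.eigh(cov)               # cov is symmetric PSD
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return inv_sqrt @ centered


# Usage: 4 correlated-scale channels over 500 positions.
rng = np.random.default_rng(2)
f = np.array([[1.0], [2.0], [3.0], [4.0]]) * rng.normal(size=(4, 500)) + 5.0
print(np.round(np.cov(zca_whiten(f)), 3))  # close to the 4x4 identity
```

In WCT the whitened features are subsequently colored with the style covariance and mean; that second half is omitted here because only whitening is ablated in Fig. 6.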

Table 1 Quantitative comparison

Method	FID↓ (t-c)	FID↓ (t-s)	CF↑	GE↑	LP↑
Ours	300.04	506.83	0.57	0.82	0.52
PAMA	483.99	498.73	0.50	0.86	0.50
AdaAttN	357.16	508.73	0.53	0.85	0.49
MANet	499.84	497.21	0.47	0.80	0.48
SANet	483.95	532.29	0.51	0.84	0.50

CF, GE and LP are the content fidelity, global effects and local patterns scores of Wang et al. [33].
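FID fits a Gaussian to each of two sets of deep features and reports the Fréchet distance between them, d² = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}), so lower is better. The page does not state which network supplied the features (the standard choice is Inception-v3 pooling features), so the sketch below takes arbitrary feature arrays. It computes the matrix square root through a symmetric eigendecomposition, using the identity Tr((Σ₁Σ₂)^{1/2}) = Tr((Σ₁^{1/2}Σ₂Σ₁^{1/2})^{1/2}), so it depends only on NumPy; the function names are illustrative.

```python
import numpy as np


def _sqrtm_psd(a):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    evals, evecs = np.linalg.eigh(a)
    return evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    middle = s1_half @ sigma2 @ s1_half
    middle = (middle + middle.T) / 2.0  # remove rounding asymmetry before eigh
    tr_cross = np.trace(_sqrtm_psd(middle))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_cross)


def fid_from_features(feats_a, feats_b):
    """FID between two feature sets, each of shape (num_images, dim)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    return frechet_distance(mu_a, cov_a, mu_b, cov_b)


# Usage: 1-D Gaussians N(0, 4) and N(0, 9) give 4 + 9 - 2 * 6 = 1.
print(frechet_distance(np.zeros(1), np.array([[4.0]]), np.zeros(1), np.array([[9.0]])))
```

For the t-c and t-s columns the same function would be applied to stylized-image features against content-image features and against style-image features, respectively; the feature extractor is left open here.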

Table 2 Average running time of image stylization (ms)

Method	256×256	512×512
Ours	9.42	10.23
PAMA	8.53	9.87
AdaAttN	19.76	22.52
MANet	8.20	8.99
SANet	4.79	6.35
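Table 2 reports average per-image times, but this page does not describe the measurement protocol. A typical harness runs a few warm-up calls, to exclude one-off costs such as memory allocation or kernel compilation, and then averages several timed calls. The sketch below does this for any Python callable; `average_runtime_ms`, `warmup` and `repeats` are hypothetical names and defaults. For GPU models the device must also be synchronized before each clock read (for example with `torch.cuda.synchronize()` in PyTorch), otherwise asynchronous kernel launches make the times look too small.

```python
import time
from statistics import mean


def average_runtime_ms(fn, *args, warmup=3, repeats=20):
    """Mean wall-clock time of fn(*args) in milliseconds, after warm-up calls."""
    for _ in range(warmup):          # untimed runs absorb one-off start-up costs
        fn(*args)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn(*args)
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


# Usage with a stand-in workload (sleep for 2 ms per call).
print(average_runtime_ms(time.sleep, 0.002, warmup=1, repeats=5))
```

Read this way, the table shows that quadrupling the pixel count (256×256 to 512×512) raises the reported time of the proposed network from 9.42 ms to 10.23 ms, an increase of about 8.6%.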

Fig. 7 User study

References

[1]	ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2242-2251.
[2]	GATYS L A, ECKER A S, BETHGE M. Image style transfer using convolutional neural networks[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 2414-2423.
[3]	GATYS L A, ECKER A S, BETHGE M, et al. Controlling perceptual factors in neural style transfer[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 3730-3738.
[4]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[5]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[6]	CHANDRAN P, ZOSS G, GOTARDO P, et al. Adaptive convolutions for structure-aware style transfer[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7972-7981.
[7]	WU H, SUN Z X, YUAN W H. Direction-aware neural style transfer[C]// The 26th ACM International Conference on Multimedia. New York: ACM, 2018: 1163-1171.
[8]	ZHI Y H, WEI H W, NI B B. Structure guided photorealistic style transfer[C]// The 26th ACM International Conference on Multimedia. New York: ACM, 2018: 365-373.
[9]	HUANG X, BELONGIE S. Arbitrary style transfer in real-time with adaptive instance normalization[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 1510-1519.
[10]	JING Y C, LIU X, DING Y K, et al. Dynamic instance normalization for arbitrary style transfer[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(4): 4369-4376.
[11]	LI Y J, FANG C, YANG J M, et al. Diversified texture synthesis with feed-forward networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 266-274.
[12]	PARK D Y, LEE K H. Arbitrary style transfer with style-attentional networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 5873-5881.
[13]	DENG Y Y, TANG F, DONG W M, et al. Arbitrary style transfer via multi-adaptation network[C]// The 28th ACM International Conference on Multimedia. New York: ACM, 2020: 2719-2727.
[14]	LIU S H, LIN T W, HE D L, et al. AdaAttN: revisit attention mechanism in arbitrary neural style transfer[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 6629-6638.
[15]	LUO X, HAN Z, YANG L K, et al. Consistent style transfer[EB/OL]. (2022-01-06) [2022-07-16]. https://arxiv.org/abs/2201.02233.
[16]	LIU X C, CHENG M M, LAI Y K, et al. Depth-aware neural style transfer[C]// Proceedings of the Symposium on Non-Photorealistic Animation and Rendering. New York: ACM, 2017: 1-10.
[17]	WANG X, OXHOLM G, ZHANG D, et al. Multimodal transfer: a hierarchical deep convolutional neural network for fast artistic style transfer[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 7178-7186.
[18]	JING Y C, LIU Y, YANG Y Z, et al. Stroke controllable fast style transfer with adaptive receptive fields[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2018: 244-260.
[19]	KOTOVENKO D, SANAKOYEU A, LANG S, et al. Content and style disentanglement for artistic style transfer[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 4421-4430.
[20]	CHEN D D, YUAN L, LIAO J, et al. StyleBank: an explicit representation for neural image style transfer[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2770-2779.
[21]	LI Y J, FANG C, YANG J M, et al. Diversified texture synthesis with feed-forward networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 266-274.
[22]	SHENG L, LIN Z Y, SHAO J, et al. Avatar-net: multi-scale zero-shot style transfer by feature decoration[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 8242-8250.
[23]	GU S Y, CHEN C L, LIAO J, et al. Arbitrary style transfer with deep feature reshuffle[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 8222-8231.
[24]	LIU S G, ZHU T. Structure-guided arbitrary style transfer for artistic image and video[J]. IEEE Transactions on Multimedia, 2022, 24: 1299-1312.
[25]	LI Y, FANG C, YANG J, et al. Universal style transfer via feature transforms[EB/OL]. [2022-07-15]. https://dl.acm.org/doi/10.5555/3294771.3294808.
[26]	YAO Y, REN J Q, XIE X S, et al. Attention-aware multi-stroke style transfer[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 1467-1475.
[27]	CHEN H, WANG Z, ZHANG H, et al. Artistic style transfer with internal-external learning and contrastive learning[EB/OL]. [2022-07-10]. https://openreview.net/forum?id=hm0i-cunzGW.
[28]	KOLKIN N, SALAVON J, SHAKHNAROVICH G. Style transfer by relaxed optimal transport and self-similarity[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 10043-10052.
[29]	AFIFI M, BRUBAKER M A, BROWN M S. HistoGAN: controlling colors of GAN-generated and real images via color histograms[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 7937-7946.
[30]	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2014: 740-755.
[31]	PHILLIPS F, MACKINTOSH B. Wiki art gallery, inc.: a case for critical thinking[J]. Issues in Accounting Education, 2011, 26(3): 593-608.
[32]	KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30) [2022-07-17]. https://arxiv.org/abs/1412.6980.
[33]	WANG Z Z, ZHAO L, CHEN H B, et al. Evaluate and improve the quality of neural style transfer[J]. Computer Vision and Image Understanding, 2021, 207: 103203.
