图学学报 (Journal of Graphics) ›› 2023, Vol. 44 ›› Issue (1): 120-130. DOI: 10.11996/JG.j.2095-302X.2023010120
收稿日期: 2022-04-24
修回日期: 2022-07-01
出版日期: 2023-10-31
发布日期: 2023-02-16
通讯作者: 张东亮
作者简介: 潘东辉(1997-), 男, 硕士研究生。主要研究方向为数字图像处理。E-mail: 417969567@qq.com
PAN Dong-hui, JIN Ying-han, SUN Xu, LIU Yu-sheng, ZHANG Dong-liang
Received: 2022-04-24
Revised: 2022-07-01
Online: 2023-10-31
Published: 2023-02-16
Contact: ZHANG Dong-liang
About author: PAN Dong-hui (1997-), master's student. His main research interests include digital image processing. E-mail: 417969567@qq.com
摘要: 绘制服装效果图是服装设计过程中重要的一环,针对目前存在智能化程度不足、对用户绘画水平和想象能力要求较高等问题,提出了一种使用线稿和颜色点生成服装图像的CNN-Transformer混合网络CTH-Net。CTH-Net结合卷积神经网络(CNN)在提取局部信息和Transformer在处理长距离依赖方面的优势,将2个模型架构进行高效混合,并设计ToPatch和ToFeatureMap模块减小输入Transformer的数据量和维度以降低计算资源消耗。CTH-Net由3个阶段组成:一是草图阶段,旨在预测服装的颜色分布,获得没有渐变和阴影的水彩式图像;二是细化阶段,将水彩式图像细化为有光影效果的服装图像;三是调优阶段,组合一、二阶段的输出进一步优化生成质量。实验结果表明,仅需输入线稿和少量颜色点,CTH-Net便能生成出高质量的服装图像。与现有的方法相比,该网络生成图像的真实感和准确性均有较大优势。
Abstract: Drawing garment renderings is an important part of the garment design process. To address the limited intelligent assistance of existing tools and their high demands on users' drawing skills and imagination, this paper proposes CTH-Net, a CNN-Transformer hybrid network that generates garment images from line-art sketches and color points. CTH-Net combines the strength of convolutional neural networks (CNN) in extracting local information with the Transformer's strength in modeling long-range dependencies, blending the two architectures efficiently, and introduces ToPatch and ToFeatureMap modules that reduce the amount and dimensionality of the data fed into the Transformer to lower computational cost. CTH-Net consists of three stages: a drafting stage that predicts the garment's color distribution and produces a watercolor-style image without gradients or shadows; a refinement stage that refines the watercolor-style image into a garment image with lighting and shading; and a tuning stage that combines the outputs of the first two stages to further improve generation quality. Experimental results show that, given only a sketch and a few color points, CTH-Net can generate high-quality garment images, and compared with existing methods the generated images have clear advantages in both realism and accuracy.
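To make the three-stage data flow described in the abstract concrete, the sketch below is a minimal PyTorch mock-up: a line-art sketch and a rasterized color-point map enter the drafting stage, its watercolor-style output is refined, and the tuning stage fuses both intermediate results. The `StageNet` blocks, channel counts, and input packing are illustrative assumptions only; they are not the paper's actual CNN-Transformer sub-networks.

```python
import torch
import torch.nn as nn

class StageNet(nn.Module):
    """Placeholder encoder-decoder standing in for one CTH-Net stage."""
    def __init__(self, in_ch, out_ch=3, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, out_ch, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.body(x)

class CTHPipeline(nn.Module):
    """Drafting -> refinement -> tuning, following the description in the abstract."""
    def __init__(self):
        super().__init__()
        self.drafting = StageNet(in_ch=1 + 3)    # sketch (1 ch) + rasterized color points (3 ch)
        self.refinement = StageNet(in_ch=3)      # watercolor-style draft
        self.tuning = StageNet(in_ch=3 + 3)      # draft + refined image

    def forward(self, sketch, color_points):
        draft = self.drafting(torch.cat([sketch, color_points], dim=1))  # flat colors, no shading
        refined = self.refinement(draft)                                 # add lighting/shading
        final = self.tuning(torch.cat([draft, refined], dim=1))          # fuse stage 1 and 2 outputs
        return draft, refined, final

if __name__ == "__main__":
    net = CTHPipeline()
    sketch = torch.rand(1, 1, 256, 256)   # line-art input
    points = torch.rand(1, 3, 256, 256)   # sparse color hints rasterized to an RGB map
    draft, refined, final = net(sketch, points)
    print(final.shape)                    # torch.Size([1, 3, 256, 256])
```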
潘东辉, 金映含, 孙旭, 刘玉生, 张东亮. CTH-Net:从线稿和颜色点生成服装图像的CNN-Transformer混合网络[J]. 图学学报, 2023, 44(1): 120-130.
PAN Dong-hui, JIN Ying-han, SUN Xu, LIU Yu-sheng, ZHANG Dong-liang. CTH-Net: CNN-Transformer hybrid network for garment image generation from sketches and color points[J]. Journal of Graphics, 2023, 44(1): 120-130.
图5 服装图像预处理((a)服装效果图;(b)图像规范化;(c)水彩式图像;(d)平滑处理;(e)线稿图像;(f)颜色点;(g)模糊处理)
Fig. 5 Pre-processing of a garment image ((a) Garment image; (b) Image normalization; (c) Watercolor image; (d) Smoothed image; (e) Sketches; (f) Color points; (g) Blurred image)
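Figure 5 enumerates the pre-processing steps (normalization, watercolor-style image, smoothing, sketch, color points, blurring). Since the paper cites OpenCV [32] and K-means clustering [33], one plausible way to build the flat "watercolor" target and the sparse, blurred color hints is sketched below; the cluster count, number of sampled points, and blur kernel are illustrative assumptions, not the authors' settings.

```python
import cv2
import numpy as np

def watercolor_and_points(img_bgr, k=8, n_points=20, blur_ksize=21):
    """Quantize colors with K-means into a flat 'watercolor' image and sample sparse color hints."""
    h, w = img_bgr.shape[:2]
    pixels = img_bgr.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 3, cv2.KMEANS_PP_CENTERS)
    watercolor = centers[labels.flatten()].reshape(h, w, 3).astype(np.uint8)
    watercolor = cv2.medianBlur(watercolor, 5)            # smooth jagged cluster borders (Fig. 5(d))

    # Sample a few pixels as "color points" and blur them into a soft hint map (Fig. 5(f)-(g)).
    hints = np.zeros_like(img_bgr)
    ys = np.random.randint(0, h, size=n_points)
    xs = np.random.randint(0, w, size=n_points)
    for y, x in zip(ys, xs):
        cv2.circle(hints, (x, y), 4, img_bgr[y, x].tolist(), -1)
    blurred_hints = cv2.GaussianBlur(hints, (blur_ksize, blur_ksize), 0)
    return watercolor, blurred_hints

# Example: watercolor, hints = watercolor_and_points(cv2.imread("garment.png"))
```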
图6 线稿提取流程((a)服装图像;(b)脏线稿;(c~d)训练数据示例;(e~f)目标数据示例)
Fig. 6 Process of sketch extraction ((a) Garment image; (b) Image with noise; (c-d) Examples of training data; (e-f) Examples of target data)
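Figure 6(b) is a noisy ("dirty") line drawing extracted from the garment image and used as training input for the sketch-extraction model. A simple edge-based way to produce such rough line art, in the spirit of Canny edge detection [35], might look like the following; the blur size and thresholds are illustrative assumptions.

```python
import cv2

def rough_sketch(img_bgr, low=50, high=150):
    """Return dark edge lines on a white background, as a rough stand-in for Fig. 6(b)."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (3, 3), 0)   # suppress texture noise before edge detection
    edges = cv2.Canny(gray, low, high)         # Canny edges [35]: white edges on black
    return 255 - edges                         # invert so lines are dark on a white canvas
```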
表1 线稿提取模型训练参数
Table 1 Parameter settings for training the sketch extraction model
| Parameter (参数名称) | Value (参数值) |
|---|---|
| Optimizer | AdamW |
| Learning rate | 0.0001 |
| Epochs | 2000 |
| CPU | E3-1230 v2 |
| GPU | GTX 1660s |
| Memory | 16 GB |
| OS | Windows 10 |
表2 CTH-Net模型训练参数
Table 2 Parameter settings for training CTH-Net
| Parameter (参数名称) | Value (参数值) |
|---|---|
| Optimizer | AdamW |
| Learning rate | 0.00001 |
| λ∗ | λrec: 1, λfea: 0.01, λadv: 0.001, λR: 0.01 |
| Epochs | Drafting: 500, Refinement: 500, Tuning: 50 |
| CPU | E5-2695 v3 |
| GPU | Tesla P100 |
| Memory | 32 GB |
| OS | Ubuntu 20.04 |
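The λ∗ row in Table 2 suggests the training objective is a weighted sum of a reconstruction term, a feature (perceptual) term, an adversarial term, and a regularization term. The sketch below only shows how such weights could be combined with the AdamW settings from Table 2; the concrete loss definitions are the paper's, and the placeholder terms here (L1, MSE, a WGAN-style generator term) are assumptions.

```python
import torch
import torch.nn as nn

# Loss weights λrec, λfea, λadv, λR taken from Table 2. The individual loss terms below are
# placeholders; the paper defines the actual reconstruction/feature/adversarial/regularization terms.
LAMBDA = {"rec": 1.0, "fea": 0.01, "adv": 0.001, "R": 0.01}

def total_loss(pred, target, fea_pred, fea_target, d_fake, reg_term):
    l_rec = nn.functional.l1_loss(pred, target)           # pixel-level reconstruction
    l_fea = nn.functional.mse_loss(fea_pred, fea_target)  # feature / perceptual distance
    l_adv = -d_fake.mean()                                 # generator adversarial term (WGAN-style)
    return (LAMBDA["rec"] * l_rec + LAMBDA["fea"] * l_fea
            + LAMBDA["adv"] * l_adv + LAMBDA["R"] * reg_term)

if __name__ == "__main__":
    pred, target = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
    fea_p, fea_t = torch.rand(2, 256), torch.rand(2, 256)
    d_fake, reg = torch.rand(2, 1), torch.tensor(0.5)
    print(total_loss(pred, target, fea_p, fea_t, d_fake, reg).item())
    # Per Table 2, the optimizer would be: torch.optim.AdamW(model.parameters(), lr=1e-5)
```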
图8 CTH-Net与其他网络模型生成结果的对比((a)大面积图案;(b)条纹;(c)纯色;(d)小面积图案;(e)蓝色米老鼠)
Fig. 8 Comparisons of generation results between CTH-Net and other methods ((a) Large area pattern; (b) Stripe pattern; (c) Solid color pattern; (d) Small area pattern; (e) Blue mickey pattern)
图9 CTH-Net与其他网络模型生成结果的局部对比
Fig. 9 Comparisons of generation details between CTH-Net and other networks ((a) Inputs; (b) Attention-UNet; (c) Pix2PixHD; (d) VQGAN; (e) CTH-Net)
图10 CTH-Net与其他网络模型生成结果的更多对比
Fig. 10 More comparisons of generation results between CTH-Net and other networks ((a) Inputs; (b) MUNIT; (c) UNet; (d) Pix2PixHD; (e) Attention-UNet; (f) TransGAN; (g) VQGAN; (h) CTH-Net)
表3 CTH-Net与其他方法的量化对比
Table 3 Comparisons of quantitative evaluations between CTH-Net and other methods
| Method (方法) | HPR | IS | FID |
|---|---|---|---|
| MUNIT | 0.0 | 3.852 | 5.790 |
| UNet | 1.3 | 4.266 | 2.340 |
| Pix2PixHD | 1.0 | 4.133 | 2.464 |
| Attention-UNet | 1.7 | 4.287 | 2.191 |
| TransGAN | 0.8 | 4.174 | 2.419 |
| VQGAN | 1.9 | 4.304 | 2.085 |
| FashionImageDesign | 3.4 | - | - |
| CTH-Net† (Ours) | 4.4 | 4.427 | 1.872 |
| CTH-Net (Ours) | 5.5 | 4.583 | 1.496 |
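The IS [42] and FID [43] columns in Table 3 can be reproduced in spirit with the torchmetrics implementations (installed via `pip install torchmetrics[image]`, which pulls in torch-fidelity). The snippet below uses random tensors purely as stand-ins; it is not the authors' evaluation code, and reliable FID estimates require far more images than shown here.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore

# Stand-ins for real and generated garment images (uint8, NCHW). Replace with real data loaders.
real = torch.randint(0, 256, (128, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 256, (128, 3, 299, 299), dtype=torch.uint8)

fid = FrechetInceptionDistance(feature=64)   # small feature size keeps this toy example stable
fid.update(real, real=True)
fid.update(fake, real=False)
print("FID:", fid.compute().item())

inception = InceptionScore()
inception.update(fake)
is_mean, is_std = inception.compute()
print("IS:", is_mean.item(), "+/-", is_std.item())
```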
表4 ToPatch与ToFeatureMap模块的作用
Table 4 Performance of ToPatch and ToFeatureMap
| Method (方法) | Batch size* | Time per epoch (s) | GPU memory (GB) | IS | FID |
|---|---|---|---|---|---|
| With ToPatch & ToFeatureMap | 4 | 365.2 | 2.223 | 4.583 | 1.496 |
| Without ToPatch & ToFeatureMap | 4 | 1204.6 | 7.854 | 4.989 | 1.367 |
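Table 4 ablates the ToPatch and ToFeatureMap modules, which the abstract describes as reducing the amount and dimensionality of data entering the Transformer. One plausible reading, sketched below, is a strided patch embedding that folds a CNN feature map into a short token sequence and an inverse module that unfolds the tokens back into a feature map; the module names follow the paper, but the internals, patch size, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ToPatch(nn.Module):
    """Fold an (N, C, H, W) feature map into (N, L, D) tokens with L = (H/p)*(W/p)."""
    def __init__(self, in_ch, dim, patch=4):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):
        x = self.proj(x)                      # (N, D, H/p, W/p)
        return x.flatten(2).transpose(1, 2)   # (N, L, D) token sequence for the Transformer

class ToFeatureMap(nn.Module):
    """Unfold (N, L, D) tokens back into an (N, C, H, W) feature map."""
    def __init__(self, dim, out_ch, patch=4):
        super().__init__()
        self.proj = nn.ConvTranspose2d(dim, out_ch, kernel_size=patch, stride=patch)

    def forward(self, tokens, hw):
        h, w = hw                              # spatial size of the token grid
        x = tokens.transpose(1, 2).reshape(tokens.size(0), -1, h, w)
        return self.proj(x)                    # (N, out_ch, h*p, w*p)

if __name__ == "__main__":
    feat = torch.rand(2, 64, 32, 32)
    tokens = ToPatch(64, dim=128, patch=4)(feat)   # (2, 64, 128): 64 tokens instead of 1024 pixels
    encoded = nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True)(tokens)
    back = ToFeatureMap(128, 64, patch=4)(encoded, (8, 8))
    print(tokens.shape, back.shape)                # (2, 64, 128) and (2, 64, 32, 32)
```

With a patch size of 4, a 32×32 feature map becomes 64 tokens instead of 1024 per-pixel tokens, which is consistent with the much lower memory use and per-epoch time reported in Table 4.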
[1] | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//The 31st Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2017: 5998-6008. |
[2] | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778. |
[3] | ISOLA P, ZHU J Y, ZHOU T H, et al. Image-to-image translation with conditional adversarial networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5967-5976. |
[4] | SANGKLOY P, LU J W, FANG C, et al. Scribbler: controlling deep image synthesis with sketch and color[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 6836-6845. |
[5] | ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2242-2251. |
[6] | WANG T C, LIU M Y, ZHU J Y, et al. High-resolution image synthesis and semantic manipulation with conditional GANs[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 8798-8807. |
[7] | ZHU J Y, ZHANG R, PATHAK D, et al. Toward multimodal image-to-image translation[C]//The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 465-476. |
[8] | ZHANG R, ISOLA P, EFROS A A. Colorful image colorization[M]//Computer Vision - ECCV 2016. Cham: Springer International Publishing, 2016: 649-666. |
[9] | YOU S, YOU N, PAN M X. PI-REC: progressive image reconstruction network with edge and color domain[EB/OL]. (2019-03-25) [2022-01-28]. https://arxiv.org/abs/1903.10146. |
[10] | REN H, LI J, GAO N. Two-stage sketch colorization with color parsing[J]. IEEE Access, 2019, 8: 44599-44610. |
[11] | CHONG M J, FORSYTH D. JoJoGAN: one shot face stylization[EB/OL]. [2022-01-28].https://arxiv.org/abs/2112.11641. |
[12] | KARRAS T, LAINE S, AILA T M. A style-based generator architecture for generative adversarial networks[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE Press, 2019: 4396-4405. |
[13] | LIN Z, ZHANG Z, ZHANG K R, et al. Interactive style transfer: all is your palette[EB/OL]. [2022-01-28]. https://arxiv.org/abs/2203.13470. |
[14] | CHEN P, ZHANG Y, LI Z, et al. Few-shot incremental learning for label-to-image translation[C]//2022 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 3697-3707. |
[15] | LI Y, YU X G, HAN X G, et al. A deep learning based interactive sketching system for fashion images design[EB/OL]. (2020-10-09) [2022-01-12].https://arxiv.org/abs/2010.04413. |
[16] | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. (2021-06-03) [2022-01-12]. https://arxiv.org/abs/2010.11929. |
[17] | CHEN H T, WANG Y H, GUO T Y, et al. Pre-trained image processing transformer[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE Press, 2021: 12294-12305. |
[18] | CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[M]//Computer Vision - ECCV 2020. Cham: Springer International Publishing, 2020: 213-229. |
[19] | JIANG Y F, CHANG S Y, WANG Z Y. TransGAN: two pure transformers can make one strong GAN, and that can scale up[EB/OL]. (2021-12-09) [2022-01-28].https://arxiv.org/abs/2102.07074. |
[20] | DENG Y Y, TANG F, DONG W M, et al. StyTr2: image style transfer with transformers[EB/OL]. [2022-01-12]. https://arxiv.org/abs/2105.14576. |
[21] | LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 10012-10022. |
[22] | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. [2022-01-28].https://arxiv.org/abs/2010.11929. |
[23] | CHEN M, RADFORD A, CHILD R, et al. Generative pretraining from pixels[C]//The 37th International Conference on Machine Learning. New York: ACM, 2020: 1691-1703. |
[24] | RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation[M]//Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015: 234-241. |
[25] | HAN K, XIAO A, WU E H, et al. Transformer in transformer[EB/OL]. (2021-10-26) [2022-01-28]. https://arxiv.org/abs/2103.00112. |
[26] | ODENA A, DUMOULIN V, OLAH C. Deconvolution and checkerboard artifacts[EB/OL]. (2016-10-17) [2022-01-28]. https://distill.pub/2016/deconv-checkerboard/. |
[27] | ZHANG Z F, WANG Z W, LIN Z, et al. Image super-resolution by neural texture transfer[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 7974-7983. |
[28] | GULRAJANI I, AHMED F, ARJOVSKY M, et al. Improved training of Wasserstein GANs[C]//The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 5769-5779. |
[29] | JOHNSON J, ALAHI A, LI F F. Perceptual losses for real-time style transfer and super-resolution[M]//Computer Vision - ECCV 2016. Cham: Springer International Publishing, 2016: 694-711. |
[30] | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2022-01- 28]. https://arxiv.org/abs/1409.1556. |
[31] | GULRAJANI I, AHMED F, ARJOVSKY M, et al. Improved training of Wasserstein GANs[C]//The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 5769-5779. |
[32] | BRADSKI G. The openCV library[J]. Dr. Dobbʹs Journal: Software Tools for the Professional Programmer, 2000, 25(11): 120-123. |
[33] | HARTIGAN J A, WONG M A. Algorithm AS 136: a K-means clustering algorithm[J]. Applied Statistics, 1979, 28(1): 100-108. |
[34] | SIMO-SERRA E, IIZUKA S, SASAKI K, et al. Learning to simplify: fully convolutional networks for rough sketch cleanup[J]. ACM Transactions on Graphics, 2016, 35(4): 121. 1-121.11. |
[35] | CANNY J. A computational approach to edge detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, PAMI-8(6): 679-698. |
[36] | YAN C, VANDERHAEGHE D, GINGOLD Y. A benchmark for rough sketch cleanup[J]. ACM Transactions on Graphics, 2020, 39(6): 163. 1-163.14. |
[37] | ZHANG Y L, LI K P, LI K, et al. Image super-resolution using very deep residual channel attention networks[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 286-301. |
[38] | HUANG X, LIU M Y, BELONGIE S, et al. Multimodal unsupervised image-to-image translation[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 172-189. |
[39] | RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer International Publishing, 2015: 234-241. |
[40] | OKTAY O, SCHLEMPER J, FOLGOC L L, et al. Attention U-net: learning where to look for the pancreas[EB/OL]. (2018-05-20) [2022-01-28]. https://arxiv.org/abs/1804.03999. |
[41] | ESSER P, ROMBACH R, OMMER B. Taming transformers for high-resolution image synthesis[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 12868-12878. |
[42] | SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training GANs[C]//The 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 2234-2242. |
[43] | HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]//The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6629-6640. |