Journal of Graphics ›› 2023, Vol. 44 ›› Issue (1): 120-130. DOI: 10.11996/JG.j.2095-302X.2023010120
• Computer Graphics and Virtual Reality •
PAN Dong-hui, JIN Ying-han, SUN Xu, LIU Yu-sheng, ZHANG Dong-liang
Received: 2022-04-24
Revised: 2022-07-01
Online: 2023-10-31
Published: 2023-02-16
Contact: ZHANG Dong-liang
About author: PAN Dong-hui (1997-), master student. His main research interests include digital image processing. E-mail: 417969567@qq.com
PAN Dong-hui, JIN Ying-han, SUN Xu, LIU Yu-sheng, ZHANG Dong-liang. CTH-Net: CNN-Transformer hybrid network for garment image generation from sketches and color points[J]. Journal of Graphics, 2023, 44(1): 120-130.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023010120
Fig. 5 Pre-processing of a garment image ((a) Garment image; (b) Image normalization; (c) Watercolor image; (d) Smoothed image; (e) Sketches; (f) Color points; (g) Blurred image)
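The exact pre-processing operators are defined in the paper body; as a rough illustration of the Fig. 5 stages, the sketch below uses plain OpenCV (ref. [32]) built-ins as stand-ins. The file name, output size, and all filter parameters are assumptions, not the paper's settings.

```python
import cv2

# Hedged sketch of the Fig. 5 pre-processing stages using OpenCV built-ins;
# "garment.jpg", the 256x256 size, and the filter parameters are illustrative only.
img = cv2.imread("garment.jpg")                                # (a) garment image
norm = cv2.resize(img, (256, 256))                             # (b) image normalization
watercolor = cv2.stylization(norm, sigma_s=60, sigma_r=0.45)   # (c) watercolor-like image
smoothed = cv2.edgePreservingFilter(norm, flags=1,
                                    sigma_s=60, sigma_r=0.4)   # (d) smoothed image
gray = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)
sketch = 255 - cv2.Canny(gray, 100, 200)                       # (e) sketch from Canny edges (ref. [35])
# (f) color points could be sampled from K-means clustered colors (ref. [33]); omitted here
blurred = cv2.GaussianBlur(norm, (21, 21), 0)                  # (g) blurred image
```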
| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 0.0001 |
| Epochs | 2000 |
| CPU | E3-1230 v2 |
| GPU | GTX 1660s |
| Memory | 16 GB |
| OS | Windows 10 |

Table 1 Parameter settings for training the sketch extraction model
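As a point of reference, the Table 1 settings map onto a one-line PyTorch optimizer configuration; `sketch_net` below is a stand-in module, not the paper's sketch extraction network (which follows ref. [34]).

```python
import torch

sketch_net = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)     # stand-in for the sketch extraction model
optimizer = torch.optim.AdamW(sketch_net.parameters(), lr=1e-4)  # AdamW, lr = 0.0001 (Table 1)
# Training then runs for 2000 epochs per Table 1.
```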
| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 0.00001 |
| λ∗ | λrec: 1, λfea: 0.01, λadv: 0.001, λR: 0.01 |
| Epochs | Drafting: 500, Refinement: 500, Tuning: 50 |
| CPU | E5-2695 v3 |
| GPU | Tesla P100 |
| Memory | 32 GB |
| OS | Ubuntu 20.04 |

Table 2 Parameter settings for training CTH-Net
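The λ∗ row in Table 2 weights four loss terms. The sketch below shows how such a weighted objective is commonly assembled in PyTorch; the individual loss values and the `cth_net` module are placeholders, not the paper's exact definitions.

```python
import torch

# Loss weights taken from Table 2.
LAMBDA_REC, LAMBDA_FEA, LAMBDA_ADV, LAMBDA_R = 1.0, 0.01, 0.001, 0.01

def total_loss(l_rec, l_fea, l_adv, l_r):
    """Weighted sum of reconstruction, feature (perceptual), adversarial,
    and regularization terms using the Table 2 coefficients."""
    return (LAMBDA_REC * l_rec + LAMBDA_FEA * l_fea
            + LAMBDA_ADV * l_adv + LAMBDA_R * l_r)

cth_net = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)     # stand-in for the CTH-Net generator
optimizer = torch.optim.AdamW(cth_net.parameters(), lr=1e-5)  # AdamW, lr = 0.00001 (Table 2)
```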
Fig. 8 Comparisons of generation results between CTH-Net and other methods ((a) Large area pattern; (b) Stripe pattern; (c) Solid color pattern; (d) Small area pattern; (e) Blue mickey pattern)
Fig. 10 More comparisons of generation results between CTH-Net and other networks ((a) Inputs; (b) MUNIT; (c) UNet; (d) Pix2PixHD; (e) Attention-UNet; (f) TransGAN; (g) VQGAN; (h) CTH-Net)
| Method | HPR | IS | FID |
|---|---|---|---|
| MUNIT | 0.0 | 3.852 | 5.790 |
| UNet | 1.3 | 4.266 | 2.340 |
| Pix2PixHD | 1.0 | 4.133 | 2.464 |
| Attention-UNet | 1.7 | 4.287 | 2.191 |
| TransGAN | 0.8 | 4.174 | 2.419 |
| VQGAN | 1.9 | 4.304 | 2.085 |
| FashionImageDesign | 3.4 | - | - |
| CTH-Net† (ours) | 4.4 | 4.427 | 1.872 |
| CTH-Net (ours) | 5.5 | 4.583 | 1.496 |

Table 3 Quantitative comparison between CTH-Net and other methods
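Table 3 reports a human preference rating (HPR), Inception Score (IS, ref. [42]), and Fréchet Inception Distance (FID, ref. [43]). The snippet below is a minimal sketch of computing IS and FID, assuming the torchmetrics package (with its image extras) is available; the random tensors stand in for the real test images and the generated garment images, and an actual evaluation would use the full test set.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore

# Stand-ins for real test images and generated images:
# float tensors in [0, 1] with shape (N, 3, H, W).
real_images = torch.rand(16, 3, 256, 256)
fake_images = torch.rand(16, 3, 256, 256)

fid = FrechetInceptionDistance(feature=2048, normalize=True)
fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print("FID:", fid.compute().item())   # lower is better

inception_score = InceptionScore(normalize=True)
inception_score.update(fake_images)
is_mean, is_std = inception_score.compute()
print("IS:", is_mean.item())          # higher is better
```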
| Method | Batch size* | Time per epoch (s) | GPU memory (GB) | IS | FID |
|---|---|---|---|---|---|
| With ToPatch and ToFeatureMap | 4 | 365.2 | 2.223 | 4.583 | 1.496 |
| Without ToPatch and ToFeatureMap | 4 | 1204.6 | 7.854 | 4.989 | 1.367 |

Table 4 Performance of ToPatch and ToFeatureMap
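Table 4 indicates that ToPatch and ToFeatureMap trade a small drop in IS/FID for roughly 3x faster epochs and far lower GPU memory. The modules themselves are defined in the paper; the sketch below only illustrates the general idea assumed here, namely folding a feature map into small patches before a heavy stage and unfolding it back afterwards, so that stage operates on short sequences.

```python
import torch

def to_patch(x: torch.Tensor, p: int) -> torch.Tensor:
    """Split a (B, C, H, W) feature map into non-overlapping p x p patches,
    returning (B * H/p * W/p, C, p, p) so each patch is processed independently."""
    b, c, h, w = x.shape
    x = x.reshape(b, c, h // p, p, w // p, p)
    return x.permute(0, 2, 4, 1, 3, 5).reshape(-1, c, p, p)

def to_feature_map(x: torch.Tensor, p: int, b: int, h: int, w: int) -> torch.Tensor:
    """Inverse of to_patch: reassemble patches into a (B, C, H, W) feature map."""
    c = x.shape[1]
    x = x.reshape(b, h // p, w // p, c, p, p).permute(0, 3, 1, 4, 2, 5)
    return x.reshape(b, c, h, w)

# Round-trip check on a dummy feature map.
feat = torch.randn(4, 64, 32, 32)
patches = to_patch(feat, p=8)                    # (64, 64, 8, 8): 16 patches per image
restored = to_feature_map(patches, 8, 4, 32, 32)
assert torch.equal(feat, restored)
```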
[1] | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//The 31st Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2017: 5998-6008. |
[2] | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778. |
[3] | ISOLA P, ZHU J Y, ZHOU T H, et al. Image-to-image translation with conditional adversarial networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5967-5976. |
[4] | SANGKLOY P, LU J W, FANG C, et al. Scribbler: controlling deep image synthesis with sketch and color[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 6836-6845. |
[5] | ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2242-2251. |
[6] | WANG T C, LIU M Y, ZHU J Y, et al. High-resolution image synthesis and semantic manipulation with conditional GANs[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 8798-8807. |
[7] | ZHU J Y, ZHANG R, PATHAK D, et al. Toward multimodal image-to-image translation[C]//The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 465-476. |
[8] | ZHANG R, ISOLA P, EFROS A A. Colorful image colorization[M]//Computer Vision - ECCV 2016. Cham: Springer International Publishing, 2016: 649-666. |
[9] | YOU S, YOU N, PAN M X. PI-REC: progressive image reconstruction network with edge and color domain[EB/OL]. (2019-03-25) [2022-01-28]. https://arxiv.org/abs/1903.10146. |
[10] | REN H, LI J, GAO N. Two-stage sketch colorization with color parsing[J]. IEEE Access, 2019, 8: 44599-44610. |
[11] | CHONG M J, FORSYTH D. JoJoGAN: one shot face stylization[EB/OL]. [2022-01-28]. https://arxiv.org/abs/2112.11641. |
[12] | KARRAS T, LAINE S, AILA T M. A style-based generator architecture for generative adversarial networks[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE Press, 2019: 4396-4405. |
[13] | LIN Z, ZHANG Z, ZHANG K R, et al. Interactive style transfer: all is your palette[EB/OL]. [2022-01-28]. https://arxiv.org/abs/2203.13470. |
[14] | CHEN P, ZHANG Y, LI Z, et al. Few-shot incremental learning for label-to-image translation[C]//2022 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 3697-3707. |
[15] | LI Y, YU X G, HAN X G, et al. A deep learning based interactive sketching system for fashion images design[EB/OL]. (2020-10-09) [2022-01-12]. https://arxiv.org/abs/2010.04413. |
[16] | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. (2021-06-03) [2022-01-12]. https://arxiv.org/abs/2010.11929. |
[17] | CHEN H T, WANG Y H, GUO T Y, et al. Pre-trained image processing transformer[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE Press, 2021: 12294-12305. |
[18] | CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers[M]//Computer Vision - ECCV 2020. Cham: Springer International Publishing, 2020: 213-229. |
[19] | JIANG Y F, CHANG S Y, WANG Z Y. TransGAN: two pure transformers can make one strong GAN, and that can scale up[EB/OL]. (2021-12-09) [2022-01-28]. https://arxiv.org/abs/2102.07074. |
[20] | DENG Y Y, TANG F, DONG W M, et al. StyTr2: image style transfer with transformers[EB/OL]. [2022-01-12]. https://arxiv.org/abs/2105.14576. |
[21] | LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 10012-10022. |
[22] | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2022-01-28]. https://arxiv.org/abs/2010.11929. |
[23] | CHEN M, RADFORD A, CHILD R, et al. Generative pretraining from pixels[C]//The 37th International Conference on Machine Learning. New York: ACM, 2020: 1691-1703. |
[24] | RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation[M]//Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015: 234-241. |
[25] | HAN K, XIAO A, WU E H, et al. Transformer in transformer[EB/OL]. (2021-10-26) [2022-01-28]. https://arxiv.org/abs/2103.00112. |
[26] | ODENA A, DUMOULIN V, OLAH C. Deconvolution and checkerboard artifacts[EB/OL]. (2016-10-17) [2022-01-28]. https://distill.pub/2016/deconv-checkerboard/. |
[27] | ZHANG Z F, WANG Z W, LIN Z, et al. Image super-resolution by neural texture transfer[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 7974-7983. |
[28] | GULRAJANI I, AHMED F, ARJOVSKY M, et al. Improved training of Wasserstein GANs[C]//The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 5769-5779. |
[29] | JOHNSON J, ALAHI A, LI F F. Perceptual losses for real-time style transfer and super-resolution[M]//Computer Vision - ECCV 2016. Cham: Springer International Publishing, 2016: 694-711. |
[30] | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2022-01-28]. https://arxiv.org/abs/1409.1556. |
[31] | GULRAJANI I, AHMED F, ARJOVSKY M, et al. Improved training of Wasserstein GANs[C]//The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 5769-5779. |
[32] | BRADSKI G. The openCV library[J]. Dr. Dobb's Journal: Software Tools for the Professional Programmer, 2000, 25(11): 120-123. |
[33] | HARTIGAN J A, WONG M A. Algorithm AS 136: a K-means clustering algorithm[J]. Applied Statistics, 1979, 28(1): 100-108. |
[34] | SIMO-SERRA E, IIZUKA S, SASAKI K, et al. Learning to simplify: fully convolutional networks for rough sketch cleanup[J]. ACM Transactions on Graphics, 2016, 35(4): 121.1-121.11. |
[35] | CANNY J. A computational approach to edge detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, PAMI-8(6): 679-698. |
[36] | YAN C, VANDERHAEGHE D, GINGOLD Y. A benchmark for rough sketch cleanup[J]. ACM Transactions on Graphics, 2020, 39(6): 163.1-163.14. |
[37] | ZHANG Y L, LI K P, LI K, et al. Image super-resolution using very deep residual channel attention networks[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 286-301. |
[38] | HUANG X, LIU M Y, BELONGIE S, et al. Multimodal unsupervised image-to-image translation[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 172-189. |
[39] | RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer International Publishing, 2015: 234-241. |
[40] | OKTAY O, SCHLEMPER J, FOLGOC L L, et al. Attention U-net: learning where to look for the pancreas[EB/OL]. (2018-05-20) [2022-01-28]. https://arxiv.org/abs/1804.03999. |
[41] | ESSER P, ROMBACH R, OMMER B. Taming transformers for high-resolution image synthesis[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 12868-12878. |
[42] | SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training GANs[C]//The 30th International Conference on Neural Information Processing Systems. New York: ACM, 2016: 2234-2242. |
[43] | HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]//The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6629-6640. |