Journal of Graphics ›› 2025, Vol. 46 ›› Issue (1): 139-149.DOI: 10.11996/JG.j.2095-302X.2025010139
• Computer Graphics and Virtual Reality •
TU Qinghao, LI Yuanqi, LIU Yifan, GUO Jie, GUO Yanwen
Received: 2024-07-29
Accepted: 2024-09-18
Online: 2025-02-28
Published: 2025-02-14
Contact: GUO Jie
About author: TU Qinghao (1999-), master student. Her main research interests include digital image processing and pattern recognition. E-mail: qinghaotu@126.com
TU Qinghao, LI Yuanqi, LIU Yifan, GUO Jie, GUO Yanwen. Generalization optimization method for text to material texture maps based on diffusion model[J]. Journal of Graphics, 2025, 46(1): 139-149.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2025010139
Fig. 8 Multimodal dataset ((a) Wood tiles with herringbone and zigzagged pattern; (b) White tiles with black grid and small flower pattern; (c) Tiles with blue and white checkboard pattern; (d) Red black and white fabric with diamond and checkboard pattern; (e) Brown reflective shiny marble with cracked pattern; (f) Dark brown dirty gravel with pitted pattern; (g) Cliff rock with stratified pattern; (h) Brown old brick wall with staggered pattern)
| Model | Material type | IS↑ | FID↓ |
|---|---|---|---|
| Text2Mat | Fabric | 3.23 | 91.45 |
| | Ceramic tile | 7.49 | 69.06 |
| | Floor tile | 1.92 | 76.29 |
| | Wood plank | 3.17 | 49.38 |
| | Rock | 2.50 | 43.33 |
| | Mean over all test data | 4.02 | 75.31 |
| Ours | Fabric | 4.08 | 62.86 |
| | Ceramic tile | 10.14 | 73.22 |
| | Floor tile | 6.45 | 55.13 |
| | Wood plank | 2.92 | 32.18 |
| | Rock | 1.48 | 47.94 |
| | Mean over all test data | 4.63 | 64.73 |
Table 1 Quantitative analysis results
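The IS↑ and FID↓ columns in Table 1 follow the standard definitions of the Inception Score [29] and the Fréchet Inception Distance [31]. A minimal sketch of both metrics, assuming Inception-v3 class probabilities and feature statistics have already been extracted (the function names and inputs below are illustrative, not from the paper):

```python
import numpy as np
from scipy.linalg import sqrtm


def inception_score(probs):
    """IS from an (N, K) matrix of per-image class probabilities.

    exp of the mean KL divergence between each conditional p(y|x)
    and the marginal p(y); higher is better (IS↑ in Table 1).
    """
    eps = 1e-12
    p_y = probs.mean(axis=0)  # marginal class distribution
    kl = probs * (np.log(np.clip(probs, eps, None)) - np.log(p_y))
    return float(np.exp(kl.sum(axis=1).mean()))


def frechet_inception_distance(mu1, sigma1, mu2, sigma2):
    """FID between two Gaussians fitted to Inception features.

    mu*: mean feature vectors; sigma*: feature covariances.
    Lower is better (FID↓ in Table 1).
    """
    diff = mu1 - mu2
    # Matrix square root of the covariance product; drop the tiny
    # imaginary parts that numerical error can introduce.
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

For identical feature statistics the FID is zero, and for perfectly confident, evenly spread class predictions the IS equals the number of classes, which matches the usual sanity checks for these metrics.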
Fig. 9 Comparison of results between our work and Text2Mat/Polycam ((a) Black and blue checkboard tiled tiles; (b) Dirty ground; (c) Dark brown leather with diamond pattern; (d) Arc paved pavement; (e) White and purple ceramic with chequered pattern; (f) Clean red smooth tiles with I-shaped pattern; (g) Shiny silver metal; (h) Black fabric; (i) Leather)
Fig. 11 Results generated by the proposed method using different formats of text descriptions ((a) Yellow tiles chequered; (b) Yellow tiles with chequered pattern; (c) Yellow and brown tiles arranged in chequered pattern; (d) A tile texture with brown chequered pattern; (e) A tile texture which has yellow and brown chequered and camouflage pattern)
[1] | ZHOU Z M, CHEN G J, DONG Y, et al. Sparse-as-possible SVBRDF acquisition[J]. ACM Transactions on Graphics (TOG), 2016, 35(6): 189. |
[2] | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010. |
[3] | DONG Y, WANG J P, TONG X, et al. Manifold bootstrapping for SVBRDF capture[J]. ACM Transactions on Graphics (TOG), 2010, 29(4): 98. |
[4] | DESCHAINTRE V, AITTALA M, DURAND F, et al. Single-image SVBRDF capture with a rendering-aware deep network[J]. ACM Transactions on Graphics (TOG), 2018, 37(4): 128. |
[5] | GAO D, LI X, DONG Y, et al. Deep inverse rendering for high-resolution SVBRDF estimation from an arbitrary number of images[J]. ACM Transactions on Graphics (TOG), 2019, 38(4): 134. |
[6] | GUO J, LAI S C, TAO C Z, et al. Highlight-aware two-stream network for single-image SVBRDF acquisition[J]. ACM Transactions on Graphics (TOG), 2021, 40(4): 123. |
[7] | ZHOU X L, KALANTARI N K. Adversarial single‐image SVBRDF estimation with hybrid training[J]. Computer Graphics Forum, 2021, 40(2): 315-325. |
[8] | HENZLER P, DESCHAINTRE V, MITRA N J, et al. Generative modelling of BRDF textures from flash images[J]. ACM Transactions on Graphics (TOG), 2021, 40(6): 284. |
[9] | HU Y W, DORSEY J, RUSHMEIER H. A novel framework for inverse procedural texture modeling[J]. ACM Transactions on Graphics (TOG), 2019, 38(6): 186. |
[10] | SHI L, LI B C, HAŠAN M, et al. Match: differentiable material graphs for procedural material capture[J]. ACM Transactions on Graphics (TOG), 2020, 39(6): 196. |
[11] | GUO Y, SMITH C, HAŠAN M, et al. MaterialGAN: reflectance capture using a generative SVBRDF model[J]. ACM Transactions on Graphics (TOG), 2020, 39(6): 254. |
[12] | ZHOU X L, HASAN M, DESCHAINTRE V, et al. TileGen: tileable, controllable material generation and capture[C]// The SIGGRAPH Asia 2022 Conference. New York: ACM, 2022: 34. |
[13] | GUERRERO P, HAŠAN M, SUNKAVALLI K, et al. MatFormer: a generative model for procedural materials[J]. ACM Transactions on Graphics (TOG), 2022, 41(4): 46. |
[14] | SOHL-DICKSTEIN J, WEISS E A, MAHESWARANATHAN N, et al. Deep unsupervised learning using nonequilibrium thermodynamics[EB/OL]. [2024-05-29]. https://dl.acm.org/doi/10.5555/3045118.3045358. |
[15] | HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 574. |
[16] | ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 10684-10695. |
[17] | RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// The 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2015: 234-241. |
[18] | PEEBLES W, XIE S N. Scalable diffusion models with transformers[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 4195-4205. |
[19] | LIU L P, REN Y, LIN Z J, et al. Pseudo numerical methods for diffusion models on manifolds[EB/OL]. (2022-02-18) [2024-05-31]. https://arxiv.org/abs/2202.09778. |
[20] | BAO F, LI C X, ZHU J, et al. Analytic-DPM: an analytic estimate of the optimal reverse variance in diffusion probabilistic models[EB/OL]. (2022-01-16) [2024-05-31]. https://arxiv.org/abs/2201.06503. |
[21] | RAMESH A, DHARIWAL P, NICHOL A, et al. Hierarchical text-conditional image generation with CLIP latents[EB/OL]. (2022-04-21) [2024-05-31]. https://arxiv.org/abs/2204.06125. |
[22] | SAHARIA C, CHAN W, SAXENA S, et al. Photorealistic text-to-image diffusion models with deep language understanding[C]// The 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 2643. |
[23] | SCHUHMANN C, BEAUMONT R, VENCU R, et al. LAION-5B: an open large-scale dataset for training next generation image-text models[C]// The 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 1833. |
[24] | RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL]. [2024-05-29]. https://dblp.uni-trier.de/db/conf/icml/icml2021.html#RadfordKHRGASAM21. |
[25] | MCINNES L, HEALY J, MELVILLE J. UMAP: uniform manifold approximation and projection for dimension reduction[EB/OL]. (2018-02-09) [2024-04-31]. https://arxiv.org/abs/1802.03426. |
[26] | LIANG W X, ZHANG Y H, KWON Y, et al. Mind the gap: understanding the modality gap in multi-modal contrastive representation learning[C]// The 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 1280. |
[27] | RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[J]. The Journal of Machine Learning Research, 2020, 21(1): 140. |
[28] | ZHOU Y F, LIU B C, ZHU Y Z, et al. Shifted diffusion for text-to-image generation[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 10157-10166. |
[29] | SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training GANs[C]// The 30th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2016: 2234-2242. |
[30] | IKEUCHI K. Computer vision: a reference guide[M]. 2nd ed. Cham: Springer, 2021: 40. |
[31] | HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]// The 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6629-6640. |
[32] | HE Z, GUO J, ZHANG Y, et al. Text2Mat: generating materials from text[EB/OL]. [2024-05-29]. https://diglib.eg.org/items/0216dc7e-da3d-4305-8698-fd0463337316. |
[33] | SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 2818-2826. |