
Journal of Graphics ›› 2025, Vol. 46 ›› Issue (1): 139-149. DOI: 10.11996/JG.j.2095-302X.2025010139

• Computer Graphics and Virtual Reality •

Generalization optimization method for text to material texture maps based on diffusion model

TU Qinghao, LI Yuanqi, LIU Yifan, GUO Jie, GUO Yanwen

  1. School of Computer Science, Nanjing University, Nanjing Jiangsu 210033, China
  • Received: 2024-07-29 Accepted: 2024-09-18 Online: 2025-02-28 Published: 2025-02-14
  • Contact: GUO Jie
  • About author: TU Qinghao (1999-), master student. Her main research interests cover digital image processing and pattern recognition. E-mail: qinghaotu@126.com

Abstract:

Existing material texture datasets lack sufficient textual descriptions, whereas image-only datasets are available at massive scale. In addition, when a traditional generative model makes an inference error, it is difficult to obtain additional hyperparameters with which to generate new results. To address both problems, a generalization optimization method for text-to-material texture map generation based on a stable diffusion model was proposed. The model was trained in stages. First, a large-scale image-only dataset was used to fine-tune the diffusion model for image generation. Second, a small-scale dataset with text annotations was used to learn semantic information. Third, a new decoder was introduced to reconstruct texture maps from the latent codes generated by the diffusion model. At inference, a textual description was given as input and multiple randomly generated texture maps conforming to that description were obtained. The code was organized with the Colossal architecture, which significantly reduced the hardware requirements for training. By separating image fitting from semantic learning, with large-scale image data used to fit the model parameters and small-scale text data used to learn semantic information, the method improved the generalization of the model and reduced the scale of multimodal data required.
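
The staged training described in the abstract can be summarized as a small schedule. The following is only an illustrative sketch, not the authors' implementation: the class `Stage`, the stage names, and the `needs_text` flag are hypothetical labels introduced here, and marking the decoder stage as not requiring text annotations is an assumption the abstract does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage; all field names are hypothetical labels."""
    name: str         # illustrative identifier, not taken from the paper
    data: str         # kind of data the stage consumes, per the abstract
    needs_text: bool  # whether paired text annotations are required


# Order follows the abstract: image fitting, semantic learning, decoding.
SCHEDULE = (
    Stage("image_fitting", "large-scale image-only dataset", False),
    Stage("semantic_learning", "small-scale text-annotated dataset", True),
    # Assumption: the abstract does not say whether this stage uses text.
    Stage("texture_decoding", "latent codes from the diffusion model", False),
)


def text_annotated_stages(schedule):
    """Return the names of the stages that require text annotations."""
    return [stage.name for stage in schedule if stage.needs_text]


if __name__ == "__main__":
    for index, stage in enumerate(SCHEDULE, start=1):
        print(f"stage {index}: {stage.name} <- {stage.data}")
    print("stages needing text:", text_annotated_stages(SCHEDULE))
```

Under these assumptions only one of the three stages depends on paired text, which mirrors the abstract's point that separating image fitting from semantic learning reduces the amount of multimodal data required.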

Key words: diffusion model, generalization, multimodal, text-driven texture generation, material editor
