
Journal of Graphics ›› 2025, Vol. 46 ›› Issue (2): 382-392. DOI: 10.11996/JG.j.2095-302X.2025020382

• Computer Graphics and Virtual Reality •

Human-in-the-loop logo generation method for industry segments

LI Jiyuan, GUAN Zheyu, SONG Haichuan, TAN Xin, MA Lizhuang

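  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062
  • Received: 2024-07-08 Accepted: 2024-10-08 Published: 2025-04-30 Online: 2025-04-24
  • Corresponding author: SONG Haichuan (1986-), male, associate professor, Ph.D. His main research interests are computer-aided design and computer vision. E-mail: hcsong@cs.ecnu.edu.cn
  • First author: LI Jiyuan (2004-), male, undergraduate student. His main research interest is computer-aided design. E-mail: leehenry1024@qq.com
  • Supported by:
    National Natural Science Foundation of China (62302167); National Natural Science Foundation of China (62222602); Shanghai Youth Science and Technology Talents Sailing Program (23YF1410500)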

Human-in-the-loop field-specific logo generation method

LI Jiyuan, GUAN Zheyu, SONG Haichuan, TAN Xin, MA Lizhuang

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  • Received: 2024-07-08 Accepted: 2024-10-08 Published: 2025-04-30 Online: 2025-04-24
  • First author: LI Jiyuan (2004-), undergraduate student. His main research interest is computer-aided design. E-mail: leehenry1024@qq.com
  • Supported by:
    National Natural Science Foundation of China (62302167); National Natural Science Foundation of China (62222602); Shanghai Youth Science and Technology Talents Sailing Program (23YF1410500)

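Abstract:

Compared with other kinds of generated images, logos are highly abstract, varied in design, and consistent in style, so the generated results are hard to control directly. To efficiently generate logos that match the characteristics of each industry and cover a range of compositional forms, this paper proposes a human-in-the-loop logo generation method for specific industry segments. First, a text-to-image diffusion model is fine-tuned with DreamBooth: using logos collected from publicly available online resources as the dataset and the text-to-image model Stable Diffusion XL as the base model, a “prototype model” suited to basic logo generation is trained. Then, several text prompt lexicons are constructed, one for each target industry, and the prototype model generates logos for each target industry under their guidance. Next, the generated results are screened by human reviewers, and a secondary dataset that meets industry requirements is derived from them. Finally, the model is iteratively fine-tuned with LoRA on the secondary dataset to obtain the “final model” for logo generation. The final model's outputs are evaluated using the cosine similarity between generated images and prompts, together with human questionnaire metrics. The evaluation shows that, compared with logos generated directly by the original model without the above processing, logos from the final model improve considerably in industry relevance, structural integrity, and aesthetic quality.

Key words: image generation, diffusion model, human-in-the-loop, training set construction, text-to-image synthesis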

Abstract:

Compared with other types of generated images, logos are highly abstract, diverse in design, and unified in style, which makes it difficult to control the generated results directly. To efficiently generate logos that match the characteristics of various industries and meet the requirements of multiple compositional patterns, a human-in-the-loop field-specific logo generation method was proposed. First, using DreamBooth, a method for fine-tuning text-to-image diffusion models, and a dataset of logos collected from publicly available online sources, the text-to-image model Stable Diffusion XL was used as the base model and trained into a “prototype model” for basic logo generation. Then, groups of prompt lexicons for the target industries were constructed, and the prototype model was used to generate logos for those industries under the guidance of the lexicons. Next, through human intervention, the generated results were filtered into secondary datasets tailored to industry needs. Finally, the prototype model was iteratively fine-tuned using LoRA on the secondary datasets, yielding the “final model” for logo generation. The results of the final model were evaluated using the cosine similarity between generated images and prompts, as well as human questionnaire metrics. The evaluation showed that logos generated by the final model improved considerably in industry relevance, structural integrity, and aesthetic appearance compared with those generated directly by the original model without this processing.

Key words: image synthesis, diffusion model, human-in-the-loop, training set construction, text to image
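
The cosine-similarity evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which encoder produces the image and prompt embeddings, so this example assumes they have already been computed by some joint image-text encoder (CLIP is a common choice, but that is an assumption here). The function names `cosine_similarity` and `mean_prompt_alignment` and the toy vectors are hypothetical, not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_prompt_alignment(
    image_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
) -> float:
    """Average image-prompt cosine similarity over paired samples."""
    if len(image_embs) != len(text_embs) or not image_embs:
        raise ValueError("need a non-empty list of (image, prompt) pairs")
    scores = [cosine_similarity(i, t) for i, t in zip(image_embs, text_embs)]
    return sum(scores) / len(scores)


# Toy 3-dimensional embeddings (hypothetical values, not real model outputs).
image_embs = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
score = mean_prompt_alignment(image_embs, text_embs)
```

Under these assumptions, a higher mean score for the final model than for the base model on the same prompts would indicate closer image-prompt alignment. The questionnaire-based part of the evaluation is not covered by this sketch.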
