
Journal of Graphics (图学学报) ›› 2021, Vol. 42 ›› Issue (6): 908-916. DOI: 10.11996/JG.j.2095-302X.2021060908

• Image Processing and Computer Vision •

Cross-modal chat cartoon emoticon image synthesis based on knowledge meta-model

  1. School of Software, Yunnan University, Kunming, Yunnan 650500, China
  • Online: 2022-01-18 Published: 2022-01-18
  • Supported by:
    General Project of the Yunnan Provincial Department of Science and Technology (202001BB050035, 202001BB05003); China Association for Science and Technology “Young Talents Support Project” (W8193209)

Abstract: Traditional chat cartoon emoticon techniques rely on a predefined library of chat cartoon emoticon images: given a user's semantic description, they perform “semantic-to-visual” cross-modal retrieval to find a matching emoticon. Because the library holds a limited number of samples in fixed forms, real chat scenarios often yield a mismatched emoticon or no suitable match at all. To address this problem, this work focuses on synthesizing new chat cartoon emoticon images rather than retrieving them, and proposes a cross-modal chat cartoon emoticon image synthesis method based on a knowledge meta-model, which synthesizes the corresponding emoticon image on the fly from the user's semantic description. An emoticon knowledge meta-model captures the intrinsic semantic and logical relations of chat cartoon emoticon images, which improves the semantic consistency of the synthesized results. A multi-generator model synthesizes a local image for each meta-knowledge point, and a joint generator then integrates these local images into a complete cartoon emoticon image, which greatly reduces the number of training samples required. In tests on a public chat cartoon emoticon image synthesis dataset, the method achieves better semantic consistency while remaining comparable to existing methods in image quality.

Key words: image synthesis, cross-modal learning, text to image (T2I), knowledge meta-model, emoticon pack
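
The data flow described in the abstract (semantic description, then meta-knowledge points, then one local generator per point, then a joint generator) can be illustrated with a toy sketch. This is not the authors' implementation: the facial parts, the vocabulary, the `KnowledgeElement` class, and every function name below are hypothetical, and the "generators" are plain stand-in functions that return constant pixel patches instead of trained networks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Patch = List[List[int]]  # tiny grayscale image stored as rows of pixel values


@dataclass(frozen=True)
class KnowledgeElement:
    """Hypothetical meta-knowledge point: one facial part plus one attribute."""
    part: str
    attribute: str


# Hypothetical vocabulary linking description words to knowledge elements.
VOCAB: Dict[str, List[KnowledgeElement]] = {
    "happy": [KnowledgeElement("eyes", "curved"), KnowledgeElement("mouth", "smile")],
    "sad": [KnowledgeElement("eyes", "droopy"), KnowledgeElement("mouth", "frown")],
    "crying": [KnowledgeElement("eyes", "tears")],
}


def parse_description(text: str) -> Dict[str, KnowledgeElement]:
    """Keep one knowledge element per part; later words override earlier ones."""
    elements: Dict[str, KnowledgeElement] = {}
    for word in text.lower().split():
        for element in VOCAB.get(word, []):
            elements[element.part] = element
    return elements


def make_local_generator(base: int) -> Callable[[KnowledgeElement], Patch]:
    """Stand-in for a trained local generator: emits a constant 2x4 patch."""
    def generate(element: KnowledgeElement) -> Patch:
        # A real generator would condition on the attribute; this toy version
        # only shifts the brightness by the attribute's length.
        value = base + len(element.attribute)
        return [[value] * 4 for _ in range(2)]
    return generate


LOCAL_GENERATORS = {
    "eyes": make_local_generator(100),
    "mouth": make_local_generator(200),
}
LAYOUT = ["eyes", "mouth"]  # top-to-bottom order used by the joint step


def joint_generator(patches: Dict[str, Patch]) -> Patch:
    """Stand-in for the joint generator: stacks patches in LAYOUT order."""
    image: Patch = []
    for part in LAYOUT:
        blank = [[0] * 4 for _ in range(2)]  # used when a part has no patch
        image.extend(patches.get(part, blank))
    return image


def synthesize(text: str) -> Patch:
    """Description -> knowledge elements -> local patches -> full image."""
    elements = parse_description(text)
    patches = {part: LOCAL_GENERATORS[part](el) for part, el in elements.items()}
    return joint_generator(patches)


if __name__ == "__main__":
    for row in synthesize("sad crying"):
        print(row)
```

The sketch shows only the structural idea: because each local generator handles a single part, a description that combines known elements in a new way (here "sad crying") still maps to a complete image without a matching whole-image training sample.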
