
Journal of Graphics ›› 2025, Vol. 46 ›› Issue (2): 332-344. DOI: 10.11996/JG.j.2095-302X.2025020332

• Computer Graphics and Virtual Reality •

Image-to-3D porcelain vase generation combining procedural content generation and diffusion models

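SUN Heyi1, LI Yixiao2, TIAN Xi3, ZHANG Songhai2

  1. Zhili College, Tsinghua University, Beijing 100084, China
  2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  3. Department of Computer Science, University of Bath, Bath, Somerset 133789, UK
  • Received: 2024-08-19 Accepted: 2024-10-28 Published: 2025-04-30 Online: 2025-04-24
  • Corresponding author: ZHANG Songhai (1978-), male, associate professor, Ph.D. His main research interests cover computer graphics and virtual reality, image/video processing, etc. E-mail: shz@tsinghua.edu.cn
  • First author: SUN Heyi (2003-), female, undergraduate student. Her main research interest covers 3D reconstruction. E-mail: sun-hy21@mails.tsinghua.edu.cn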

Image to 3D vase generation technology combining procedural content generation and diffusion models

SUN Heyi1, LI Yixiao2, TIAN Xi3, ZHANG Songhai2

  1. Zhili College, Tsinghua University, Beijing 100084, China
  2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  3. Department of Computer Science, University of Bath, Somerset 133789, UK
  • Received: 2024-08-19 Accepted: 2024-10-28 Published: 2025-04-30 Online: 2025-04-24
  • First author: SUN Heyi (2003-), undergraduate student. Her main research interest covers 3D reconstruction. E-mail: sun-hy21@mails.tsinghua.edu.cn

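Abstract:

In traditional handcrafted 3D content production, 3D meshes and textures are the foundation of 3D assets. To improve the visual quality and rendering performance of 3D assets, meshes are usually built from quadrilateral faces and need good topology and a sensible UV mapping; textures need to match the geometry and remain globally consistent. However, current 3D content generation techniques based on latent diffusion models do not yet meet these standards, which limits their potential in practical applications. Meanwhile, procedural content generation is widely used in the game and architecture industries because it can create, from rules, large numbers of 3D assets that conform to industry best practices. To improve the usability of generated assets, a comprehensive solution combining procedural content generation with diffusion models is proposed. Taking the porcelain vase, a specific kind of 3D solid of revolution, as an example, the image-to-3D asset generation problem is divided into two main tasks: 3D mesh reconstruction and 3D texture generation. For 3D mesh reconstruction, a novel vase generation program is created, and a deep neural network is trained to learn the mapping between image features and program parameters, thereby reconstructing a 3D model from a 2D image. For 3D texture generation, a novel two-stage texture generation strategy is proposed that combines the strengths of multi-view image generation and multi-view consistent sampling to produce high-definition texture maps with global consistency. In summary, a scheme that automatically builds 3D vase assets from images is proposed; it can be extended to the generation of other 3D solids of revolution and is expected to apply to other categories of 3D content.

Key words: diffusion models, procedural content generation, 3D reconstruction, texture generation, deep learning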

Abstract:

In the traditional manual production of 3D content, 3D meshes and textures serve as the foundational elements of 3D assets. To enhance the visual quality and rendering performance of 3D assets, meshes are typically constructed from quadrilateral faces and require good topology and a reasonable UV mapping. Moreover, 3D textures must match the geometric shape and maintain global consistency. However, current 3D content generation technologies based on latent diffusion models do not yet meet these standards, which limits their potential in practical applications. At the same time, procedural content generation techniques have been widely adopted in the gaming and architectural industries because they can produce, from rules, large numbers of 3D assets that conform to industry best practices. To improve the usability of generated assets, an integrated solution combining procedural content generation with diffusion model techniques was proposed. Taking the vase, a specific kind of 3D solid of revolution, as an example, the image-to-3D asset generation problem was divided into two principal tasks: 3D mesh reconstruction and 3D texture generation. For 3D mesh reconstruction, a novel vase generation program was developed, and a deep neural network was trained to learn the mapping between image features and procedural parameters, thereby enabling the reconstruction of a 3D model from a 2D image. For 3D texture generation, a novel two-stage texturing strategy was introduced, combining the strengths of multi-view image synthesis and multi-view consistency sampling to produce high-quality texture maps with global coherence. In summary, a scheme for the automatic construction of 3D vase assets from images was presented, which can be generalized to other 3D solids of revolution and holds promise for generating other types of 3D content.

Key words: diffusion models, procedural content generation, 3D reconstruction, texture generation, deep learning
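
The abstract does not describe the internals of the vase generation program. As a minimal sketch of the general idea only, assuming a vase can be modeled as a 2D profile curve revolved about the vertical axis, the following Python builds an all-quad mesh with a cylindrical UV layout. The function name `revolve_profile`, the `(radius, height)` profile format, and the sample values are illustrative assumptions, not the authors' implementation.

```python
import math


def revolve_profile(profile, segments=32):
    """Revolve a 2D profile around the y-axis into an all-quad mesh.

    profile  -- list of (radius, height) pairs, bottom to top (assumed format)
    segments -- number of angular subdivisions around the axis
    Returns (vertices, uvs, quads); quads index into vertices and uvs.
    """
    if len(profile) < 2 or segments < 3:
        raise ValueError("need at least 2 profile points and 3 segments")

    rings = len(profile)
    cols = segments + 1  # seam column is duplicated so u can run from 0 to 1

    vertices, uvs = [], []
    for i, (radius, height) in enumerate(profile):
        v = i / (rings - 1)  # v follows the ring index, not arc length
        for j in range(cols):
            u = j / segments
            angle = 2.0 * math.pi * u
            vertices.append(
                (radius * math.cos(angle), height, radius * math.sin(angle))
            )
            uvs.append((u, v))

    quads = []
    for i in range(rings - 1):
        for j in range(segments):
            a = i * cols + j  # corner on ring i, column j
            # Same corner order for every face, so winding is uniform.
            quads.append((a, a + 1, a + 1 + cols, a + cols))

    return vertices, uvs, quads


# Hypothetical vase silhouette: foot, belly, neck, lip.
profile = [(0.30, 0.0), (0.55, 0.4), (0.20, 0.9), (0.28, 1.0)]
vertices, uvs, quads = revolve_profile(profile, segments=8)
print(len(vertices), len(quads))
```

With four profile points and eight segments, the sketch produces 4 × 9 = 36 vertices and 3 × 8 = 24 quadrilateral faces. Duplicating the seam column keeps the UV map a single rectangle, which is one common way to give a solid of revolution a clean unwrapping; in this simplified picture, the profile points would be the kind of procedural parameters a network could regress from an image.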

CLC number: