Conditional generation of CAD models based on latent diffusion models

doi:10.11996/JG.j.2095-302X.2026020390

Abstract:

Creating 3D models that are both manufacturable and editable with traditional computer-aided design (CAD) is a complex and time-consuming task. In recent years, deep learning has shown great potential for automating CAD model generation and has become an active research topic. However, most CAD generation models do not fully exploit the geometric and semantic information contained in input data such as point clouds, images, and sketches, which makes it difficult to steer generation precisely through flexible conditional inputs. To address this problem, the representational capacity of a latent space was exploited, and a denoising diffusion probabilistic model guided by such conditional inputs was adopted to achieve directed generation of CAD models. Specifically, a Transformer-based autoencoder was first constructed to encode sequences of parametric CAD commands into a latent space. A denoising diffusion probabilistic model was then built in this space; it fuses the encoded point cloud, image, or sketch condition to generate CAD feature vectors. Finally, a decoder reconstructs these feature vectors into 3D CAD models. Experimental results show that the generated CAD models have reasonable structures, smooth surfaces, and clear geometric features. Compared with existing methods, the approach achieves a good balance among shape diversity, distribution similarity, and fidelity. Using point clouds, images, or sketches as conditional inputs effectively improves the quality of the generated CAD models. The code is open source and available at https://github.com/Ziyou-maker/LDM4CAD.

Key words: CAD generation, parametric modeling, diffusion model, latent diffusion model, conditional generation
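The abstract's central step runs a denoising diffusion probabilistic model (DDPM) in the autoencoder's latent space. A minimal training sketch illustrates it. The sketch assumes the linear noise schedule and noise-prediction objective of the original DDPM [30]. The schedule values and the `denoiser` callable are illustrative assumptions, not the LDM4CAD configuration. `denoiser` is a hypothetical stand-in for a conditional network ε_θ(z_t, t, c) that would receive the point cloud, image, or sketch encoding c.

```python
import math
import random


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule beta_1..beta_T (DDPM default; assumed here)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): the signal fraction kept after t steps."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(z0, t, alpha_bar, noise):
    """Closed-form forward process: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps."""
    a = math.sqrt(alpha_bar[t])
    s = math.sqrt(1.0 - alpha_bar[t])
    return [a * x + s * e for x, e in zip(z0, noise)]


def diffusion_loss(z0, cond, denoiser, alpha_bar, rng):
    """One Monte Carlo term of the loss ||eps - eps_theta(z_t, t, c)||^2 (mean over dims).

    `denoiser` is a hypothetical callable (z_t, t, cond) -> predicted noise.
    """
    t = rng.randrange(len(alpha_bar))           # uniformly sampled timestep
    noise = [rng.gauss(0.0, 1.0) for _ in z0]    # eps ~ N(0, I)
    zt = q_sample(z0, t, alpha_bar, noise)
    pred = denoiser(zt, t, cond)
    return sum((p - e) ** 2 for p, e in zip(pred, noise)) / len(z0)
```

In such a setup, `z0` would be the autoencoder latent of one CAD command sequence. Averaging `diffusion_loss` over a batch and minimizing it with respect to the denoiser's weights trains the noise predictor. The autoencoder's weights would stay fixed during this stage.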

CLC number: TP391.72

LIU Jinghao, YOU Zhenguo, DU Dong. Conditional generation of CAD models based on latent diffusion models[J]. Journal of Graphics, 2026, 47(2): 390-401.

Figures and Tables (12)

Table 1 Types of CAD commands and their corresponding parameters

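Command type	Parameters	Meaning
$\langle SOL\rangle$ (start token)	$\varnothing$	Start of a loop
$L$ (line)	$x, y$	Line end point
$A$ (arc)	$x, y$	Arc end point
	$\alpha$	Sweep angle
	$f$	Counter-clockwise flag
$C$ (circle)	$x, y$	Center
	$r$	Radius
$E$ (extrude)	$\theta, \varphi, \gamma$	Sketch plane orientation
	$p_x, p_y, p_z$	Sketch plane origin
	$s$	Scale factor
	$e_1, e_2$	Extrusion distances
	$b$	Boolean type
	$u$	Extrusion type
$\langle EOS\rangle$ (end token)	$\varnothing$	End of the whole sequence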
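Table 1 can be made concrete as a fixed-width command vector. The sketch below assumes the encoding popularized by DeepCAD [19]:

- Every command carries the same 16 parameter slots in a fixed order.
- Slots a command does not use are filled with -1.
- Continuous values, assumed already normalized to [-1, 1], are quantized to 256 levels.
- The discrete parameters f, b, and u are stored as integer category indices.

The slot order, normalization range, and helper names are hypothetical illustrations. The paper's exact preprocessing is not given here.

```python
# Hypothetical 16-slot parameter layout (assumed, DeepCAD-style encoding).
PARAM_NAMES = ["x", "y", "alpha", "f", "r",
               "theta", "phi", "gamma", "px", "py", "pz", "s",
               "e1", "e2", "b", "u"]
COMMANDS = ["SOL", "L", "A", "C", "E", "EOS"]

# Parameters used by each command, as listed in Table 1.
USED = {
    "SOL": [],
    "L": ["x", "y"],
    "A": ["x", "y", "alpha", "f"],
    "C": ["x", "y", "r"],
    "E": ["theta", "phi", "gamma", "px", "py", "pz", "s", "e1", "e2", "b", "u"],
    "EOS": [],
}
DISCRETE = {"f", "b", "u"}  # flags/categories: stored as integer indices
PAD = -1                    # marks a slot the command does not use


def quantize(v, lo=-1.0, hi=1.0, levels=256):
    """Clamp v to [lo, hi] and map it to an integer bin in [0, levels - 1]."""
    v = min(max(v, lo), hi)
    return round((v - lo) / (hi - lo) * (levels - 1))


def encode_command(cmd, params):
    """Return (command index, 16 parameter slots) for one CAD command."""
    slots = [PAD] * len(PARAM_NAMES)
    for name in USED[cmd]:
        value = params[name]
        slots[PARAM_NAMES.index(name)] = int(value) if name in DISCRETE else quantize(value)
    return COMMANDS.index(cmd), slots


def encode_sequence(commands):
    """Encode a list of (command name, parameter dict) pairs, e.g. SOL, L, ..., E, EOS."""
    return [encode_command(cmd, params) for cmd, params in commands]
```

For example, a square profile extruded into a plate would become one `SOL`, four `L` commands (one per edge end point), one `E`, and a closing `EOS`. Each command turns into a (type, 16-slot) pair. This is the kind of token sequence a Transformer autoencoder can consume.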

Fig. 1 Network architecture of the CAD generation model

Fig. 2 Architecture of the autoencoder model

Fig. 3 Training and inference processes of the diffusion model
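The inference half of Fig. 3 can be illustrated with a minimal sketch of standard DDPM ancestral sampling [30] in the latent space. Sampling starts from Gaussian noise and applies the learned reverse step T times; the final latent is then handed to the CAD decoder. The linear schedule, the variance choice σ_t² = β_t, and the `denoiser` callable are assumptions. `denoiser` is a hypothetical stand-in for the trained conditional noise predictor. The paper's actual sampler and step count are not specified here.

```python
import math
import random


def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule plus alpha_t and alpha_bar_t (assumed DDPM defaults)."""
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bar, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bar.append(prod)
    return betas, alphas, alpha_bar


def sample_latent(denoiser, cond, dim, T=1000, seed=0):
    """DDPM ancestral sampling in latent space, starting from z_T ~ N(0, I).

    `denoiser(z_t, t, cond)` is hypothetical; the returned latent would be
    passed to the CAD decoder to recover a command sequence.
    """
    rng = random.Random(seed)
    betas, alphas, alpha_bar = make_schedule(T)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(T)):
        eps = denoiser(z, t, cond)
        coef = betas[t] / math.sqrt(1.0 - alpha_bar[t])
        # Model mean: (z_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)
        mean = [(zi - coef * ei) / math.sqrt(alphas[t]) for zi, ei in zip(z, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # variance choice sigma_t^2 = beta_t
            z = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            z = mean  # no noise is added at the final step
    return z
```

Because the loop runs entirely in a compact latent space, each step needs only one denoiser call on a short vector. The expensive part is the network evaluation itself, not the sampler arithmetic.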

Table 2 Shape generation performance of CAD models

Method	COV/% ↑	JSD ↓	MMD ↓
DeepCAD	78.6	4.086	1.509
SkexGen	76.8	2.110	1.395
BrepGen	75.1	1.457	1.245
FlexCAD	76.5	2.625	1.532
Ours	79.1	3.051	1.348
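The three metrics in Table 2 are commonly defined on point clouds sampled from the generated and reference shapes, with Chamfer distance as the shape distance. The sketch below follows those common definitions:

- COV is the fraction of reference shapes that are the nearest neighbour of at least one generated shape.
- MMD averages, over reference shapes, the distance to the closest generated shape.
- JSD compares the voxel-occupancy distributions of all points in each set.

The 28³ grid, the [-1, 1]³ normalization, and the brute-force nearest-neighbour search are assumptions. Any scaling applied to the reported JSD and MMD values is not stated here.

```python
import math
from collections import Counter


def chamfer(P, Q):
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both directions."""
    def one_way(A, B):
        return sum(min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in B) for p in A) / len(A)
    return one_way(P, Q) + one_way(Q, P)


def coverage(gen, ref):
    """COV: fraction of reference shapes matched as the nearest neighbour of some generated shape."""
    matched = {min(range(len(ref)), key=lambda j: chamfer(g, ref[j])) for g in gen}
    return len(matched) / len(ref)


def mmd(gen, ref):
    """MMD: mean, over reference shapes, of the Chamfer distance to the closest generated shape."""
    return sum(min(chamfer(g, r) for g in gen) for r in ref) / len(ref)


def jsd(gen, ref, res=28):
    """JSD between voxel-occupancy distributions of all points in each set (points in [-1, 1]^3)."""
    def histogram(shapes):
        counts = Counter()
        for pc in shapes:
            for p in pc:
                cell = tuple(min(max(int((x + 1) / 2 * res), 0), res - 1) for x in p)
                counts[cell] += 1
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()}

    P, Q = histogram(gen), histogram(ref)
    M = {k: 0.5 * (P.get(k, 0.0) + Q.get(k, 0.0)) for k in set(P) | set(Q)}

    def kl(A):
        # KL(A || M); every key of A is in M and has positive mass there
        return sum(a * math.log(a / M[k]) for k, a in A.items())

    return 0.5 * kl(P) + 0.5 * kl(Q)
```

The metrics pull in different directions. COV rewards spreading generated shapes over many references, while MMD and JSD reward closeness to the reference distribution. This is why a method can lead on COV, as in the last row of Table 2, without leading on the other two.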

Fig. 4 Unconditional generation results on the DeepCAD dataset ((a) DeepCAD; (b) Ours)

Fig. 5 Unconditional generation results on the Fusion 360 Gallery dataset

Table 3 Comparison of inference speed for unconditional generation

Method	Inference time/ms
DeepCAD	2.88
SkexGen	49.55
BrepGen	7164.12
FlexCAD	9485.88
Ours	6.43
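Reading Table 3 as ratios to the proposed method's 6.43 ms gives a sense of scale. These are plain quotients of the reported numbers. The table does not state hardware, batch size, or whether times are per sample, so the ratios compare only the figures as given.

```latex
\begin{aligned}
\frac{t_{\text{DeepCAD}}}{t_{\text{Ours}}} &= \frac{2.88}{6.43} \approx 0.45, &
\frac{t_{\text{SkexGen}}}{t_{\text{Ours}}} &= \frac{49.55}{6.43} \approx 7.7,\\
\frac{t_{\text{BrepGen}}}{t_{\text{Ours}}} &= \frac{7164.12}{6.43} \approx 1114, &
\frac{t_{\text{FlexCAD}}}{t_{\text{Ours}}} &= \frac{9485.88}{6.43} \approx 1475.
\end{aligned}
```

By these figures the method is about 2.2 times slower than DeepCAD. It is roughly three orders of magnitude faster than the B-rep diffusion model BrepGen and the LLM-based FlexCAD.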

Fig. 6 Training processes of the point cloud, image, and sketch encoders

Table 4 Comparison of CAD model reconstruction metrics under different conditions

Method	Point cloud		Image		Sketch
	ACC_cmd	ACC_param	ACC_cmd	ACC_param	ACC_cmd	ACC_param
DeepCAD	74.91	61.22	63.15	52.04	61.96	47.36
Ours	86.54	71.80	76.98	65.86	68.19	56.02
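ACC_cmd and ACC_param in Table 4 are commonly computed as in DeepCAD [19]:

- ACC_cmd is the fraction of command types predicted correctly.
- ACC_param is the fraction of parameters, counted only for correctly predicted commands, whose quantized value lies within a tolerance η of the ground truth.

The minimal sketch below assumes position-aligned, equal-length sequences in a 16-slot encoding, with -1 for unused parameters, and an assumed tolerance of η = 3. The function name is hypothetical.

```python
def reconstruction_accuracy(gt, pred, eta=3):
    """Return (ACC_cmd, ACC_param) for aligned sequences of (command index, params) pairs.

    gt, pred: equal-length lists; unused parameter slots hold -1 (assumed encoding).
    """
    n_cmd_ok, n_param_ok, n_param = 0, 0, 0
    for (t, p), (t_hat, p_hat) in zip(gt, pred):
        if t != t_hat:
            continue  # parameters of mispredicted commands are not scored
        n_cmd_ok += 1
        for v, v_hat in zip(p, p_hat):
            if v == -1:  # slot not used by this command type
                continue
            n_param += 1
            if abs(v - v_hat) < eta:
                n_param_ok += 1
    acc_cmd = n_cmd_ok / len(gt)
    acc_param = n_param_ok / n_param if n_param else 0.0
    return acc_cmd, acc_param
```

Multiplying both values by 100 gives percentage-style figures comparable to Table 4, assuming that is how the table reports them. Because ACC_param is conditioned on correct command types, a model can score well on it while still missing many commands. The two numbers are therefore read together.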

Fig. 7 CAD generation results based on different conditional data ((a) Generation results based on point clouds; (b) Generation results based on images; (c) Generation results based on sketches)

Fig. 8 CAD model editing examples

References (40)

[1]	LIU A J, HUANG S B, YAN G R. Research on hybrid modeling technology for CAD model[J]. Journal of Graphics, 2013, 34(6): 61-63 (in Chinese).
[2]	HUANG X L, LI N, CHEN L P. Classification and solution of 3D assembly geometric constraint system between two rigid bodies[J]. Journal of Graphics, 2014, 35(2): 236-242 (in Chinese).
[3]	ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 10674-10685.
[4]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
[5]	WU J J, ZHANG C K, XUE T F, et al. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling[C]// The 30th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2016: 82-90.
[6]	GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144.
[7]	YANG G D, HUANG X, HAO Z K, et al. PointFlow: 3D point cloud generation with continuous normalizing flows[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 4540-4549.
[8]	WANG N Y, ZHANG Y D, LI Z W, et al. Pixel2Mesh: generating 3D mesh models from single RGB images[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 55-71.
[9]	MESCHEDER L, OECHSLE M, NIEMEYER M, et al. Occupancy networks: learning 3D reconstruction in function space[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4455-4465.
[10]	PARK J J, FLORENCE P, STRAUB J, et al. DeepSDF: learning continuous signed distance functions for shape representation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 165-174.
[11]	MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2022, 65(1): 99-106.
[12]	SHARMA G, GOYAL R, LIU D F, et al. CSGNet: neural shape parser for constructive solid geometry[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 5515-5523.
[13]	KANIA K, ZIĘBA M, KAJDANOWICZ T. UCSG-NET: unsupervised discovering of constructive solid geometry tree[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 736.
[14]	WANG X G, XU Y L, XU K, et al. PIE-NET: parametric inference of point cloud edges[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 1693.
[15]	JAYARAMAN P K, LAMBOURNE J G, DESAI N, et al. SolidGen: an autoregressive model for direct B-rep synthesis[EB/OL]. [2025-05-05]. https://arxiv.org/abs/2203.13944.
[16]	VINYALS O, FORTUNATO M, JAITLY N. Pointer networks[C]// The 29th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2015: 2692-2700.
[17]	XU X, LAMBOURNE J, JAYARAMAN P, et al. BrepGen: a B-rep generative diffusion model with structured latent geometry[J]. ACM Transactions on Graphics, 2024, 43(4): 119.
[18]	LI J, FU Y H, CHEN F L. DTGBrepGen: a novel B-rep generative model through decoupling topology and geometry[C]// 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2025: 21438-21447.
[19]	WU R D, XIAO C, ZHENG C X. DeepCAD: a deep generative network for computer-aided design models[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 6752-6762.
[20]	XU X, WILLIS K D D, LAMBOURNE J G, et al. SkexGen: autoregressive generation of CAD construction sequences with disentangled codebooks[EB/OL]. [2025-07-02]. https://proceedings.mlr.press/v162/xu22k.html.
[21]	XU X, JAYARAMAN P K, LAMBOURNE J G, et al. Hierarchical neural coding for controllable CAD model generation[EB/OL]. [2025-07-02]. https://proceedings.mlr.press/v202/xu23f.html.
[22]	ZHANG Z W, SUN S Z, WANG W X, et al. FlexCAD: unified and versatile controllable CAD generation with fine-tuned large language models[EB/OL]. [2025-12-08]. https://arxiv.org/abs/2411.05823.
[23]	LI C J, PAN H, BOUSSEAU A, et al. Sketch2CAD: sequential CAD modeling by sketching in context[J]. ACM Transactions on Graphics, 2020, 39(6): 164.
[24]	LI C J, PAN H, BOUSSEAU A, et al. Free2CAD: parsing freehand drawings into CAD commands[J]. ACM Transactions on Graphics, 2022, 41(4): 93.
[25]	HÄHNLEIN F, LI C J, MITRA N J, et al. CAD2Sketch: generating concept sketches from CAD sequences[J]. ACM Transactions on Graphics, 2022, 41(6): 279.
[26]	UY M A, CHANG Y Y, SUNG M, et al. Point2Cyl: reverse engineering 3D objects from point clouds to extrusion cylinders[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 11840-11850.
[27]	REN D X, ZHENG J M, CAI J F, et al. ExtrudeNet: unsupervised inverse sketch-and-extrude for shape parsing[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 482-498.
[28]	LI P, GUO J W, ZHANG X P, et al. SECAD-Net: self-supervised CAD reconstruction by learning sketch-extrude operations[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 16816-16826.
[29]	ZHOU S D, TANG T Y, ZHOU B. CADParser: a learning approach of sequence modeling for B-rep CAD[EB/OL]. [2025-07-02]. https://dblp.org/rec/conf/ijcai/ZhouTZ23.html?view=bibtex.
[30]	HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 574.
[31]	CHOI J, KIM S, JEONG Y, et al. ILVR: conditioning method for denoising diffusion probabilistic models[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 14347-14356.
[32]	GU S Y, CHEN D, BAO J M, et al. Vector quantized diffusion model for text-to-image synthesis[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 10686-10696.
[33]	VAN DEN OORD A, VINYALS O, KAVUKCUOGLU K. Neural discrete representation learning[C]// The 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6309-6318.
[34]	ZHANG A J, JIA W Q, ZOU Q, et al. Diffusion-CAD: controllable diffusion model for generating computer-aided design models[J]. IEEE Transactions on Visualization and Computer Graphics, 2025, 31(12): 10188-10199.
[35]	ALAM M F, AHMED F. GenCAD: image-conditioned computer-aided design generation with transformer-based contrastive representation and diffusion priors[EB/OL]. [2025-05-05]. https://arxiv.org/abs/2409.16294.
[36]	WANG H X, ZHAO M Y, WANG Y Q, et al. VQ-CAD: computer-aided design model generation with vector quantized diffusion[J]. Computer Aided Geometric Design, 2024, 111: 102327.
[37]	QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]// The 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 5105-5114.
[38]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[39]	CANNY J. A computational approach to edge detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, PAMI-8(6): 679-698.
[40]	WILLIS K D D, PU Y W, LUO J L, et al. Fusion 360 gallery: a dataset and environment for programmatic CAD construction from human design sequences[J]. ACM Transactions on Graphics, 2021, 40(4): 54.