Conditional generation of CAD models based on latent diffusion models

doi:10.11996/JG.j.2095-302X.2026020390

Abstract

Creating 3D models that are both manufacturable and editable with traditional computer-aided design (CAD) tools is complex and time-consuming. In recent years, deep learning has shown great potential for the automated generation of CAD models and has become an active research topic. However, most CAD generation models do not fully exploit the geometric and semantic information contained in input data such as point clouds, images, and sketches, which makes it difficult to steer generation accurately through flexible conditional inputs. To address this issue, a conditional CAD generation method is proposed that exploits the representational capacity of a latent space, adopts a denoising diffusion probabilistic model, and uses such input data as guidance. Specifically, a Transformer-based autoencoder was first constructed to encode CAD parametric command sequences into a latent space. A denoising diffusion probabilistic model was then built in this space to generate CAD feature vectors by integrating conditional encodings of point clouds, images, or sketches. Finally, the feature vectors were reconstructed into 3D CAD models by a decoder. Experimental results showed that the generated CAD models had reasonable structures, smooth surfaces, and distinct geometric features. Compared with existing methods, the proposed method achieved a better balance among shape diversity, distribution similarity, and fidelity. Furthermore, using point clouds, images, or sketches as conditional inputs effectively improved the quality of the generated CAD models. The code is open source and available at https://github.com/Ziyou-maker/LDM4CAD.
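
The latent diffusion step summarized in the abstract can be illustrated with a minimal NumPy sketch of the standard DDPM forward (noising) process applied to a latent vector. This is not the authors' implementation: the linear noise schedule, the latent size of 256, the condition size of 128, and the use of concatenation to inject the condition are all assumptions made for illustration, and `denoiser_input` is a hypothetical helper.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(z0, t, alpha_bar, rng):
    """Draw z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return z_t, noise


def denoiser_input(z_t, cond):
    """Hypothetical conditioning: concatenate the noisy latent and the condition code."""
    return np.concatenate([z_t, cond], axis=-1)


rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)      # cumulative signal-retention factor

z0 = rng.standard_normal((4, 256))       # stand-in for encoded CAD command sequences
cond = rng.standard_normal((4, 128))     # stand-in for a point cloud / image / sketch code
z_t, eps = q_sample(z0, 500, alpha_bar, rng)
x = denoiser_input(z_t, cond)            # what a noise-prediction network would receive
```

A denoising network trained on pairs like `(x, eps)` would learn to predict the added noise. At sampling time the process would run in reverse from pure noise, and the resulting latent would be passed to the decoder.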

Key words: CAD generation, parametric modeling, diffusion model, latent diffusion model, conditional generation

CLC Number: TP391.72

LIU Jinghao, YOU Zhenguo, DU Dong. Conditional generation of CAD models based on latent diffusion models[J]. Journal of Graphics, 2026, 47(2): 390-401.

Figures/Tables 12

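Command type	Parameters	Meaning
$<SOL>$ (start indicator)	$\varnothing$	Start of a loop
$L$ (line)	$x,y$	End point of the line
$A$ (arc)	$x,y$	End point of the arc
	$\alpha$	Sweep angle
	$f$	Counter-clockwise flag
$C$ (circle)	$x,y$	Center
	$r$	Radius
$E$ (extrude)	$\theta,\varphi,\gamma$	Sketch plane orientation
	$p_x,p_y,p_z$	Sketch plane origin
	$s$	Scale factor
	$e_1,e_2$	Extrusion distances
	$b$	Boolean type
	$u$	Extrusion type
$<EOS>$ (end indicator)	$\varnothing$	End of the whole sequence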
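
The command table above can be read as a small data structure. The sketch below shows one possible Python representation. The class name `Command`, the helper `is_well_formed`, the ASCII parameter names, and every numeric value (including the codes used for $b$ and $u$) are hypothetical and are not taken from the released code.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Parameters expected for each command type, following the table above.
# The ASCII names (alpha, px, e1, ...) are illustrative spellings of the symbols.
PARAMS: Dict[str, List[str]] = {
    "SOL": [],
    "L": ["x", "y"],
    "A": ["x", "y", "alpha", "f"],
    "C": ["x", "y", "r"],
    "E": ["theta", "phi", "gamma", "px", "py", "pz", "s", "e1", "e2", "b", "u"],
    "EOS": [],
}


@dataclass
class Command:
    kind: str
    params: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject unknown command types and missing or extra parameters.
        if self.kind not in PARAMS:
            raise ValueError(f"unknown command type: {self.kind}")
        if set(self.params) != set(PARAMS[self.kind]):
            raise ValueError(f"{self.kind} expects parameters {PARAMS[self.kind]}")


def is_well_formed(seq: List[Command]) -> bool:
    """Check the framing only: starts with a loop and ends with exactly one EOS."""
    kinds = [c.kind for c in seq]
    return (
        len(kinds) >= 2
        and kinds[0] == "SOL"
        and kinds[-1] == "EOS"
        and kinds.count("EOS") == 1
    )


# A cylinder: one circular loop followed by an extrusion.
# All numeric values, including the codes for b and u, are made up for illustration.
cylinder = [
    Command("SOL"),
    Command("C", {"x": 0.0, "y": 0.0, "r": 1.0}),
    Command("E", {"theta": 0.0, "phi": 0.0, "gamma": 0.0,
                  "px": 0.0, "py": 0.0, "pz": 0.0,
                  "s": 1.0, "e1": 2.0, "e2": 0.0, "b": 0, "u": 0}),
    Command("EOS"),
]
```

A sequence such as `cylinder` is the kind of input an autoencoder would map to a latent vector. The validation in `__post_init__` only checks parameter names, not value ranges.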

Method	COV/%↑	JSD↓	MMD↓
DeepCAD	78.6	4.086	1.509
SkexGen	76.8	2.110	1.395
BrepGen	75.1	1.457	1.245
FlexCAD	76.5	2.625	1.532
Ours	79.1	3.051	1.348
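
For readers unfamiliar with the column headers, the sketch below computes coverage (COV) and minimum matching distance (MMD) under the definitions commonly used for point-cloud generative models. It assumes shapes are compared as sampled point clouds with a Chamfer distance; the exact distance, sampling density, and scaling behind the numbers in the table are not stated here, so the function names and settings are illustrative only. JSD, which compares occupancy distributions, is omitted.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()


def cov_and_mmd(generated, reference):
    """Return (COV, MMD) for lists of generated and reference point clouds."""
    dist = np.array([[chamfer_distance(g, r) for r in reference] for g in generated])
    # COV: fraction of reference shapes that are the nearest neighbour
    # of at least one generated shape (higher means more diverse).
    matched = np.unique(dist.argmin(axis=1))
    cov = matched.size / len(reference)
    # MMD: distance from each reference shape to its closest generated
    # shape, averaged over the reference set (lower means higher fidelity).
    mmd = dist.min(axis=0).mean()
    return cov, mmd


rng = np.random.default_rng(0)
reference = [rng.random((64, 3)) + i for i in range(3)]   # three well-separated toy shapes
perfect = [r.copy() for r in reference]                   # generator that reproduces every shape
collapsed = [reference[0].copy() for _ in range(3)]       # generator stuck on a single shape

cov_perfect, mmd_perfect = cov_and_mmd(perfect, reference)
cov_collapsed, mmd_collapsed = cov_and_mmd(collapsed, reference)
```

In this toy setup the collapsed generator covers only one of the three reference shapes, so its COV drops and its MMD rises. That is the trade-off between diversity and fidelity that the two columns are meant to expose.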

Method	Inference time/ms
DeepCAD	2.88
SkexGen	49.55
BrepGen	7164.12
FlexCAD	9485.88
Ours	6.43