PDF-Sketch: layout-based sketch generation via primitive distance fields and discrete diffusion

doi:10.11996/JG.j.2095-302X.2026020380

Abstract

Sketches play an important role in conceptual design, digital art, and human-computer interaction. However, existing deep learning-based sketch generation methods often rely on polylines or Bézier curves for geometric representation, which are limited in capturing complex shapes. Sequential point prediction also accumulates errors, causing structural distortion and loss of detail. To address these issues, sketch generation was formulated as a layout modeling problem, in which a sketch was composed of multiple independent stroke primitives. A framework was proposed that integrated a discrete diffusion model with the Primitive Distance Field (PDF). The method first applied adaptive stroke decomposition and a stroke autoencoder to obtain continuous and differentiable features of stroke segments. A codebook mechanism was then employed to discretize frequently recurring stroke patterns into a finite set of items, enabling the diffusion process to gradually recover a coherent set of stroke segments while jointly modeling their positions, sizes, and shapes. Experiments on the QuickDraw dataset showed that the proposed approach achieved a lower Fréchet Inception Distance (FID) than Sketch-rnn and SketchKnitter, with competitive Precision and Recall. On sketches with few strokes, the model captured local geometric details more effectively and achieved higher Recall, while on sketches with many strokes, it achieved higher Precision, indicating greater structural accuracy and fidelity. Qualitative comparisons further indicated that the generated sketches exhibited stronger structural coherence, richer details, and better spatial consistency. These results confirmed that a layout-based perspective, combined with distance field representation and discretization, effectively reduced the error accumulation of sequential modeling and improved both structural integrity and diversity in sketch generation. The framework also provided directions for enhancing stroke segmentation, detail recovery, and inter-segment connectivity in more complex scenarios.
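
As a rough illustration of two ideas named in the abstract, the following Python sketch computes a distance field for a single stroke segment and maps feature vectors to their nearest codebook entries. It is not the authors' implementation: the unsigned-distance formulation, the grid resolution, the function names `polyline_distance_field` and `quantize`, and the toy codebook are all assumptions made for this example. In the paper the quantized features come from a learned stroke autoencoder, which is not reproduced here.

```python
import numpy as np

def polyline_distance_field(points, size=32):
    """Unsigned distance from each pixel centre of a size x size grid
    (coordinates in [0, 1]) to the nearest point on a polyline.
    Assumption: the stroke segment is given as a list of (x, y) points."""
    pts = np.asarray(points, dtype=float)
    ticks = (np.arange(size) + 0.5) / size
    gx, gy = np.meshgrid(ticks, ticks)        # gx varies along columns, gy along rows
    grid = np.stack([gx, gy], axis=-1)        # shape (size, size, 2)
    field = np.full((size, size), np.inf)
    for a, b in zip(pts[:-1], pts[1:]):
        ab = b - a
        denom = float(ab @ ab)
        if denom == 0.0:
            # Degenerate piece: both endpoints coincide.
            t = np.zeros((size, size))
        else:
            # Projection parameter of every pixel onto ab, clamped to the piece.
            t = np.clip(((grid - a) @ ab) / denom, 0.0, 1.0)
        closest = a + t[..., None] * ab
        field = np.minimum(field, np.linalg.norm(grid - closest, axis=-1))
    return field

def quantize(features, codebook):
    """Return, for each feature vector, the index of its nearest codebook
    entry under squared Euclidean distance."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

# Toy usage: one horizontal stroke and a hypothetical 3-entry codebook.
field = polyline_distance_field([(0.1, 0.5), (0.9, 0.5)], size=32)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
codes = quantize(np.array([[0.1, -0.1], [0.9, 0.8]]), codebook)  # indices 0 and 1
```

The discrete indices returned by `quantize` are the kind of finite-vocabulary tokens a discrete diffusion model can operate on, alongside discretized position and size attributes.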

Key words: vector sketch, layout generation, stroke decomposition, stroke representation and learning, distance field, diffusion model

ZHOU Jin, ZHOU Yi, XU Pengfei, HUANG Hui. PDF-Sketch: layout-based sketch generation via primitive distance fields and discrete diffusion[J]. Journal of Graphics, 2026, 47(2): 380-389.

Figures/Tables 9

Fig. 1 Stroke segment decomposition example, where different segments are marked in different colors

Fig. 2 Structure of the stroke autoencoder proposed in this paper

Fig. 3 Demonstration of the generation process of the discrete diffusion model, where dashed boxes represent the position information of stroke segments

Fig. 4 Illustration of the stroke rendering process

Table 1 Quantitative comparison results for sketches with few strokes (≤5)

Method	FID↓	Precision↑	Recall↑
Sketch-rnn	31.607	0.489	0.449
SketchKnitter	26.183	0.537	0.464
Ours	25.496	0.456	0.581

Table 2 Quantitative comparison results for sketches with many strokes (>5)

Method	FID↓	Precision↑	Recall↑
Sketch-rnn	35.307	0.482	0.432
SketchKnitter	33.984	0.551	0.410
Ours	32.087	0.576	0.396

Fig. 5 Qualitative comparison of vector sketch generation results

Table 3 Influence of codebook size on generation quality

Codebook size	FID↓	Precision↑	Recall↑
64	44.681	0.351	0.412
128	36.018	0.401	0.477
256	25.496	0.456	0.581
384	35.174	0.413	0.482

Fig. 6 Stroke reconstruction results using the codebook ((a),(c) Original sketch; (b),(d) Reconstructed result)
