PDF-Sketch: layout-based sketch generation via primitive distance fields and discrete diffusion

doi:10.11996/JG.j.2095-302X.2026020380

Abstract

Sketches play an important role in conceptual design, digital art, and human-computer interaction. However, existing deep learning-based sketch generation methods often rely on polylines or Bézier curves for geometric representation, which limits their ability to capture complex shapes, and their point-by-point prediction accumulates errors, causing structural drift and loss of detail. To address these issues, sketch generation was formulated as a layout modeling problem in which a sketch is composed of multiple independent stroke segments, and a framework was proposed that combines a discrete diffusion model with the Primitive Distance Field (PDF). The method first applied adaptive stroke decomposition and a stroke autoencoder to obtain continuous, differentiable features of stroke segments. A codebook was then used to discretize frequently recurring, similar stroke shapes into a finite set of entries, enabling the diffusion process to gradually recover a structurally coherent set of stroke segments while jointly modeling their positions, sizes, and shapes. Experiments on the QuickDraw dataset showed that the proposed approach achieved a lower Fréchet Inception Distance (FID) than Sketch-rnn and SketchKnitter in both settings (25.496 for sketches with at most 5 strokes and 32.087 for sketches with more than 5 strokes). In the few-stroke task, the model learned local geometric features more effectively and achieved markedly higher Recall (0.581 versus 0.464 for SketchKnitter), although its Precision (0.456) was lower than that of both baselines. In the many-stroke task, it achieved the highest Precision (0.576) and better overall fidelity, although its Recall (0.396) was lower than that of both baselines. Qualitative comparisons further indicated that the generated sketches were clearly better than those of existing methods in overall consistency, recovery of local detail, and coordination of spatial layout. These results indicated that adopting a layout-generation perspective, combined with a distance field representation and discretization, effectively reduced the error accumulation of sequential modeling and improved both structural integrity and diversity in sketch generation. The framework also provided directions for further improving stroke segmentation, detail recovery, and inter-segment connectivity in more complex scenarios.

Key words: vector sketch, layout generation, stroke decomposition, stroke representation and learning, distance field, diffusion model
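
To make the distance-field idea in the abstract concrete, the following is a minimal sketch of an unsigned distance field for a single polyline stroke segment, thresholded to rasterise the stroke. It is a generic illustration and not the paper's PDF construction, which this page does not describe. The function name `polyline_distance_field`, the 32 x 32 grid, and the 0.03 threshold are all assumptions chosen for the example.

```python
import numpy as np


def polyline_distance_field(points: np.ndarray, size: int = 32) -> np.ndarray:
    """Unsigned distance from each pixel centre of a size x size grid to a polyline.

    points: (M, 2) array of vertices in [0, 1]^2, with M >= 2.
    Returns an array of shape (size, size) indexed as [row (y), column (x)].
    """
    # Pixel centres in normalised coordinates, flattened to shape (size * size, 2).
    axis = (np.arange(size) + 0.5) / size
    gx, gy = np.meshgrid(axis, axis, indexing="xy")
    pix = np.stack([gx.ravel(), gy.ravel()], axis=1)

    best = np.full(pix.shape[0], np.inf)
    for a, b in zip(points[:-1], points[1:]):
        ab = b - a
        denom = max(float(ab @ ab), 1e-12)  # guard against zero-length segments
        # Parameter of the orthogonal projection onto the segment, clamped to [0, 1].
        t = np.clip(((pix - a) @ ab) / denom, 0.0, 1.0)
        closest = a + t[:, None] * ab
        best = np.minimum(best, np.linalg.norm(pix - closest, axis=1))
    return best.reshape(size, size)


# Hypothetical stroke segment: a horizontal line across the middle of the canvas.
stroke = np.array([[0.2, 0.5], [0.8, 0.5]])
field = polyline_distance_field(stroke)
mask = field < 0.03  # assumed threshold: pixels this close to the stroke are "ink"
```

Because the field varies smoothly with the vertex positions away from the stroke itself, a representation of this kind lends itself to gradient-based training, which is consistent with the "continuous, differentiable features" mentioned in the abstract.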

ZHOU Jin, ZHOU Yi, XU Pengfei, HUANG Hui. PDF-Sketch: layout-based sketch generation via primitive distance fields and discrete diffusion[J]. Journal of Graphics, 2026, 47(2): 380-389.

Figures and tables (9)

Fig. 1 Example of stroke segment decomposition; different segments are shown in different colors

Fig. 2 Structure of the proposed stroke autoencoder

Fig. 3 Generation process of the proposed discrete diffusion model; dashed boxes indicate the positions of stroke segments
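
As a rough illustration of the discrete diffusion process shown in Fig. 3, the sketch below implements the forward (corruption) direction for a sequence of discrete stroke tokens. This page does not state which transition kernel the paper uses, so the absorbing-state kernel with a linear schedule (one of the options in reference [1]) is an assumption, as are the names `corrupt` and `MASK` and the token values. Generation would run the learned reverse of this process, starting from an all-masked sequence.

```python
import numpy as np

MASK = 256  # hypothetical id of the absorbing [MASK] token, one past a 256-entry codebook


def corrupt(tokens: np.ndarray, t: int, num_steps: int,
            rng: np.random.Generator) -> np.ndarray:
    """Sample x_t from q(x_t | x_0) for absorbing-state diffusion with a linear schedule.

    Each token is independently replaced by MASK with probability t / num_steps,
    so t = 0 leaves the input unchanged and t = num_steps masks every token.
    """
    mask_prob = t / num_steps
    replace = rng.random(tokens.shape) < mask_prob
    return np.where(replace, MASK, tokens)


rng = np.random.default_rng(0)
x0 = np.array([12, 200, 7, 99, 31])  # hypothetical shape tokens of five stroke segments
x_start = corrupt(x0, t=0, num_steps=100, rng=rng)
x_half = corrupt(x0, t=50, num_steps=100, rng=rng)
x_full = corrupt(x0, t=100, num_steps=100, rng=rng)
```

In a layout-style formulation, the same corruption could be applied to quantized position and size tokens alongside the shape tokens, so that a denoising network predicts all attributes of a stroke segment jointly.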

Fig. 4 Illustration of the stroke rendering process

Table 1 Quantitative comparison for sketches with few strokes (≤5)

Method	FID↓	Precision↑	Recall↑
Sketch-rnn	31.607	0.489	0.449
SketchKnitter	26.183	0.537	0.464
Ours	25.496	0.456	0.581

Table 2 Quantitative comparison for sketches with many strokes (>5)

Method	FID↓	Precision↑	Recall↑
Sketch-rnn	35.307	0.482	0.432
SketchKnitter	33.984	0.551	0.410
Ours	32.087	0.576	0.396

Fig. 5 Qualitative comparison of vector sketch generation results

Table 3 Influence of codebook size on generation quality

Codebook size	FID↓	Precision↑	Recall↑
64	44.681	0.351	0.412
128	36.018	0.401	0.477
256	25.496	0.456	0.581
384	35.174	0.413	0.482

Fig. 6 Stroke reconstruction results using the codebook ((a), (c) original sketches; (b), (d) reconstructed results)
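
The codebook step behind Table 3 and Fig. 6 can be illustrated with a nearest-neighbour vector quantization sketch in the style of reference [3]. This is an assumption about the general mechanism, not the authors' implementation: the latent size of 8, the random codebook, and the function name `quantize` are hypothetical, and only the codebook size of 256 is taken from Table 3.

```python
import numpy as np


def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return, for each latent vector, the index of its nearest codebook entry."""
    # Squared Euclidean distance between every latent (N, D) and every entry (K, D).
    diff = latents[:, None, :] - codebook[None, :, :]
    dist2 = np.sum(diff * diff, axis=-1)  # shape (N, K)
    return np.argmin(dist2, axis=1)       # shape (N,)


rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 8))  # K = 256 entries, hypothetical latent size 8

# Hypothetical stroke latents: three codebook entries perturbed by small noise.
latents = codebook[[3, 17, 200]] + 0.01 * rng.normal(size=(3, 8))

tokens = quantize(latents, codebook)      # discrete tokens fed to the diffusion model
reconstructed = codebook[tokens]          # lookup used when decoding strokes
max_error = float(np.abs(latents - reconstructed).max())
```

A codebook that is too small forces dissimilar strokes onto the same entry, while a larger one spreads the training data over more entries. That trade-off is one plausible reading of why the metrics in Table 3 peak at 256 entries, though the table alone does not establish the cause.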
