Welcome to Journal of Graphics
Bimonthly, Started in 1980
Administrated: China Association for Science and Technology
Sponsored: China Graphics Society
Edited and Published: Editorial Board of Journal of Graphics
Chief Editor: Guoping Wang
Editorial Director: Xiaohong Hou
ISSN 2095-302X
CN 10-1034/T
Current Issue
30 April 2026, Volume 47 Issue 2
Review
A review of image data augmentation based on generative models
XIANG Ting, TANG Zhuo, ZHENG Jiali, CHEN Changjian, LYU Fei, LI Kenli
2026, 47(2): 235-250.  DOI: 10.11996/JG.j.2095-302X.2026020235

Deep learning has shown great potential in the field of computer vision, but its performance in practical applications relies heavily on large amounts of high-quality labeled data. Generative models, with their ability to generate diverse data, have become an effective solution to the problem of data scarcity, aiming to provide training data for computer vision efficiently and effectively. Consequently, image data augmentation techniques based on generative models have become a popular research direction in recent years. To this end, a comprehensive literature review was conducted on image data augmentation methods based on generative models. Through a three-stage retrieval process, 37 relevant studies were collected. The methodological processes of these studies were summarized into four main steps, with each step categorized and described in detail. First, various generative models suitable for image data augmentation were introduced, focusing on model selection. Next, generative image data augmentation methods were classified, with elaborations on the workflow, representative studies, existing challenges, and areas in need of optimization for each category. Considering that generated data may contain noise, methods were also discussed for the selection and processing of generated data to better utilize them in downstream tasks. Furthermore, evaluation methods were categorized and described to comprehensively verify the effectiveness and robustness of data augmentation approaches. Finally, the opportunities and challenges facing generative image data augmentation were discussed, including maintaining semantic consistency, ensuring diversity, improving generation efficiency, and applying the methods to black-box models, and potential directions for future exploration were identified.

A comparative analysis of domestic and international research on surgical robot interaction design using CiteSpace
WANG Yirui, HUA Xinyi, TANG Tianyu, WANG Yilin, YAN Zhiqi, GENG Zihan, CHEN Xingyu, YANG Jianming, SUN Bowen
2026, 47(2): 251-263.  DOI: 10.11996/JG.j.2095-302X.2026020251

To assess the current development status of surgical robot interaction design, a systematic comparative and visualized analysis was conducted. Based on the Web of Science and CNKI databases, literature related to surgical robot interaction design was retrieved. Bibliometric and content analysis methods were applied in combination with the visualization functions of VOSviewer and CiteSpace to construct knowledge maps. The research landscape and development trends in this field were revealed along three dimensions: collaboration network distribution, research hotspot themes, and temporal evolution. The results showed that international research on surgical robot interaction design had started earlier, with close institutional collaborations and more refined, technology-driven research focuses. In contrast, domestic research in this area was initiated later, with weaker institutional cooperation, more scattered topics, and a greater emphasis on theoretical exploration and user experience. It was concluded that future research should strengthen interdisciplinary collaborative innovation and integrate advanced technologies such as intelligent speech recognition, high-precision visual and haptic digitization, intelligent motion trajectory planning, machine learning, and big data modeling to promote the intelligent, precise, and human-centered development of surgical robot interaction design.

Image Processing and Computer Vision
Video attractiveness assessment method for scenic live stream recommendations
ZHOU Qiang, HUANG Yaoqiu, SHI Weimin, ZHOU Zhong
2026, 47(2): 264-274.  DOI: 10.11996/JG.j.2095-302X.2026020264

With the proliferation of 5G, cloud computing, and audio-video technologies, live streaming has emerged as a pivotal medium for online cultural tourism. However, mainstream multi-camera “slow live broadcasts” lack human-guided narration and scripting, resulting in high content randomness that undermines traditional recommendation methods based on user preferences or video popularity. To address this limitation, a video attractiveness assessment method was proposed to predict audience engagement by evaluating how multi-source video content stimulated viewer attention and emotional resonance. This approach proved more suitable for scenic-area live streaming scenarios than conventional methods. First, centered on video attractiveness, a multi-perspective guided video-description generation method was developed, which leveraged a Large Vision-Language Model (LVLM) to extract key information, structure content representations, and infer emotional semantics, synthesizing them into readable descriptive texts and attractiveness factors. Second, a multimodal feature fusion-based attractiveness assessment method integrated cross-attention mechanisms, dynamic saliency, and negative sample augmentation within a contrastive-learning network to output attractiveness scores and critical factors. Finally, an attractiveness-driven live-streaming system prototype for scenic areas was implemented, featuring channel recommendation, attractiveness visualization, and AI-guided navigation. Validation on the TVSum50 dataset demonstrated a 7.00% improvement in video-description relevance over raw descriptions and a 6.00% gain in cross-task generalization. On a self-built scenic live streaming dataset, the multimodal attractiveness evaluation method achieved 24.00% higher accuracy than unimodal baselines.

Text-to-image person re-identification based on multi-granularity color learning
ZHOU Tenglong, YANG Wenjie, YIN Shaohua, YU Yuanlong
2026, 47(2): 275-285.  DOI: 10.11996/JG.j.2095-302X.2026020275

Text-to-image person re-identification aims to retrieve a target person from an image database using natural-language descriptions. This task is of considerable practical importance for applications in video surveillance and public safety. Although existing text-to-image person re-identification methods have made significant progress in cross-modal fine-grained alignment, the exploration of color as a key discriminative cue remains insufficient. This is primarily due to the significant semantic gap between discrete textual color descriptions and continuous visual color representations. This modality difference can mislead the model’s feature-learning process and ultimately limits the final retrieval accuracy. To address these challenges, a novel framework for text-to-image person re-identification based on Multi-Granularity Color Learning (MGCL) was proposed. Our method employed a dual-tower vision-language model architecture as the feature-extraction backbone and learned color information at three distinct granularities: global, phrase, and word. This multi-granularity design aimed to capture and align color information in a coarse-to-fine manner, thereby comprehensively enhancing the color perception and cross-modal alignment accuracy. At the global granularity, color-consistency modeling was introduced. A decoder with a cross-attention mechanism was used to fuse grayscale-image embeddings with joint image-text embeddings to reconstruct the visual representation of the color image. This module guided the model to learn an implicit mapping from textual concepts to the continuous visual-color space, thus alleviating the semantic differences in cross-modal color representations. At the phrase granularity, a color-phrase multi-label classification task was designed. This task aligned the reconstructed visual representation of the color image with a pre-constructed feature library of color phrases by projecting them into a shared semantic space. 
The objective was to strengthen the model’s precise understanding of “color-object” associations. At the word granularity, a color-aware replacement detection mechanism was proposed. This mechanism enhanced the model’s sensitivity to specific color words by masking them in the text and then training the model to predict whether they had been substituted. Experimental results demonstrated that MGCL achieved more precise cross-modal fine-grained alignment through its multi-granularity color learning. It obtained superior performance on three public datasets, CUHK-PEDES, ICFG-PEDES, and RSTPReid, validating the effectiveness of the method for the text-to-image person re-identification task.

Cross-modal consistency detection via graph topological feature extraction
FANG Youjiang, WANG Shihao, ZHANG Liang, DUAN Keran, LIU Yue, WEI Xiaopeng, YANG Xin
2026, 47(2): 286-295.  DOI: 10.11996/JG.j.2095-302X.2026020286

With the rapid development of social media, massive multimodal content is extensively disseminated during public opinion events, making automated public opinion monitoring a critical tool for social governance and early risk warning. Complex linguistic expressions such as sarcasm and metaphor frequently appear in online discourse and are often characterized by inconsistencies between textual and visual modalities, which significantly complicates automatic detection. Existing cross-modal consistency detection methods face limitations in structurally modeling unimodal and multimodal information and in capturing deep semantic correlations, hindering the precise control of real-world public opinion trends. To address these issues, a Graph-structure-aware Cross-modal Public Opinion Network (GCPNet) was proposed. First, the CLIP (Contrastive Language-Image Pretraining) model was utilized as a feature encoder, and fully connected graph topological structures were constructed with textual words and image patches as nodes. Graph Convolutional Networks (GCNs) were employed to explicitly mine and enhance the semantic and structural correlations within multimodal information. Second, a hierarchical interactive attention graph module was designed to improve global modeling and deep interaction capabilities for complex contexts through three stages: fine-grained cross-attention alignment, global adaptive gating fusion, and dynamic graph structure enhancement. Finally, an adaptive weighted fusion strategy was adopted to dynamically integrate unimodal structured features and cross-modal interactive features. Experimental results on the public benchmark dataset MMSD2.0 showed that GCPNet accurately captured cross-modal consistency cues and effectively identified complex public opinion content such as sarcasm and metaphor, outperforming existing state-of-the-art methods in terms of accuracy and robustness. 
This research provides a new methodological pathway and theoretical foundation for multimodal public opinion understanding, offering a practical tool for real-world opinion governance and social risk mitigation.

Precise oil-leakage segmentation for substation equipment under water-accumulation interference
ZHAO Zhenbing, ZHANG Jingliang, TANG Chenkang, BI Yuxuan, LI Haopeng
2026, 47(2): 296-310.  DOI: 10.11996/JG.j.2095-302X.2026020296

Accurate segmentation of oil leakage from substation equipment is crucial for ensuring the safe operation of power systems. However, existing segmentation methods face significant challenges due to the high visual similarity between oil leakage and water accumulation, the irregular morphology of oil spills, and the scarcity of training data for such interference scenarios. To address these issues, a comprehensive solution from data augmentation to network-model optimization was proposed. First, a novel diffusion model-based timestep-adaptive tuning method (Evolutionary Tuning, EvoTune) was designed to dynamically adjust the feature contributions of U-Net during image generation, effectively expanding the quantity and diversity of water interference scenarios in the dataset. Second, based on the data augmentation, a high-performance oil leakage segmentation network HyDR-Net (Hydro Discriminative Refining Network) was proposed. The network incorporated a Discriminative Boundary Anti-Interference Module (DBAIM) to enhance feature discrimination between oil leakage and confusing backgrounds such as water accumulation while effectively suppressing background noise, and a Multi-Scale Attentive Alignment Module (MAAM) for multi-scale context-aware processing and fine-grained boundary calibration of oil leakage features to accommodate the irregular morphology of oil spills. Experimental results showed that EvoTune achieved SSIM, PSNR, and NIQE scores of 0.9188, 26.7902 dB, and 5.7133, respectively, significantly improving training-data quality and the realism of generated water-accumulation regions, while HyDR-Net achieved F1 and PA scores of 80.46% and 92.15%, respectively, substantially outperforming existing mainstream segmentation methods across all key evaluation metrics, and particularly exhibiting superior segmentation accuracy and robustness under complex water-interference scenarios. 
The research provided an effective approach for addressing data scarcity under specific visual-interference conditions and offered robust technical support for intelligent and precise detection of oil leakage in substation equipment.

Multiscale temporal enhanced action recognition method based on hypergraph Transformer
CHEN Qingshuan, CHEN Enqing, GUO Xin, WANG Song
2026, 47(2): 311-321.  DOI: 10.11996/JG.j.2095-302X.2026020311

Skeleton-based human action recognition has gained widespread attention due to its robustness to background interference and structured representations. In recent years, the Transformer architecture has been widely applied to this task due to its powerful modeling capabilities. However, the existing methods still face challenges in recognizing actions with local detail changes, complex temporal dynamics, or strong temporal dependence, mainly because of their insufficient local spatial semantic modeling, limited multi-scale dynamic perception, and lack of explicit temporal location perception. In addition, the traditional temporal convolution used for dimensionality reduction is prone to losing important dynamic information. To overcome these problems, a multi-scale temporal-enhanced model based on a hypergraph Transformer was proposed. Specifically, a Local-Multi-Scale Enhancement (LME) module was designed to enhance the perception of local features in key areas such as limbs through a rectangular context modeling mechanism, and an efficient multi-scale attention mechanism was used to integrate action patterns at different time granularities, improving the adaptability of the model to multi-rhythmic actions. At the same time, a learnable Temporal Positional Encoding (TPE) was introduced into the spatial attention module to inject temporal priors into the spatial dependence modeling and capture the spatio-temporal coupling relationship more accurately. Furthermore, a time-compression module, Squeeze and Excitation Downsampling (SEDS), based on the Haar wavelet transform and a channel attention mechanism, was adopted to replace dimensionality reduction by traditional temporal convolution, reducing the amount of computation while preserving the key dynamic information. 
The experimental results on three public datasets, NTU RGB+D 60, NTU RGB+D 120, and Northwestern UCLA, showed that the proposed model outperformed many mainstream methods in recognition accuracy, especially in scenes with complex backgrounds, fine-grained actions, and large-scale data.
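
The Haar wavelet step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single-level orthonormal Haar transform applied along the time axis of one scalar feature (for example, one joint coordinate over frames). The function names are hypothetical, and the channel attention part of SEDS is not modeled here.

```python
import math


def haar_downsample(sequence):
    """Single-level orthonormal Haar transform along time.

    Returns (approx, detail); each list is half as long as the input.
    Hypothetical helper for illustration, not the paper's SEDS module.
    """
    if len(sequence) % 2 != 0:
        raise ValueError("sequence length must be even")
    scale = 1.0 / math.sqrt(2.0)
    # Low-pass branch: scaled sums of neighbouring frames.
    approx = [(sequence[i] + sequence[i + 1]) * scale
              for i in range(0, len(sequence), 2)]
    # High-pass branch: scaled differences of neighbouring frames.
    detail = [(sequence[i] - sequence[i + 1]) * scale
              for i in range(0, len(sequence), 2)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Invert haar_downsample, showing that no information is discarded."""
    scale = 1.0 / math.sqrt(2.0)
    restored = []
    for a, d in zip(approx, detail):
        restored.append((a + d) * scale)
        restored.append((a - d) * scale)
    return restored
```

Unlike a strided convolution, the pair (approx, detail) keeps everything needed to rebuild the original sequence, which is one plausible reading of why a wavelet-based module can halve the temporal length while preserving dynamic information.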

SAM-based mask generation and segmentation for dermatological images
CHEN Mengqi, ZHAO Junli, DENG Xiaodan
2026, 47(2): 322-331.  DOI: 10.11996/JG.j.2095-302X.2026020322

Skin cancer is a malignant tumor with a relatively high incidence rate, and its timely detection carries substantial clinical significance. Accurate identification and segmentation of skin lesions serve as critical prerequisites for computer-aided diagnosis. Despite the remarkable performance of deep learning techniques in medical image segmentation, existing models commonly encounter challenges such as insufficient segmentation accuracy at lesion edges and constraints on the scale and diversity of training data. To address these issues, a boundary-enhanced model system named BESA-Diff was proposed. The system employed the boundary-enhanced diffusion model DermoSegDiff as its core segmentation architecture and optimized the model training workflow. The core technical contributions of this research were twofold. First, a framework for the automatic generation of pathological skin images and masks was constructed based on diffusion models. Second, a mask refinement pipeline was designed by integrating the Segment Anything Model (SAM) with the edge refinement module of DermoSegDiff, and a high-quality synthetic medical image dataset was established. Experimental evaluations on the ISIC2018 standard dataset, PH2 dataset, HAM10000 dataset, and the synthetic dataset demonstrated that the proposed model significantly outperformed baseline models in key segmentation metrics, including the Dice Similarity Coefficient (Dice) and Intersection over Union (IoU). Ablation experiments confirmed that the introduction of SAM for mask refinement was the pivotal factor driving performance improvement. This module effectively enhanced the segmentation of lesion edges, particularly in regions with blurred boundaries or low contrast. 
The findings of this study validated that integrating the data generation capability of diffusion models with the boundary optimization capability of general segmentation models can effectively improve the accuracy and robustness of skin lesion segmentation. This work provided a high-performance solution for auxiliary diagnosis of skin cancer and highlighted the immense potential of synthetic data technology in overcoming the data bottleneck in medical artificial intelligence.
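
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of the Dice Similarity Coefficient and Intersection over Union for binary masks. The function name and the flat-list mask representation are assumptions made for illustration, as is the convention of returning 1.0 when both masks are empty.

```python
def dice_and_iou(pred, target):
    """Compute (Dice, IoU) for two binary masks given as flat 0/1 sequences.

    Hypothetical helper for illustration; evaluation code in practice
    usually works on image arrays rather than Python lists.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    pred_area = sum(pred)
    target_area = sum(target)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty; treating this as a perfect match is an assumption.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou
```

Dice weights the overlap twice relative to the two mask areas, so it is never lower than IoU for the same pair of masks.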

Perceptually-aligned panoramic image quality assessment via global semantic feature fusion
BAO Yongtang, WANG Moqin, WANG Zhihui, MA Guangxiao
2026, 47(2): 332-340.  DOI: 10.11996/JG.j.2095-302X.2026020332

Panoramic Image Quality Assessment aims to objectively reflect the subjective perceptual quality of immersive visual content. However, a significant discrepancy often exists between the objective predictions of current deep learning models and human subjective perception, primarily due to an over-reliance on low-level distortion features. To address this critical issue, a novel Hierarchical Semantic-Guided Network was proposed, which emulated the “top-down” cognitive mechanism inherent in the human visual system. Prevailing methods predominantly follow a “bottom-up” paradigm, aggregating quality scores from pixel-level features. However, this process often fails to effectively integrate high-level semantic information such as global composition and aesthetic attributes, thereby limiting the performance ceiling. To this end, a dual-path parallel information processing architecture was constructed, centered around a “top-down” semantic attention modulation mechanism. Within this architecture, a semantic prior path leveraged a Vision-Language Model to parse the input image into a structured semantic embedding. Concurrently, a visual representation path extracted multi-scale feature maps using a deep convolutional network. The designed modulation mechanism utilized the semantic embedding as a conditional input to generate dynamic attention weights, which performed real-time recalibration of the multi-scale features in the visual path. This design ensured that the entire feature extraction process was guided by high-level semantics, thereby focusing on information most critical to human subjective judgment. To ensure that the ordinal relationship of the model’s predictions aligned with human perception, the entire framework was optimized end-to-end via a composite objective function that incorporated a listwise ranking loss. 
Comprehensive experiments on three public benchmark datasets, CVIQD, OIQA, and OIQ-10K, demonstrated that the proposed framework significantly outperformed state-of-the-art methods, validating the effectiveness and novelty of the semantic-guided paradigm in advancing perceptual quality assessment tasks.
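
The abstract does not say which listwise ranking loss was used. As one common instance, the sketch below shows a ListNet-style top-one loss: the cross-entropy between the softmax of the subjective scores and the softmax of the predicted scores over a list of images. The function names are hypothetical, and the choice of ListNet is an assumption, not the paper's stated formulation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(predicted_scores, subjective_scores):
    """ListNet-style top-one listwise loss (an assumed stand-in).

    Cross-entropy between the top-one probabilities induced by the
    subjective scores and those induced by the predictions.
    """
    target = softmax(subjective_scores)
    predicted = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))
```

Because the loss depends on the whole list at once, a prediction that reverses the order of the subjective scores is penalised more than one that preserves it, even when the per-image errors are similar.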

Cross-domain structured deep dictionary learning for image classification
YAN Kang, ZENG Li, GU Xiaoqing
2026, 47(2): 341-350.  DOI: 10.11996/JG.j.2095-302X.2026020341

Image classification plays a fundamental role in computer vision, yet conventional deep learning-based approaches typically rely on large-scale annotated datasets, which are difficult to obtain in many small-scale scenarios, especially when labeled samples in the target domain are scarce. To address this challenge, a Cross-Domain Structured Deep Dictionary Learning (CD-SDDL) method for image classification was presented. CD-SDDL constructed multilayer dictionaries in the source and target domains and introduced a cross-domain dictionary regularization to achieve structural-level soft alignment, thereby reducing domain shift. In addition, intra-class compactness, inter-class separability, and Laplacian locality-preserving constraints were incorporated to enhance geometric consistency and discriminability of learned representations. A layer-wise unfolded deep dictionary framework was further adopted to integrate structural constraints with nonlinear transformations, enabling the model to capture more complex cross-domain feature patterns. Experimental results demonstrated that CD-SDDL exhibited superior generalization ability and significantly improved classification performance compared with existing methods on cross-domain tasks.

Multi-focus image fusion based on 3D manifold fitting and frequency division-guided attention mechanism
ZHANG Zhou, WANG Zeyu, SONG Haiyu, LI Wei, GE Mingyu, WANG Jiayu, WANG Wenqi
2026, 47(2): 351-359.  DOI: 10.11996/JG.j.2095-302X.2026020351

Multi-focus image fusion is a technique that integrates multiple images of the same scene with different focus regions to generate a fully focused and clear image featuring both distinct details and complete structural information. It has found widespread applications in fields such as consumer electronics, medical imaging, and satellite remote sensing. To address prevalent issues in deep learning-based image fusion methods, such as information loss, artifacts, insufficient datasets, and high spatiotemporal overhead, a novel fusion model based on Three-Dimensional (3D) manifold fitting and a frequency-separated guided attention mechanism was proposed. The model adopted a new paradigm of feature decomposition-fusion-reconstruction. During the encoding phase, background structures and detail information were effectively identified and separated, significantly reducing the loss of structural information and the introduction of artifacts. 3D manifold fitting was employed to extract common features of multi-focus images, thereby reducing the model’s dependency on large datasets and lowering spatiotemporal overhead. In the feature fusion stage, a frequency-separated guided attention mechanism was introduced to accurately characterize high-frequency details and low-frequency backgrounds of images, enabling adaptive weighted fusion of cross-frequency domain features and alleviating problems such as blurred complex textures and missing details. Furthermore, to ensure the global visual quality and local detail preservation of the fused image, a weighted composite loss function was designed by integrating multiple loss constraints. Experimental results on the classical public test datasets Lytro and MFFW demonstrated that the proposed method achieved state-of-the-art performance across six commonly used evaluation metrics, verifying its effectiveness.

Computer Graphics and Virtual Reality
3D scene-graph generation via vision-language model distillation and large language model parsing
LU Yaguang, SHEN Xukun, HU Yong
2026, 47(2): 360-367.  DOI: 10.11996/JG.j.2095-302X.2026020360

To address the limitation of point clouds in expressing semantic relationships for 3D scene-graph generation tasks, which typically requires rendering corresponding images and fusing multimodal features and thereby introduces additional computational overhead during inference, a 3D scene-graph generation method based on Vision-Language Model (VL model) distillation and a Large Language Model (LLM) was proposed. The method took 3D point clouds as input, rendered corresponding images, and aligned their feature spaces to distill knowledge from the VL model into a Graph Neural Network (GNN), thereby establishing a mapping between point-cloud instances and corresponding textual descriptions and constructing a Point-cloud-Language model (PL model). The PL model leveraged an LLM to enhance the understanding of complex semantic relationships and effectively aggregated node features through the GNN. It could capture both semantic and spatial relationships of point clouds without relying on additional image information, enabling 3D scene-graph generation for indoor environments. Experimental results demonstrated that the proposed method not only achieved robust understanding of 3D indoor environments in open-vocabulary tasks, but also significantly reduced computational overhead and inference time compared with end-to-end 3D scene-graph generation approaches that relied on vision-language models, highlighting its strong performance and practical applicability.

3D model reconstruction based on retrieval and deformation techniques
PANG Min, LI Zhentang, ZHANG Yuan, CUI Xiaokang, XIONG Fengguang
2026, 47(2): 368-379.  DOI: 10.11996/JG.j.2095-302X.2026020368

As Virtual Reality (VR) and Augmented Reality (AR) technologies advance rapidly, the demand for high-quality 3D models has increased significantly. Traditional 3D modeling methods have drawbacks such as slow processing speed and poor adaptability to complex shapes. Consequently, a novel 3D model construction method based on 3D model retrieval and deformation was proposed. First, a 3D model retrieval framework based on semantic keypoints was constructed, where sparse geometric feature points with semantic consistency were utilized to build a deformation-aware embedding space, enabling dynamic aggregation of global and local features. Meanwhile, Adaptive Global-Channel Attention (AGCA) was embedded into a Transformer to form a joint attention mechanism, thereby enhancing the model’s expressiveness and retrieval accuracy. Then, for the retrieved models, a DGCNN-based keypoint-driven neural cage deformation algorithm was designed. The self-attention mechanism was utilized to calculate the influence weights of keypoints on vertices within local support regions. This process established a deformation mapping between feature keypoints and the neural cage structure, driving neural cage deformation to achieve fine-grained and constrained shape control. Finally, the loss function was improved by incorporating Chamfer distance and Earth Mover’s Distance (EMD) constraints. This ensured that, while focusing on local feature differences, geometric details were more accurately aligned, resulting in more precise 3D model reconstruction. Experiments were conducted on the PartNet and Scan2CAD datasets to compare the proposed method with existing networks such as U-RED, ShapeFlow, and KP-RED. The results demonstrated that the proposed 3D model construction method could effectively handle noise and occlusion. The average value of the loss function was reduced by 33.33% and 41.67% on the PartNet dataset. 
Moreover, on the Scan2CAD dataset, the average loss value was reduced by 3.6% compared with the baseline.
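
The Chamfer distance term mentioned in the abstract can be sketched as follows. Several conventions exist in the literature; this version assumes squared Euclidean distances averaged over each point set and summed over both directions. The function name is hypothetical, and the brute-force search is for clarity only.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two point sets.

    Assumed convention: mean squared nearest-neighbour distance from A to B
    plus the same from B to A. Brute force, O(len(A) * len(B)).
    """
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def squared_distance(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_sided(source, destination):
        # For every source point, take the distance to its closest destination point.
        nearest = [min(squared_distance(p, q) for q in destination) for p in source]
        return sum(nearest) / len(source)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)
```

Chamfer distance only matches each point to its nearest neighbour, so it is cheap but can tolerate uneven point density; an EMD term enforces a one-to-one matching, which is a common reason for combining the two.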

PDF-Sketch: layout-based sketch generation via primitive distance fields and discrete diffusion
ZHOU Jin, ZHOU Yi, XU Pengfei, HUANG Hui
2026, 47(2): 380-389.  DOI: 10.11996/JG.j.2095-302X.2026020380

Sketches play an important role in conceptual design, digital art, and human-computer interaction. However, existing deep learning-based sketch generation methods often rely on polylines or Bézier curves for geometric representation, which are limited in capturing complex shapes. Sequential point prediction also leads to cumulative errors, causing structural distortion and loss of details. To address these issues, sketch generation was formulated as a layout modeling problem, where a sketch was composed of multiple independent stroke primitives. A framework was proposed that integrated a discrete diffusion model with the Primitive Distance Field (PDF). The method first applied adaptive stroke decomposition and a stroke autoencoder to obtain continuous and differentiable features of stroke segments. A codebook mechanism was then employed to discretize frequently recurring stroke patterns into a finite set of items, enabling the diffusion process to gradually recover a coherent set of stroke segments while jointly modeling their positions, sizes, and shapes. Experiments on the QuickDraw dataset showed that the proposed approach outperformed Sketch-rnn and SketchKnitter in terms of Fréchet Inception Distance (FID), Precision, and Recall. In tasks with fewer strokes, the model captured local geometric details more effectively and achieved higher recall, while in tasks with more strokes, it demonstrated greater structural accuracy and fidelity. Qualitative comparisons further indicated that the generated sketches exhibited stronger structural coherence, richer details, and better spatial consistency. These results confirmed that the adoption of a layout-based perspective, combined with distance field representation and discretization, effectively reduced error accumulation in sequential modeling and improved both structural integrity and diversity in sketch generation. 
The framework also provided directions for enhancing stroke segmentation, detail recovery, and inter-segment connectivity in more complex scenarios.
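
The codebook step described in the abstract, which maps continuous stroke features to a finite set of items, can be illustrated with a minimal nearest-neighbour quantization sketch. The function name, the tuple representation of features, and the use of squared Euclidean distance are assumptions; how the paper learns its codebook is not covered here.

```python
def quantize_to_codebook(features, codebook):
    """Return, for each feature vector, the index of its nearest codebook entry.

    Hypothetical helper: features and codebook entries are equal-length tuples,
    and nearness is measured by squared Euclidean distance.
    """
    if not codebook:
        raise ValueError("codebook must not be empty")

    def squared_distance(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    indices = []
    for feature in features:
        best = min(range(len(codebook)),
                   key=lambda k: squared_distance(feature, codebook[k]))
        indices.append(best)
    return indices
```

After this step each stroke segment is a discrete token, which is what allows a discrete diffusion process to operate on the shape of a stroke alongside its position and size.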

Conditional generation of CAD models based on latent diffusion models
LIU Jinghao, YOU Zhenguo, DU Dong
2026, 47(2): 390-401.  DOI: 10.11996/JG.j.2095-302X.2026020390

Creating 3D models with both manufacturability and editability based on traditional Computer-Aided Design (CAD) is a complex and time-consuming task. In recent years, deep learning technology has shown great potential in the automated generation of CAD models and has become a research hotspot. However, most CAD generation models fail to fully utilize the geometric and semantic information contained in input data such as point clouds, images, and sketches, making it difficult to accurately control the generation direction through flexible conditional inputs. To address this issue, the directional generation of CAD models can be achieved by exploring the representational capability of the latent space, adopting a denoising diffusion probabilistic model, and using such conditional input data as guidance. Specifically, a Transformer-based autoencoder was first constructed to encode CAD parameter command sequences into a latent space. Subsequently, a denoising diffusion probabilistic model was established within this space to generate CAD feature vectors by integrating conditional encoding information from point clouds, images, or sketches. Finally, the feature vectors were reconstructed into 3D CAD models via a decoder. Experimental results demonstrated that the generated CAD models exhibited reasonable structures, smooth surfaces, and distinct geometric features. Compared with existing methods, a superior balance was achieved among shape diversity, distribution similarity, and fidelity. Furthermore, the generation quality of CAD models was effectively enhanced when point clouds, images, or sketches were utilized as conditional inputs. The relevant code has been open-sourced and is available at https://github.com/Ziyou-maker/LDM4CAD.

Digital Design and Manufacture
Toolpath generation for finishing machining of blades based on geometric features
TU Yihao, MA Wenyang, YAN Guangrong
2026, 47(2): 402-410.  DOI: 10.11996/JG.j.2095-302X.2026020402

As a core component in aerospace manufacturing, aero-engine blades face challenges of significant cutting-force fluctuations and severe tool wear during finishing due to their complex geometries and high precision requirements. To achieve stable cutting-force control in complex-surface machining, a toolpath-planning method integrating blade geometric-feature analysis and cutting-parameter optimization was proposed. First, a cutting-force model based on micro-element cutting theory was developed to analyze the relationship among surface features, cutting parameters, and cutting forces. Subsequently, to address the force fluctuations caused by fixed parameters in traditional paths, an optimization model that minimizes force fluctuation was established, and a variable-scale chaotic algorithm was used to co-optimize tool-axis inclination, feed rate, and cutting depth. Finally, step length and row spacing were calculated based on blade geometry, and the isoparametric-line method was used to plan the tool-contact-point trajectories. Optimal cutting parameters for each point were determined by integrating the force model with the optimization results, generating the complete finishing toolpath. Results showed that this method optimized the cutting-force distribution, achieved smooth machining forces, reduced tool fatigue and wear, and extended tool life. This work provided a new force-control-based approach for precision machining of complex surfaces.
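
The abstract does not state how step length and row spacing were derived from the blade geometry. As a hedged illustration only, the sketch below uses two common textbook relations: the step length is bounded by a chord-error tolerance on a locally circular path, and the row spacing is bounded by the scallop height left by a ball-end cutter on a locally flat surface. The ball-end tool, the flat-surface approximation, and all function names and numeric values are assumptions, not details from the paper.

```python
import math

def chord_for_sagitta(radius, sagitta):
    """Chord length of a circular arc of the given radius whose sagitta equals `sagitta`."""
    if not 0.0 < sagitta < radius:
        raise ValueError("sagitta must be positive and smaller than the radius")
    return 2.0 * math.sqrt(sagitta * (2.0 * radius - sagitta))

def step_length(curvature_radius, chord_tolerance):
    """Longest straight move along the path that keeps the chord error within tolerance."""
    return chord_for_sagitta(curvature_radius, chord_tolerance)

def row_spacing(tool_radius, scallop_height):
    """Widest spacing between adjacent passes of a ball-end cutter on a flat
    surface that keeps the residual scallop within the allowed height."""
    return chord_for_sagitta(tool_radius, scallop_height)

# Hypothetical values in millimetres.
step = step_length(curvature_radius=40.0, chord_tolerance=0.005)
spacing = row_spacing(tool_radius=4.0, scallop_height=0.002)
print(f"step length = {step:.4f} mm, row spacing = {spacing:.4f} mm")
```

Both limits shrink where the local radius of curvature is small or the tolerance is tight, which is one reason a geometry-aware planner might vary them along the blade rather than fixing them globally.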

A study on knowledge mining and reuse for non-standard tool design based on deep belief network
WANG Mingwei, ZHAO Jianhua, SUN Zhihong, SUI Peng, LU Xiaojun
2026, 47(2): 411-422.  DOI: 10.11996/JG.j.2095-302X.2026020411

In the design of non-standard tools, the strongly coupled correlation between tool features and part-machining features is a typical type of implicit design knowledge. Such knowledge exhibits data multimodality and multi-dimensional uncertainty, which makes it difficult to capture and reuse. Therefore, a method for tool design knowledge mining and reuse based on a Deep Belief Network (DBN) was proposed. First, targeting the bimodal data of 2D images and attribute texts associated with machining features and tool features, a dual-channel DBN was designed to perform feature extraction and fusion. Second, a DBN oriented to correlation mining was designed to obtain the implicit relationships between machining features and tool features. Finally, existing tool cases were evaluated and reused through association-rule reasoning and an improved Rake algorithm. Taking the design process of a non-standard special inner-hole-groove tool as an example, the effectiveness of the method was verified by comparing the reuse results with the actual results in terms of structural and attribute information.

Camera calibration and 2D digitalization of artifacts based on L-shaped target
ZHAO Min, WANG Niuna, YAN Tongying, ZHU Lingjian
2026, 47(2): 423-431.  DOI: 10.11996/JG.j.2095-302X.2026020423

In the 2D digital conservation and research of movable artifacts, it is essential to provide orthographic images of the relics as well as their external dimensions. Camera calibration not only corrects imaging distortion but also enhances measurement precision. Simulation analysis indicated that the accuracy of camera calibration depends on the proportion of the field of view occupied by the target, and that an L-shaped target achieves a level of accuracy similar to that of a rectangular target. Therefore, a camera calibration method based on an L-shaped target and a calculation method for the external dimensions of artifacts were proposed. An L-shaped target with directional markers and a target-point matching method were designed, enabling the selection of an appropriate target range for calibration according to the imaging field of view of the relics, thus ensuring calibration accuracy. Since the designed target did not block the relics, camera calibration and the acquisition of orthographic images of the relics could be conducted simultaneously. The internal and external parameters of the camera were obtained accurately, and high-precision calculations of the relics’ dimensions were realized using 3D coordinate data. Experimental results demonstrated that the target-point recognition and matching method for L-shaped targets was accurate, that the camera calibration parameters effectively corrected distortion in images of cultural relics, and that the measurement error for the external dimensions of various relics was consistently below 0.2 mm, representing an improvement of over one order of magnitude compared to the scale method. The proposed approach provided significant technical support for the 2D digitization and preservation of movable cultural relics.

Research on model-based multi-dimensional comprehensive trade-off method for complex system solutions
HE Wenhu
2026, 47(2): 432-439.  DOI: 10.11996/JG.j.2095-302X.2026020432

The scientific evaluation of initial solutions in the early stages of complex system design is crucial to the success of development. To address issues in traditional solution evaluation, such as reliance on expert experience, limited perspectives, and poor data correlation, a model-based multi-dimensional comprehensive trade-off analysis method for system solutions was proposed. Leveraging the technical advantages of Model-Based Systems Engineering (MBSE) in architectural modeling and simulation, this method first constructed a multi-dimensional indicator system covering performance, technology, cost, and schedule. A mapping relationship was then established between the system architecture and parameter models using the Systems Modeling Language (SysML), enabling dynamic association between trade-off indicators and subsystem parameters. On this basis, a parameter normalization and weighted summation approach that accounts for positive and negative effects was integrated with SysML parametric diagrams to construct a multi-dimensional indicator framework for comprehensive trade-off modeling of complex systems, and a seven-step analysis process for multi-dimensional comprehensive trade-off was established. Validation using an aircraft engine solution trade-off case demonstrated that this approach could efficiently and automatically evaluate and rank multiple alternative solutions, verifying the engineering practicality of the comprehensive trade-off analysis process and the effectiveness of the model. This approach provided an effective model-driven decision-making tool for evaluating complex system solutions.
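
The normalization and weighted-summation step can be sketched as follows. The abstract does not give the normalization formula, so min-max scaling is assumed here, with positive-effect (larger is better) and negative-effect (smaller is better) indicators scaled in opposite directions. The indicator names, weights, and solution values are hypothetical, and the SysML parametric-diagram linkage is not modeled.

```python
def normalize(values, positive):
    """Min-max scale to [0, 1]; negative-effect indicators are inverted
    so that a higher normalized value is always better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # indicator does not discriminate between solutions
    if positive:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]

def rank_solutions(solutions, indicators):
    """solutions: {name: {indicator: value}}
    indicators: {indicator: (weight, positive_effect)}
    Returns (name, score) pairs sorted from best to worst."""
    names = list(solutions)
    scores = dict.fromkeys(names, 0.0)
    for indicator, (weight, positive) in indicators.items():
        column = [solutions[name][indicator] for name in names]
        for name, value in zip(names, normalize(column, positive)):
            scores[name] += weight * value
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical indicator system and alternatives.
indicators = {
    "performance": (0.40, True),
    "technology": (0.20, True),
    "cost": (0.25, False),
    "schedule": (0.15, False),
}
solutions = {
    "A": {"performance": 90, "technology": 6, "cost": 120, "schedule": 36},
    "B": {"performance": 80, "technology": 8, "cost": 100, "schedule": 30},
    "C": {"performance": 70, "technology": 7, "cost": 80, "schedule": 24},
}
ranking = rank_solutions(solutions, indicators)
for name, score in ranking:
    print(name, round(score, 3))
```

Because the weights sum to 1 and every normalized value lies in [0, 1], each score also lies in [0, 1], which keeps alternatives comparable across indicators with different units.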

BIM/CIM
Development of a graph theory-based automated calculation method and tool for airflow resistance in building ventilation systems
JIANG Kai, XU Jinglin, YU Fangqiang
2026, 47(2): 440-448.  DOI: 10.11996/JG.j.2095-302X.2026020440

In building mechanical and electrical systems, airflow resistance calculation of ventilation systems is a critical task in HVAC design and optimization. Traditional methods, which rely on manual identification of the most unfavorable loop and segment-by-segment resistance calculation, suffer from low efficiency and high error rates, making them inadequate for complex engineering demands. With the advancement of Building Information Modeling (BIM), automated analysis based on Revit models has become an important approach to improving design quality. To this end, a graph theory-based automated calculation method for airflow resistance in building ventilation systems was proposed. Based on Revit MEP models, the topological relationships of ventilation system components were extracted via the API and abstracted into an undirected graph, where fittings were represented as nodes and duct segments as edges. The Breadth-First Search (BFS) algorithm was employed to traverse the entire system from the fan, identifying all connected air terminals and constructing a tree structure with the fan as the root node and the terminals as leaf nodes. A bottom-up, depth-first backtracking strategy was then applied to calculate the downstream airflow at each node layer by layer. By integrating local resistance coefficients and frictional resistance formulas, the total resistance for each path was computed. The most unfavorable loop was automatically identified by comparing the resistance values of all paths, enabling an intelligent and automated analysis of the system’s airflow resistance. An automated calculation plugin integrated into the Revit platform was developed based on this method and validated in a large-scale laboratory project in Shanghai. The case study involved a comprehensive review of 144 fans, revealing that the original selection for 28 fans could not meet the system’s resistance requirements, while the calculation efficiency was improved by a factor of approximately 37 compared to traditional manual methods. This research provided reliable data support for fan selection, energy-saving optimization, and subsequent operation and maintenance management, demonstrating strong potential for practical engineering applications.
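
The graph procedure in this abstract can be sketched in a few functions. This is a simplified illustration rather than the plugin's code. It makes no Revit API calls and assumes round ducts, a constant Darcy friction factor, a fixed air density, and one local loss coefficient per downstream node. It also accumulates airflow by walking the BFS order in reverse instead of the depth-first backtracking the authors describe, which gives the same totals on a tree. All names and numbers are hypothetical.

```python
import math
from collections import deque

AIR_DENSITY = 1.2        # kg/m^3, assumed
FRICTION_FACTOR = 0.02   # assumed constant Darcy friction factor

def build_tree(adjacency, fan):
    """BFS over the undirected graph from the fan; returns parent map and visit order."""
    parent, order, queue = {fan: None}, [], deque([fan])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in adjacency[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return parent, order

def accumulate_flow(parent, order, terminal_flow):
    """Bottom-up pass: a node's airflow is the sum of all terminal flows below it."""
    flow = {node: terminal_flow.get(node, 0.0) for node in order}
    for node in reversed(order):  # children always appear after their parent
        if parent[node] is not None:
            flow[parent[node]] += flow[node]
    return flow

def segment_loss(flow, length, diameter, zeta):
    """Friction loss (Darcy-Weisbach) plus local loss for one duct segment, in Pa."""
    velocity = flow / (math.pi * diameter ** 2 / 4.0)
    dynamic_pressure = 0.5 * AIR_DENSITY * velocity ** 2
    return (FRICTION_FACTOR * length / diameter + zeta) * dynamic_pressure

def most_unfavorable_path(adjacency, fan, terminal_flow, ducts, zeta):
    """Returns (path from fan to worst terminal, its total loss in Pa, node airflows)."""
    parent, order = build_tree(adjacency, fan)
    flow = accumulate_flow(parent, order, terminal_flow)
    loss_to = {fan: 0.0}
    for node in order[1:]:
        length, diameter = ducts[frozenset((parent[node], node))]
        loss_to[node] = loss_to[parent[node]] + segment_loss(
            flow[node], length, diameter, zeta.get(node, 0.0))
    worst = max((t for t in terminal_flow if t in loss_to), key=loss_to.get)
    path, node = [], worst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], loss_to[worst], flow

# Hypothetical system: fan -> tee -> terminal A, and tee -> elbow -> terminal B.
adjacency = {"fan": ["tee"], "tee": ["fan", "A", "elbow"], "A": ["tee"],
             "elbow": ["tee", "B"], "B": ["elbow"]}
ducts = {frozenset(("fan", "tee")): (5.0, 0.5), frozenset(("tee", "A")): (3.0, 0.3),
         frozenset(("tee", "elbow")): (4.0, 0.3), frozenset(("elbow", "B")): (6.0, 0.3)}
zeta = {"tee": 0.5, "elbow": 0.3, "A": 1.0, "B": 1.0}
terminal_flow = {"A": 0.5, "B": 0.3}  # m^3/s

path, total_loss, flow = most_unfavorable_path(adjacency, "fan", terminal_flow, ducts, zeta)
print(path, round(total_loss, 1), "Pa")
```

A fan-selection check would then compare `total_loss` with the pressure the selected fan can deliver at the airflow reported for the fan node.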
