
Current Issue

    Review
    Review of deep learning based methods for detecting focal liver lesions
    DONG Wenyi, YANG Weidong, TANG Binghui, WANG Qi, XIAO Hongyu
    2026, 47(1): 1-16.  DOI: 10.11996/JG.j.2095-302X.2026010001

    The detection of Focal Liver Lesions (FLLs) is crucial for disease diagnosis and treatment. Traditional detection methods face many challenges, and the application of deep-learning technology brings new opportunities. In view of this, this paper systematically reviewed deep-learning-based FLL detection methods and outlined specific research directions for the development of FLL detection technology by analyzing the advantages and disadvantages of related techniques. First, the public datasets of liver radiological images were organized and summarized, and the key role of data preprocessing in improving model performance was expounded. Second, 2D and 3D detection algorithms based on convolutional neural networks, Transformers, knowledge distillation, and other techniques were compared and analyzed, revealing the technical evolution from local feature modeling to global spatio-temporal correlation. In addition, temporal feature fusion methods for multi-phase images were examined in depth, providing new ideas for dynamic lesion characterization. The review showed that existing methods had achieved breakthroughs in detection accuracy and efficiency but still faced challenges such as insufficient sensitivity to small lesions, weak cross-device generalization, and a lack of clinical verification. Future research was recommended to accelerate the clinical translation of deep learning in the auxiliary diagnosis of liver lesions through multi-center data collaboration, lightweight algorithm design, and enhanced interpretability.

    Image Processing and Computer Vision
    A vehicle damage classification model incorporating dual attention and weighted dynamic convolution
    ZHAI Yongjie, WANG Zixuan, ZHANG Zhenqi, ZHOU Xunqi, WANG Qianming
    2026, 47(1): 17-28.  DOI: 10.11996/JG.j.2095-302X.2026010017

    To address the challenges of morphological similarity and the resulting difficulty in classifying vehicle damage images uploaded by clients for auto insurance claims, a model named ResAWDNet was proposed for vehicle damage classification. Firstly, to effectively augment the model’s capacity for extracting damage features, the traditional downsampling operation was replaced with weighted dynamic convolution. This approach dynamically adjusted the weights of convolutional kernels based on the input features, thereby enhancing the model’s adaptability to features of varying scales and orientations. As a result, it enabled more precise capture of the subtle differences in vehicle damage. Secondly, to ensure that the model could concentrate on the salient discriminative regions and feature channels within the images, a dual attention mechanism was embedded after the convolutional layers of the backbone network. This mechanism concurrently learned the importance weights in both spatial and channel dimensions, significantly enhancing the model’s ability to capture crucial information. Consequently, it further improved the decision-making accuracy of the model in the vehicle damage classification task. Finally, experimental validation was conducted on a dataset of vehicle damage images sourced from real accident cases. The experimental results demonstrated that the ResAWDNet model was feasible and offered significant advantages for vehicle damage classification tasks, achieving an accuracy rate of 73.79%. Compared with baseline models, ResAWDNet achieved higher accuracy in classifying multiple types of damage, robustly validating the effectiveness of the proposed model.
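The core of weighted dynamic convolution is mixing a bank of candidate kernels with input-dependent weights. The sketch below shows that mixing step only, with illustrative names and shapes; the paper's exact design (attention network, temperature schedule) may differ.

```python
import numpy as np

def weighted_dynamic_conv_kernel(x, kernel_bank, w_att, temperature=1.0):
    """Mix K convolution kernels with input-dependent softmax weights.

    x           : (C, H, W) input feature map
    kernel_bank : (K, C_out, C, kh, kw) candidate kernels
    w_att       : (K, C) projection from pooled features to kernel logits
    """
    pooled = x.mean(axis=(1, 2))                     # (C,) global average pool
    logits = w_att @ pooled / temperature            # (K,) one logit per kernel
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                         # softmax over the K kernels
    # One aggregated kernel, adapted to this particular input
    mixed = np.tensordot(weights, kernel_bank, axes=1)
    return mixed, weights
```

The aggregated kernel is then used in an ordinary convolution, so the extra cost is only the small attention branch.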

    Generative model based unsupervised multi-view stereo network
    PAN Yuxuan, JIN Rui, LIU Yu, ZHANG Lin
    2026, 47(1): 29-38.  DOI: 10.11996/JG.j.2095-302X.2026010029

    Existing multi-view stereo research utilizes depth-estimation algorithms to achieve stereo representation by establishing a mapping between the physical and digital worlds. Supervised learning-based neural networks have achieved accurate, high-fidelity 3D reconstruction through training. However, in-the-wild visual reconstruction remains challenging due to the lack of rendered depth priors and the wide-baseline characteristics of the images. A novel system was proposed to obtain optimized depth for naturally collected multi-view images without prior information by combining an unsupervised learning network with semantically optimized Neural Radiance Field (NeRF) rendering. First, preliminary depth information for wild multi-view images was produced without ground truth using unsupervised deep learning. Subsequently, in a separate NeRF module, a diffusion model was used to construct a surface semantic rendering loss, enabling a fine-grained volumetric representation. Experimental results on the benchmark dataset validated the performance of the proposed system, which improved the overall metrics by an average of 24.6% compared with other state-of-the-art schemes. A novel wild wide-baseline dataset was also used to verify generalization, on which the proposed system reduced the reconstruction error by up to 40.8% compared with all other methods.
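Unsupervised depth learning of this kind is typically supervised by photometric consistency: a source view is warped into the reference view via the estimated depth and the difference is penalized. The sketch below shows a minimal masked L1 version of that signal (an assumption about the unsupervised branch; the paper additionally uses a NeRF-based semantic rendering loss).

```python
import numpy as np

def photometric_l1(ref, warped, valid_mask):
    """Masked mean L1 difference between a reference view and a source
    view warped into it via estimated depth.

    ref, warped : (H, W, 3) images; valid_mask : (H, W) 1 where the warp
    lands inside the source image, 0 elsewhere."""
    diff = np.abs(ref - warped) * valid_mask[..., None]   # ignore invalid pixels
    denom = max(valid_mask.sum() * ref.shape[-1], 1)
    return diff.sum() / denom
```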

    A mixed-precision quantization method for large language models via memory alignment
    LI Zhangming, GUAN Weifan, CHANG Zhengwei, ZHANG Linghao, HU Qinghao
    2026, 47(1): 39-46.  DOI: 10.11996/JG.j.2095-302X.2026010039

    As large models continue to grow in scale, the memory footprint and computational overhead of model inference have become critical challenges. Mixed-precision quantization is an effective approach to reducing resource consumption, but existing methods suffer from insufficient outlier handling, significant quantization accuracy loss, and inefficient memory access. To address these issues, a memory-aligned mixed-precision quantization method for large models was proposed. First, weights were divided into SIMD-aligned groups, and outlier groups were identified via group-wise significance analysis, with high-significance groups quantized to 8-bit and the others to 2-bit. A block-wise compensation strategy was introduced to mitigate the accuracy degradation caused by 2-bit quantization. Furthermore, an efficient packing and storage scheme was designed for the mixed-precision weights, in which a bitmap records the bit width of each data block, enabling random access. Experimental results demonstrated that the proposed method significantly reduced memory usage and improved computational efficiency while maintaining model accuracy. Specifically, on Llama2-7B/13B/70B, the approach achieved perplexity reductions of 8.13/2.84/1.37 on WikiText-2 and 5.80 on C4 relative to state-of-the-art baselines. The quantized 70B model reduced weight storage by approximately 87% compared with BF16. Across seven QA benchmarks, an average accuracy gain of 6.24% was achieved. These results indicated that the proposed memory-aligned mixed-precision quantization method can simultaneously improve compression ratio, memory-access efficiency, and overall model performance.
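The group-wise scheme can be sketched as follows: split the weights into SIMD-aligned groups, score each group, keep the top groups at 8 bit, quantize the rest to 2 bit, and record the choice in a bitmap. Significance is approximated here by mean |w| per group; the paper's significance analysis and block-wise compensation are more elaborate, so this is a sketch under stated assumptions.

```python
import numpy as np

def quantize_groups(w, group_size=8, outlier_ratio=0.125):
    """Mixed-precision group quantization with a per-group bit-width bitmap.

    Returns the dequantized weights and a boolean bitmap
    (True -> the group was kept at 8 bit)."""
    groups = w.reshape(-1, group_size)                  # SIMD-aligned groups
    significance = np.abs(groups).mean(axis=1)          # crude significance score
    k = max(1, int(len(groups) * outlier_ratio))
    bitmap = np.zeros(len(groups), dtype=bool)
    bitmap[np.argsort(significance)[-k:]] = True        # outlier groups -> 8 bit
    out = np.empty_like(groups)
    for i, g in enumerate(groups):
        half_levels = 127.0 if bitmap[i] else 1.0       # 8-bit vs 2-bit grid
        scale = max(np.abs(g).max(), 1e-12) / half_levels
        out[i] = np.clip(np.round(g / scale), -half_levels, half_levels) * scale
    return out.reshape(w.shape), bitmap
```

Because the bitmap stores one bit per fixed-size group, the bit width of any block can be looked up in O(1), which is what makes random access into the packed weights possible.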

    Image classification method based on uncertainty-driven smart reinforcement active learning
    JIU Mingyuan, WU Guowei, SONG Xuguang, LI Shupan, XU Mingliang
    2026, 47(1): 47-56.  DOI: 10.11996/JG.j.2095-302X.2026010047

    With the rapid development of deep learning, remarkable achievements have been made in image classification and related tasks. However, the success of these models heavily relies on large amounts of high-quality labeled data. In real-world applications, labeled data is often scarce, and manual annotation is time-consuming, labor-intensive, and costly, which limits the scalability and deployment of deep learning models. In recent years, active learning has gained significant attention due to its ability to improve model performance under limited annotation budgets. The core idea of active learning is to select the most valuable data for labeling based on certain criteria such as uncertainty, diversity, or representativeness. To address the limitations of traditional active learning methods, which often rely on manually designed heuristic sampling strategies that struggle to adapt to different task scenarios and are difficult to dynamically optimize, a Smart Reinforcement Active Learning (SRAL) approach for image classification is proposed. The sample selection process is modeled as a Markov Decision Process (MDP), leveraging reinforcement learning’s adaptive strategy optimization ability to guide the model in dynamically selecting the most valuable samples from the unlabeled data for labeling. In this framework, the state is represented by features extracted from the unlabeled samples, the action indicates whether a sample should be selected for labeling, and the reward function is defined as the change in model accuracy after incorporating the selected sample into the training set. The Actor-Critic algorithm is adopted to optimize the sampling policy, and uncertainty-based heuristic ranking is incorporated as auxiliary information to improve the learning efficiency.
Experimental results demonstrate that the proposed SRAL method significantly improves classification accuracy under the same labeling budget compared to other active learning approaches on datasets such as CIFAR-10, SVHN, and FASHION-MNIST. Furthermore, SRAL exhibits robust stability and strong generalization ability across these datasets. This confirms the effectiveness and advantages of SRAL in enhancing the performance of image classification models.
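Two ingredients of the MDP formulation above translate directly into code: the uncertainty heuristic used as auxiliary ranking information, and the reward defined as the accuracy change. The sketch below shows only those pieces, not the full Actor-Critic pipeline.

```python
import numpy as np

def predictive_entropy(probs):
    """Per-sample entropy of softmax outputs -- the uncertainty heuristic
    that augments the RL state.  probs : (n_samples, n_classes)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_for_labeling(probs, budget):
    """Rank unlabeled samples by uncertainty and pick the top `budget`."""
    return np.argsort(predictive_entropy(probs))[::-1][:budget]

def step_reward(acc_after, acc_before):
    """MDP reward: change in validation accuracy after adding new labels."""
    return acc_after - acc_before
```

In the full method, a learned policy replaces the fixed entropy ranking; the ranking only serves as a warm-start signal.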

    A lightweight image flare removal method for night vision assisted driving
    LI Ye, JIA Junyang, HUANG Guan, LI Yujie, QI Wenting, LIU Yan
    2026, 47(1): 57-67.  DOI: 10.11996/JG.j.2095-302X.2026010057

    In night-vision environments, image quality is significantly degraded by glare from intense light sources, impairing the performance of night-vision assisted driving systems. Existing flare-removal algorithms suffer from limited robustness, high computational complexity, and loss of light-source information. To address these challenges, a lightweight image flare-removal method, Night Flare Removal Network+ (NFR-Net+), was proposed to enhance image clarity while meeting the real-time computational demands of mobile devices. The approach first incorporated a feature-filtering mechanism combined with residual connection strategies to strengthen feature extraction capabilities, effectively mitigating overfitting and ensuring robust flare removal across diverse lighting conditions and flare types. Additionally, a nonlinear, activation-free feature attention module was introduced. Via a lightweight design, an efficient attention mechanism was constructed that significantly improved image-detail reconstruction while reducing model parameters by approximately 8.28% and runtime memory by about 11.1%, thereby optimizing computational efficiency. To tackle the issue of diminished image naturalness due to excessive light-source removal in traditional methods, an enhanced light-source extraction module was developed within the segmentation network. This module employed an improved light-source separation strategy to accurately preserve brightness and texture details in light-source regions, ensuring the authenticity and naturalness of output images. Experimental results demonstrated that NFR-Net+ surpassed state-of-the-art methods on image quality metrics such as Structural Similarity Index Measure (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Learned Perceptual Image Patch Similarity (LPIPS), exhibiting superior flare-removal performance and detail preservation.
The method also demonstrated strong adaptability across various night-vision scenarios and hardware devices, fulfilling the efficiency requirements for real-time processing. Ablation studies further validated the effectiveness of individual components, highlighting the critical role of feature filtering and attention mechanisms in balancing performance and resource consumption. This approach provided an efficient, lightweight solution for applications such as nighttime autonomous driving and intelligent surveillance.
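"Nonlinear, activation-free" modules in lightweight restoration networks commonly follow the NAFNet-style simple gate: split the channels in half and multiply, which is nonlinear without any ReLU/GELU. Whether NFR-Net+ uses exactly this form is an assumption; the sketch only illustrates the design idea.

```python
import numpy as np

def simple_gate(x):
    """Activation-free gating: split the channel dimension in half and
    multiply the halves elementwise.  x : (C, H, W) with C even."""
    a, b = np.split(x, 2, axis=0)
    return a * b          # nonlinearity without explicit activation functions
```

The multiplication halves the channel count, which is one reason such gates also reduce parameters downstream.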

    Performance evaluation of construction site object detection under drone-captured perspective
    SONG Zhuo, LU Dehui, HUANG Zhichao, TIAN Shiyu, YAN Ronglong, DENG Yichuan
    2026, 47(1): 68-77.  DOI: 10.11996/JG.j.2095-302X.2026010068

    The organizational management of construction sites is a critical aspect of engineering management; however, traditional human supervision is constrained by environmental limitations and low efficiency. In recent years, multiple government departments have issued policies advocating the deep integration of artificial intelligence with the real economy to promote high-quality and efficient economic development. The accuracy, efficiency, and automation advantages of Computer Vision (CV) technology have gradually led to its widespread application in construction supervision. Meanwhile, drones, which can efficiently acquire complex and varied visual data of construction scenes, demonstrate clear application potential in CV-based construction supervision tasks. However, current research on drone-based construction-scene detection is limited, and the lack of overhead-perspective construction-scene image datasets restricts further development in the field. Therefore, a DJI Mavic 3T drone was utilized to capture construction-site images and establish an open-source overhead image dataset for construction scenes, UB-CSD. Several advanced object-detection algorithms were selected for comparative experiments on the UB-CSD dataset, and the reasons for performance differences were analyzed along multiple dimensions such as model workflow design, computation principle, and task characteristics. The detection mAPs were: YOLOv8 and YOLOv10 (96.1%), YOLOv9 (96.0%), YOLO11 (95.7%), DETR (95.3%), Faster-RCNN (76.3%), and RetinaNet (72.1%). The analysis indicated that the YOLO-series algorithms were the best choice for drone-based object-detection tasks in construction scenes. By establishing a new open-source specialized dataset and conducting comparative experiments, the conclusions drawn provide effective data and experimental cases to support future safety production management and object-detection algorithm research in the construction industry.

    Deep fusion of multimodal features for few-shot class-incremental 3D point cloud classification
    ZHU Chenxi, LU Yinan, WU Tieru, GONG Wenyong, MA Rui
    2026, 47(1): 78-89.  DOI: 10.11996/JG.j.2095-302X.2026010078

    Traditional 3D point-cloud classification methods tend to suffer from insufficient generalization and catastrophic forgetting in Few-Shot Class-incremental Learning (FSCIL) scenarios. The pretrained vision-language model CLIP (Contrastive Language-Image Pre-training), which contains rich 2D shape priors, has been shown to effectively enhance 3D FSCIL performance. However, existing CLIP-based frameworks still lack flexibility and adaptability in multimodal feature extraction and fusion, which limits classification accuracy during incremental stages. To address these shortcomings, a 3D FSCIL approach with deeply fused multimodal features was proposed. An adaptive adapter based on gated units and residual blocks was introduced to achieve multi-scale feature alignment and redundancy suppression, and a multimodal global feature dynamic fusion module with self-attention was designed to adaptively adjust the weight allocation of different feature streams according to sample characteristics, thereby obtaining more consistent and complementary fused representations. Specifically, point clouds were rendered into multi-view depth maps, and features were extracted using both the original CLIP visual encoder and a CLIP encoder pretrained on depth maps, combined with point-cloud geometric features. After processing through the adaptive adapter, these features were fed into the attention-based fusion module and aligned with semantic features extracted by the CLIP text encoder for classification. In addition, contrastive learning loss, multi-view and geometric perturbation-based data augmentation strategies, and a memory-replay mechanism were incorporated to effectively mitigate overfitting and forgetting under few-shot conditions. 
Experiments on ShapeNet, ModelNet, and CO3D demonstrated that the proposed method consistently achieved higher accuracy across incremental stages compared with existing 3D FSCIL approaches, while significantly reducing both relative accuracy drop rates and maximum stage fluctuations.
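The "adaptive adapter based on gated units and residual blocks" can be pictured as a bottleneck adapter whose output is blended with its input through a learned gate. The sketch below is a minimal version of that pattern; the dimensions, ReLU bottleneck, and sigmoid gate are illustrative assumptions, not the paper's exact layers.

```python
import numpy as np

def gated_residual_adapter(x, w_down, w_up, w_gate):
    """Bottleneck adapter with a gated residual connection.

    x : (d,) feature vector; w_down : (d, r); w_up : (r, d); w_gate : (d, d)."""
    h = np.maximum(0.0, x @ w_down)              # ReLU bottleneck (d -> r)
    up = h @ w_up                                # project back (r -> d)
    gate = 1.0 / (1.0 + np.exp(-(x @ w_gate)))   # per-feature gate in (0, 1)
    return gate * up + (1.0 - gate) * x          # gated residual mix
```

When the gate saturates toward 0 the adapter passes features through unchanged, which is what lets it suppress redundant adaptation on a per-sample basis.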

    An image matching method for large viewpoint variation scenarios
    XIANG Mengli, HUANG Zhiyong, SHE Yali, DING Tuojun
    2026, 47(1): 90-98.  DOI: 10.11996/JG.j.2095-302X.2026010090

    To address the significant decline in matching accuracy and the number of correspondences exhibited by existing image-matching methods under large viewpoint variations, an improved image-matching approach based on E-LoFTR was proposed. Firstly, based on a strategy of viewpoint rectification followed by fine-grained matching, a novel two-stage SIFT-based viewpoint-rectification module was proposed, which leveraged the viewpoint invariance of the Scale-Invariant Feature Transform (SIFT) algorithm and the geometric alignment capability of homography to enhance matching accuracy under large viewpoint variations. Then, a directional-gated attention mechanism was designed that employed a cascaded structure of multi-directional convolutions and dynamic gating to extract queries (Q), keys (K), and values (V). The injected geometric priors significantly enhanced the model’s robustness. Lastly, to mitigate information loss during the upsampling of fused features, the Fusion-DySample module was incorporated to further improve performance. Experimental results on the public MegaDepth dataset showed that our method achieved relative pose estimation AUCs of 57.1%, 72.7%, and 83.9% under rotation error thresholds of 5°, 10°, and 20°, respectively, outperforming E-LoFTR by 0.7%, 0.5%, and 0.4%. On the newly constructed NewMega dataset based on MegaDepth and on a private industrial dataset, our method also demonstrated substantial improvements in both the number of matches and matching accuracy.
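The viewpoint-rectification stage rests on two standard operations: estimating a homography from coarse feature matches and warping with it so that fine matching runs on roughly aligned images. The sketch below implements the textbook versions (Direct Linear Transform plus homogeneous warping), not the paper's code; the SIFT matching itself is omitted.

```python
import numpy as np

def apply_homography(H, pts):
    """Warp 2-D points with a 3x3 homography in homogeneous coordinates."""
    ph = np.hstack([pts, np.ones((len(pts), 1))])
    q = ph @ H.T
    return q[:, :2] / q[:, 2:3]

def estimate_homography(src, dst):
    """Direct Linear Transform from >= 4 point correspondences -- the kind
    of estimate the coarse SIFT-matching stage yields before rectification."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]          # fix the projective scale
```

In practice the estimate would be wrapped in RANSAC to reject outlier matches before the image is warped and handed to the fine matcher.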

    BSD-YOLO: a small target vehicle detection method based on dynamic sparse attention and adaptive detection head
    YANG Biao, WANG Xue, GUAN Zheng, LONG Ping
    2026, 47(1): 99-110.  DOI: 10.11996/JG.j.2095-302X.2026010099

    In intelligent traffic monitoring systems, small target vehicle detection in complex scenes faces challenges such as low feature resolution, severe occlusion interference, computational redundancy, and insufficient bounding-box regression accuracy. To balance detection accuracy with deployment efficiency on edge devices, an improved YOLOv8 framework based on dynamic sparse attention and a lightweight dual-branch structure was proposed. The method first introduced a bidirectional routing sparse attention mechanism (ReBiAttention) that enhanced the retention of shallow features for small targets by dynamically filtering key features through a two-level routing strategy. Subsequently, GSConv and VoV-GSCSP modules were integrated to reduce computational cost while dynamically adjusting multi-scale feature weights. An improved DynamicHead was applied for multi-task adaptive optimization, and a modified ShapeIoU loss function with shape- and scale-aware weighting was employed to improve localization accuracy. Experiments on the UA-DETRAC dataset showed that, relative to baseline YOLOv8n, Precision, Recall, and mAP@0.5 increased by 8.739%, 1.685%, and 7.225%, respectively, while the parameter count decreased by 4.3%. This method provided an efficient solution for accurate detection of small-target vehicles in complex traffic scenarios.
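A shape- and scale-aware IoU loss augments the plain IoU term with a center-distance penalty whose x/y components depend on the ground-truth box shape. The sketch below follows the spirit of ShapeIoU; the published formula carries additional shape-deviation terms, so treat this as an illustration rather than the paper's loss.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    return inter / (area(a) + area(b) - inter)

def shape_aware_loss(pred, gt, scale=1.0):
    """IoU loss plus a shape-weighted, normalized center-distance penalty."""
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    ww = 2 * wg**scale / (wg**scale + hg**scale)    # wide boxes stress x error
    hh = 2 * hg**scale / (wg**scale + hg**scale)    # tall boxes stress y error
    cxp, cyp = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cxg, cyg = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    ex1, ey1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    ex2, ey2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2     # enclosing-box diagonal
    dist = (ww * (cxp - cxg) ** 2 + hh * (cyp - cyg) ** 2) / diag2
    return 1.0 - box_iou(pred, gt) + dist
```

For small targets the distance term dominates, which is why shape-aware weighting helps localization when boxes barely overlap.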

    Neural radiance field reconstruction based on feature point-guided interference identification
    REN Hao, LI Shaobo, GONG Mao, WANG Bo
    2026, 47(1): 111-119.  DOI: 10.11996/JG.j.2095-302X.2026010111

    To address the challenge of achieving high-quality 3D reconstruction with Neural Radiance Fields (NeRF) under the influence of occluding objects, a method based on the collaborative optimization of Structure-from-Motion (SfM) and the Segment Anything Model (SAM) was proposed. Building upon the Scale-Invariant Feature Transform (SIFT) algorithm within the SfM reconstruction process, geometric inconsistencies in dynamic scenes were leveraged for feature-point identification and matching. Unmatched feature points were treated as dynamic occluders, guiding the SAM model, which supports point-guided segmentation, to segment the dynamic occluders and generate a static scene mask. Based on the segmentation results, mask-aware volumetric rendering was used to predict colors, and a quadruple loss function was established, comprising reconstruction loss, structural consistency loss, adversarial loss, and self-supervised patching loss. These objectives were jointly optimized to constrain the color output in patched regions. After iterative training, consistent restoration of geometric structure and appearance in occluded areas across multiple viewpoints was achieved; radiometric integrity was preserved while occlusions were removed. Validation on public dynamic-scene datasets demonstrated that the mask-based volumetric rendering combined with joint optimization produced an average Peak Signal-to-Noise Ratio (PSNR) improvement of 5.24 dB over baseline models and mainstream occlusion-removal methods, alongside a 35% reduction in Learned Perceptual Image Patch Similarity (LPIPS). This approach established a new paradigm for 3D reconstruction in complex dynamic environments.
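Mask-aware volumetric rendering can be sketched on top of the standard NeRF compositing equation: the mask simply zeroes the density of samples flagged as dynamic occluders before alpha compositing. This is a simplified, per-ray version of the idea, not the paper's renderer.

```python
import numpy as np

def render_ray(sigmas, colors, deltas, static_mask=None):
    """NeRF volume rendering along one ray with optional occluder masking.

    sigmas : (N,) densities; colors : (N, 3); deltas : (N,) step sizes;
    static_mask : (N,) 1 for static samples, 0 for dynamic occluders."""
    if static_mask is not None:
        sigmas = sigmas * static_mask                 # suppress occluder density
    alpha = 1.0 - np.exp(-sigmas * deltas)            # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha                           # compositing weights
    return (weights[:, None] * colors).sum(axis=0)
```

Once occluder density is suppressed, the quadruple loss constrains what the network paints into the newly exposed regions.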

    Defect detection of aero-engine blades based on dynamic vision sensors
    ZHANG Xingshun, CHEN Haiyong
    2026, 47(1): 120-130.  DOI: 10.11996/JG.j.2095-302X.2026010120

    Aeroengine blades are core components of engines; tiny surface defects can lead to serious safety accidents. Traditional vision-based detection is limited by motion blur, low dynamic range, background redundancy, and other factors. To address these challenges, a method for aeroengine blade defect detection based on a Dynamic Vision Sensor (DVS) was proposed. A DVS produces data as an asynchronous event stream and is therefore also called an event camera; it offers a large dynamic range, a high frame rate, and a strong ability to capture small targets. Firstly, a DVS-based defect detection platform was built, and its imaging characteristics and advantages were explored. On this basis, the first Event-based Defect Detection Dataset of Aeroengine Blade (EDD-AB) was constructed, covering nearly 6 000 images of scratches, point marks, and edge damage, with approximately 12 000 finely annotated target labels. The dataset was released as open source (link: https://github.com/NiBieZhouMei5520/EDD-AB.git). Furthermore, a multi-scale defect-detection algorithm based on asynchronous event-stream frame aggregation (AEAF-ABDD) was proposed: event streams were visualized through frame aggregation using a fixed time window; a Multi-Resolution Adaptive Feature Pyramid Network (MRAFPN) was developed to enhance multi-scale defect feature extraction; a lightweight SimAM attention mechanism was incorporated to strengthen focus on key regions; and a star-convolution module (StarNet) was integrated to improve the efficiency of high-dimensional nonlinear feature mapping, enabling accurate detection of multi-scale defects on complex curved workpieces. Experiments demonstrated that AEAF-ABDD achieved a mean Average Precision (mAP) of 97.7% on the EDD-AB dataset and a detection speed of 105 frames per second, substantially outperforming mainstream algorithms.
An efficient solution for automated quality inspection of highly reflective curved workpieces was thereby provided, promoting the application of DVS in the field of industrial inspection.
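Fixed-time-window frame aggregation turns the asynchronous event stream into dense frames a conventional detector can consume. The sketch below accumulates (t, x, y, polarity) events into a two-channel count image; the two-channel polarity layout is an assumption about the representation, not the paper's exact encoding.

```python
import numpy as np

def aggregate_events(events, t0, t1, height, width):
    """Accumulate DVS events falling in the window [t0, t1) into a frame.

    events : iterable of (t, x, y, polarity); returns a (2, H, W) count
    image (channel 0: positive polarity, channel 1: negative polarity)."""
    frame = np.zeros((2, height, width), dtype=np.int32)
    for t, x, y, p in events:
        if t0 <= t < t1:
            frame[0 if p > 0 else 1, y, x] += 1
    return frame
```

Sliding the window along the stream yields a frame sequence at an arbitrary effective frame rate, which is how an event camera sidesteps motion blur.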

    A dynamic pruning approach for cross-domain few-shot image generation
    LI Shiliang, FANG Qiang, WANG Yihua, SHI Yifei, WANG Zhuo, LI Zeyu, XIE Yunfei, WANG Jia
    2026, 47(1): 131-142.  DOI: 10.11996/JG.j.2095-302X.2026010131

    Few-shot image generation has important application value in fields such as medical imaging and artistic creation. In recent years, significant research progress has been made in this task, with mainstream approaches typically relying on transferring generative models pretrained on large-scale source-domain datasets to target domains to mitigate data scarcity. However, when substantial semantic gaps exist between source and target domains, direct transfer often introduces incompatible source-specific features, degrading image realism and style consistency. Although existing methods remove redundant features via static pruning strategies, such as fixed-threshold filter pruning, they struggle to adapt to the dynamic evolution of features across different layers of deep networks, often mistakenly removing general low-level features while retaining redundant high-level ones, thereby affecting the adaptation performance and generation quality of the model. To address this, a dynamic pruning method based on filter-importance estimation was proposed. Specifically, the method continuously tracked the changes in the Fisher information of each layer’s filters during training to evaluate their importance for image generation quality. Based on the Fisher information, a cumulative importance-weight-based adaptive pruning mechanism was constructed to dynamically determine the pruning ratio for each layer, enabling more precise removal of redundant or incompatible filters while preserving general structural semantic information. Experiments were conducted on several representative few-shot target domains, and results showed that the proposed method significantly outperformed existing approaches in terms of image quality (Fréchet Inception Distance, FID) and image diversity (Intra-domain Learned Perceptual Image Patch Similarity, Intra-LPIPS).
In target domains exhibiting significant semantic differences from the source domain, the proposed method achieved superior FID scores compared with the current state-of-the-art methods, demonstrating its stability and superiority for cross-domain few-shot image generation tasks.
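The two mechanisms above, a Fisher-information importance score and a cumulative-weight pruning rule that adapts the ratio per layer, can be sketched as follows. The mean-squared-gradient surrogate is a common empirical-Fisher approximation; the paper's exact tracking and thresholding scheme may differ.

```python
import numpy as np

def fisher_importance(grad_history):
    """Empirical-Fisher surrogate: mean squared gradient per filter.

    grad_history : (steps, n_filters) gradients accumulated over training."""
    return np.mean(np.square(grad_history), axis=0)

def filters_to_prune(importance, budget=0.3):
    """Prune the filters whose cumulative share of the layer's total
    importance stays below `budget`; the effective pruning ratio thus
    adapts to how importance is distributed within the layer."""
    order = np.argsort(importance)
    cum_share = np.cumsum(importance[order]) / importance.sum()
    return order[cum_share < budget]
```

A layer dominated by a few strong filters loses many weak ones under the same budget, while a layer with evenly spread importance is pruned lightly, which is the intended dynamic behavior.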

    Computer Graphics and Virtual Reality
    A point cloud classification and segmentation algorithm based on lightweight networks and weighted RF
    ZHAO Fuqun, HAO Hanzhu, YU Jiale
    2026, 47(1): 143-151.  DOI: 10.11996/JG.j.2095-302X.2026010143

    To address the high computational cost and complex network models of existing point-cloud classification and segmentation methods, a point-cloud classification and segmentation algorithm based on lightweight networks and a weighted Random Forest (RF) was proposed. The algorithm achieved efficient classification and segmentation in a hierarchical manner. Firstly, to address the many layers and complex computation of traditional neural networks, a lightweight neural network was constructed to extract point-cloud features such as global shape, inter-regional relationships, curvature, normal vectors, and color, thereby achieving rapid rough classification and segmentation. Then, to address data imbalance, an adaptive classification and segmentation strategy was designed: by introducing a weighted RF and combining inconsistency-measurement screening with a dynamic-weighting optimization mechanism, fine classification and segmentation were achieved. Classification experiments were conducted on the ModelNet40 dataset and segmentation experiments on the Semantic3D dataset and outdoor-scene point-cloud data. The results showed that, compared with Local Geo-Transformer, PointNeXt, and FastPointNet++, classification and segmentation accuracy increased by approximately 1.9%, 1.6%, and 1.7%, respectively, while classification and segmentation time was reduced by approximately 40%, 30%, and 20%, respectively. The proposed algorithm thus effectively reduces model training time and improves classification and segmentation efficiency while maintaining high accuracy.
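The weighted-RF refinement stage reduces to a weighted majority vote over the forest's trees, where per-tree weights can be updated dynamically (e.g. from out-of-bag accuracy or an inconsistency measure). The sketch below shows only that vote, as a simplified stand-in for the paper's inconsistency screening and dynamic weighting.

```python
import numpy as np

def weighted_rf_vote(tree_preds, tree_weights):
    """Weighted majority vote over a random forest's trees.

    tree_preds   : (n_trees, n_samples) class labels predicted per tree
    tree_weights : (n_trees,) trust weight assigned to each tree"""
    preds = np.asarray(tree_preds)
    w = np.asarray(tree_weights, dtype=float)[:, None]
    classes = np.unique(preds)
    # For each class, sum the weights of the trees that voted for it
    scores = np.stack([((preds == c) * w).sum(axis=0) for c in classes])
    return classes[np.argmax(scores, axis=0)]
```

With all weights equal this reduces to the plain RF vote; down-weighting inconsistent trees is what lets minority classes survive the vote under data imbalance.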

    Conservative enclosing box construction algorithm based on implicit geometric coding with Lipschitz linear constraints
    ZHANG Bingyu, KUANG Liqun, XIONG Fengguang, SUN Fanshu, JIAO Shichao
    2026, 47(1): 152-161.  DOI: 10.11996/JG.j.2095-302X.2026010152

    Mainstream bounding-box methods are widely used in 3D scene rendering, ray tracing, and collision detection; however, they suffer from low space utilization and insufficient fitting accuracy on complex geometries, making it difficult to ensure strict conservatism and leaving room for improvement in reducing false-detection rates. To address these issues, a conservative bounding-box construction method combining implicit geometric coding and Lipschitz constraints was proposed. Implicit geometric coding mapped the input coordinates to a high-dimensional space via positional encoding, thus capturing local and global geometric information and improving bounding-box adaptability. A trainable Lipschitz-constrained linear layer was introduced to dynamically adjust Lipschitz constants and control gradient changes, and Lipschitz regularization loss was combined with dynamically weighted cross-entropy loss to reduce the false-positive rate while optimizing boundary fitting. The experimental results demonstrated that the method achieved a false-negative rate of 0 on multiple 3D models, reduced the false-detection rate by up to 3.1% compared with the benchmark method, and shortened single-ray query time by 1.7 ms, providing an efficient and robust solution for high-precision conservative bounding-box fitting.
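A trainable Lipschitz-constrained linear layer is commonly built by rescaling each weight row so its norm never exceeds a learned, positive bound. The sketch below uses softplus for the trainable constant and absolute row sums as the norm bound, one standard construction (in the style of Lipschitz-regularized MLPs); the paper's exact layer may differ.

```python
import numpy as np

def lipschitz_linear(x, W, b, c):
    """Linear layer whose per-row absolute sum (an inf-norm Lipschitz
    bound) is clamped to the trainable constant softplus(c).

    x : (d_in,), W : (d_out, d_in), b : (d_out,), c : scalar parameter."""
    lip = np.log1p(np.exp(c))                            # softplus keeps the bound > 0
    row_sums = np.abs(W).sum(axis=1, keepdims=True)
    scale = np.minimum(1.0, lip / np.maximum(row_sums, 1e-12))
    return x @ (W * scale).T + b                         # rescaled rows bound the gradient
```

Because the bound is differentiable in `c`, it can be co-trained with a Lipschitz regularization loss rather than fixed by hand.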

    Digital Design and Manufacture
    Research on assembly accuracy prediction of complex products considering rough surfaces
    WANG Gangfeng, ZHANG Huan, YANG Yingying, LIU Yitao, GUO Yanyun, YUE Ping, SUN Yanhui
    2026, 47(1): 162-172.  DOI: 10.11996/JG.j.2095-302X.2026010162
    HTML    PDF 7     9

    Because the impact of rough surfaces on assembly accuracy is insufficiently considered in existing assembly-accuracy prediction for complex products, leading to inaccurate predictions and limited practical assembly applicability, an assembly-accuracy prediction method considering rough surfaces was proposed. Firstly, an assembly-accuracy information model was constructed to express mating-feature, geometric-tolerance, and roughness information, and an assembly-precision knowledge graph was built on this model. Secondly, a geometric-tolerance representation model was established based on the Small-Displacement Torsor (SDT) theory, and a simulation method for the rough surfaces of planar and cylindrical parts, as well as a method for determining their SDT expressions, was studied. Thirdly, the error-propagation path of the assembly was determined according to the assembly sequence, and a pose-relationship graph for the assembly was constructed. Then, assembly-precision prediction was achieved using a Jacobian-torsor model. Finally, the feasibility of the method was verified using the crank-connecting-rod mechanism of a specific construction-machinery model as an example. The simulation results demonstrated that the method could achieve accurate assembly-precision prediction and provide valuable guidance for practical assembly operations.
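    In the Jacobian-torsor model, each functional element's small-displacement torsor (three translations and three rotations) is mapped into the functional-requirement frame by a 6×6 Jacobian and the contributions are summed. A minimal NumPy sketch, using a translations-first torsor ordering (sign and ordering conventions vary by author, so treat this as one common formulation rather than the paper's):

    ```python
    import numpy as np

    def skew(v):
        """Cross-product matrix: skew(v) @ u == np.cross(v, u)."""
        x, y, z = v
        return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

    def torsor_jacobian(R, d):
        """6x6 Jacobian mapping a functional element's torsor into the
        functional-requirement frame: R is the rotation between the two frames,
        d the lever arm from the FE origin to the FR origin."""
        J = np.zeros((6, 6))
        J[:3, :3] = R                 # translations rotate directly
        J[:3, 3:] = skew(d) @ R       # rotations add a lever-arm translation
        J[3:, 3:] = R                 # rotations rotate directly
        return J

    def predict_fr_torsor(torsors, jacobians):
        """Jacobian-torsor model: FR torsor = sum over i of J_i @ FE_i."""
        return sum(J @ t for J, t in zip(jacobians, torsors))
    ```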

    Figures and Tables | References | Related Articles | Metrics
    Generative digital twin modeling based on large models
    LIANG Shenglong, FAN Qiuxia
    2026, 47(1): 173-178.  DOI: 10.11996/JG.j.2095-302X.2026010173
    HTML    PDF 6     1

    To address the challenges in integrating Digital-Twin (DT) technology with large-scale generative models in industrial design, a CAD-LDT digital-twin modeling framework based on generative foundation models was proposed. The framework adopted a triadic architecture consisting of a physical-entity module, an intelligent generation module, and a virtual-entity module, and innovatively incorporated multi-modal data fusion mechanisms and domain-knowledge constraints to enable autonomous generation of parameterized CAD models from physical-entity descriptions. Utilizing LLaVA-7B and LLaMA-7B as backbone models, the framework employed LoRA-based lightweight adapters to achieve cross-modal alignment between visual and textual features, and introduced a constraint encoder that transformed geometric tolerances and physical rules into structured JSON objects. To enhance the mathematical consistency of spatial transformations, Lie-group algorithms were adopted for the optimization of rigid-body transformations, while a geometric-weight binning strategy was proposed to discretize complex assembly relationships. Moreover, a spatiotemporal-decoupled generation strategy was designed to jointly optimize spatial layout and assembly sequencing. Experimental results on the DeepCAD dataset indicated that the proposed framework achieved an Intersection-over-Union (IoU) of 83.6%, a constraint satisfaction rate of 91.3%, and a 26.5% improvement in generation efficiency, significantly outperforming existing baseline models. Further ablation studies confirmed the critical contributions of multi-modal fusion, constraint encoding mechanisms, and Lie-group optimization to modeling performance, providing a novel DT modeling paradigm for intelligent manufacturing with demonstrated value in parametric design and assembly process optimization.
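    A LoRA adapter adds a trainable low-rank update to a frozen weight matrix. A minimal NumPy sketch of the forward pass is given below; the alpha/r scaling follows the original LoRA formulation, and all shapes and names here are illustrative rather than taken from the paper.

    ```python
    import numpy as np

    def lora_forward(x, W, A, B, alpha=16.0):
        """y = x @ W + (alpha / r) * x @ A @ B, where r = A.shape[1] is the rank.
        W stays frozen; only the small factors A (d_in x r) and B (r x d_out)
        are trained, which is what makes the adapter lightweight."""
        r = A.shape[1]
        return x @ W + (alpha / r) * (x @ A) @ B
    ```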

    Figures and Tables | References | Related Articles | Metrics
    MBSE-based conceptual design method for complex forming equipment
    WANG Boya, WANG Shaozong, YANG Wanran, ZHOU Xingwei, HOU Liang, XIONG Chengyue
    2026, 47(1): 179-193.  DOI: 10.11996/JG.j.2095-302X.2026010179
    HTML    PDF 5     0

    The traditional development approach for complex forming equipment typically relies on Document-Based Systems Engineering (DBSE), which often leads to protracted development cycles due to inadequate requirement analysis, incomplete requirement coverage caused by textual ambiguity, and equipment development lagging behind technological iterations. These shortcomings frequently result in final designs that fail to meet target performance metrics and require inefficient, repetitive modifications. Therefore, for the conceptual design stage of complex forming equipment, an MBSE-based conceptual-design method was proposed, drawing on the U.S. Department of Defense Architecture Framework (DoDAF) combined with Model-Based Systems Engineering (MBSE). This method took five viewpoints, namely the panoramic, capability, operational, systems, and standards viewpoints, as entry points for the conceptual design of complex forming equipment. Through multi-perspective analysis, the method performed top-level requirements acquisition, requirements refinement analysis, functional analysis, and system modeling across four design levels. Eleven types of models were established using the Systems Modeling Language (SysML), enabling digital and procedural expression in the conceptual design stage of complex forming equipment. Finally, superplastic-forming equipment was used as a representative example to demonstrate the application of this design method. The application addressed the shortcomings of traditional design approaches and demonstrated that the method provides effective guidance for the forward development of complex forming equipment.

    Figures and Tables | References | Related Articles | Metrics
    BIM/CIM
    Research of parametric modeling methods for isolated foundation based on Revit secondary development
    DENG Peng, TAN Wenzheng, LUO Huiming, LI Shuai, YANG Bin
    2026, 47(1): 194-203.  DOI: 10.11996/JG.j.2095-302X.2026010194
    HTML    PDF 4     3

    With the widespread application of Building Information Modeling (BIM) technology in engineering design, Revit-based 3D forward design has developed into a relatively mature solution for superstructures. However, as critical structural components, isolated foundations still face challenges such as low efficiency and poor information integrity in 3D modeling and drawing generation. Moreover, because of the closed data interfaces of mainstream structural analysis software, it is extremely challenging to extract reinforcement information directly from the underlying database files and synchronize it to the properties of Revit elements. To address this limitation, a new parametric modeling algorithm for isolated foundations based on AutoCAD layer and text recognition was proposed, utilizing the Revit Application Programming Interface (API) and the Model-View-ViewModel (MVVM) design pattern. By importing foundation layout and reinforcement annotation drawings generated by structural analysis software (e.g., YJK) into Revit, the algorithm calculated the center coordinates and planar dimensions of foundations using layer-recognition methods. Then, a matching logic was established between foundations and reinforcement annotations according to their relative positions. Subsequently, text recognition was employed to extract each annotation's numbering, height, and reinforcement information, which were then automatically written into the family properties of the model. Additionally, the extraction of model attributes and geometric face references enabled automatic annotation of foundation dimensions and reinforcement. Finally, the proposed method was applied to the parametric modeling and drafting of isolated foundations for a self-built cold-storage factory and was compared against conventional methods. The results demonstrated that the algorithm significantly improved the efficiency of 3D modeling and annotation drawing for isolated foundations, while exhibiting excellent compatibility with calculation files exported from PKPM software.
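    In the simplest case, the relative-position matching between foundations and annotations could be a greedy nearest-point assignment. The sketch below is purely illustrative plain Python (no Revit API calls), assuming foundation centers and annotation insertion points are available as 2D coordinates:

    ```python
    import math

    def match_annotations(foundation_centers, annotation_points):
        """Greedily pair each foundation center with the nearest unused
        annotation insertion point; returns {foundation_index: annotation_index}."""
        remaining = dict(enumerate(annotation_points))
        pairs = {}
        for fi, (fx, fy) in enumerate(foundation_centers):
            if not remaining:
                break
            ai = min(remaining,
                     key=lambda k: math.hypot(remaining[k][0] - fx,
                                              remaining[k][1] - fy))
            pairs[fi] = ai
            del remaining[ai]
        return pairs
    ```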

    Figures and Tables | References | Related Articles | Metrics
    Research on dynamic voxelization-based collision detection in construction scenarios
    LIN Hao, WU Zhiming, JIN Jilan
    2026, 47(1): 204-215.  DOI: 10.11996/JG.j.2095-302X.2026010204
    HTML    PDF 14     3

    Among all safety accidents in construction scenarios, collision accidents are among the most common types of injury. To effectively prevent and monitor collision accidents, computer-graphics analysis techniques have been used to assist collision detection and analysis; however, limitations remain in balancing real-time performance with high detection precision. To address this, a collision-detection method based on dynamic voxelization was proposed. This method integrated the generation of a dynamic spatial voxel tree with the dynamic spherical voxelization of resources to construct a collision detection and analysis mechanism. The core ideas are as follows: ① Based on a crowding-degree threshold, the space was recursively divided to generate a dynamic voxel tree, effectively filtering out areas with no collision risk. ② The side length of voxel units was dynamically calculated according to the relative distance between resources and their volumes, realizing adaptive adjustment of voxel granularity. ③ Spherical voxels were used instead of traditional cubic voxels to avoid the computational burden of non-axis-aligned detection. ④ A hollowing-out procedure was introduced to eliminate internal invalid voxels, further improving detection efficiency. This method can accurately capture resource interactions in complex dynamic construction environments, significantly improving detection accuracy and computational efficiency. Experimental results showed that compared with traditional methods, the proposed method significantly improved detection accuracy, with precision and accuracy reaching 94.64% and 96.67%, respectively. In terms of collision-detection time, it was more efficient than most existing methods, with a calculation-speed increase of at least about 11.36%. The study also analyzed the impact of key parameters such as voxel-tree depth, root-node size, and voxel side length on performance, and measured the method's CPU and memory consumption in scenarios of different scales; the consumption remained within an acceptable range, verifying the method's applicability in construction scenarios. The method provides a new and effective information-processing approach for enhancing the intelligence level of construction safety management.
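    Two of the core ideas above — the orientation-free collision test that spherical voxels enable, and crowding-threshold recursive subdivision — can be sketched generically in Python. The data layout and thresholds below are illustrative assumptions, not the paper's implementation:

    ```python
    import numpy as np

    def spheres_collide(c1, r1, c2, r2):
        """Spherical voxels collide iff center distance < sum of radii: a single
        distance test, with no axis-alignment requirement (idea ③ above)."""
        return float(np.linalg.norm(np.subtract(c1, c2))) < (r1 + r2)

    def build_voxel_tree(points, lo, hi, crowding=4, max_depth=6, depth=0):
        """Recursively split an axis-aligned region into octants while the point
        count exceeds the crowding threshold; empty octants are pruned, which
        discards collision-free space early (idea ①)."""
        points = np.asarray(points, dtype=float)
        lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
        inside = points[np.all((points >= lo) & (points < hi), axis=1)]
        if len(inside) == 0:
            return None                           # pruned: no collision risk
        if len(inside) <= crowding or depth == max_depth:
            return {"lo": lo, "hi": hi, "points": inside}
        mid = (lo + hi) / 2.0
        children = []
        for octant in range(8):
            bits = [(octant >> axis) & 1 for axis in range(3)]
            child = build_voxel_tree(inside,
                                     np.where(bits, mid, lo),
                                     np.where(bits, hi, mid),
                                     crowding, max_depth, depth + 1)
            if child is not None:
                children.append(child)
        return {"lo": lo, "hi": hi, "children": children}
    ```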

    Figures and Tables | References | Related Articles | Metrics
    Intelligent analysis of design about roof equipment inspection paths based on graph theory and improved A* algorithm
    HE Ruiqi, CAO Ying, XU Jinglin, YU Fangqiang
    2026, 47(1): 216-222.  DOI: 10.11996/JG.j.2095-302X.2026010216
    HTML    PDF 1     0

    In roof engineering design, the rationality of equipment maintenance circulation routes directly impacts maintenance efficiency and safety. Traditional design methods often rely on empirical judgment, making it difficult to sufficiently evaluate the rationality of these routes during the design phase. To address this, a hybrid algorithm combining graph theory with an improved A* algorithm was developed. Integrated with Building Information Modeling (BIM) technology, an intelligent analysis and design tool for roof equipment maintenance circulation routes was created to address the shortcomings of traditional design via digital model-based route analysis. First, the roof was converted into a weighted equivalent grid map using collision detection and an octree algorithm. Next, an improved A* algorithm was employed to optimize the maintenance paths, comprehensively considering equipment collision volumes and spatial constraints to calculate the optimal maintenance circulation route and evaluate the rationality of detailed route-area design. Finally, the intelligent analysis and design tool based on this algorithm was tested on an actual project. Experimental results demonstrated that the algorithm accurately revealed potential spatial conflicts and irrational layouts, providing data to support design optimization, and enhanced design rationality and operability; it also improved efficiency by more than five times compared with traditional manual design. The intelligent analysis tool based on this algorithm is currently in use in several projects by the Shanghai Construction (No.4) Group Co., Ltd.
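    The abstract does not specify the A* improvements, but the baseline the method builds on — A* over a weighted grid with an admissible heuristic — can be sketched as follows (the Manhattan heuristic is admissible as long as every cell cost is at least 1; the grid encoding is an illustrative assumption):

    ```python
    import heapq

    def a_star(grid, start, goal):
        """A* over a weighted grid: grid[r][c] is the cost of entering cell
        (r, c), None marks a blocked cell.  4-connected moves; returns
        (total_cost, path) or None if the goal is unreachable."""
        rows, cols = len(grid), len(grid[0])
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
        frontier = [(h(start), 0, start, [start])]    # (f, g, cell, path)
        best_g = {start: 0}
        while frontier:
            f, g, cur, path = heapq.heappop(frontier)
            if cur == goal:
                return g, path
            if g > best_g.get(cur, float("inf")):
                continue                              # stale heap entry
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                r, c = cur[0] + dr, cur[1] + dc
                if 0 <= r < rows and 0 <= c < cols and grid[r][c] is not None:
                    ng = g + grid[r][c]
                    if ng < best_g.get((r, c), float("inf")):
                        best_g[(r, c)] = ng
                        heapq.heappush(frontier,
                                       (ng + h((r, c)), ng, (r, c), path + [(r, c)]))
        return None
    ```

    In the paper's setting, the grid weights would come from the BIM-derived equivalent grid map, with blocked cells marking equipment collision volumes and spatial constraints.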

    Figures and Tables | References | Related Articles | Metrics
    Industrial Design
    How do robots attract children? The role of appearance, motion, and voice as multisensory features in early-stage interactions
    LI Yi, CAO Chengcai, SONG Zhangtong, LI Zuoqi, LI Xiao, LI Hesen
    2026, 47(1): 223-233.  DOI: 10.11996/JG.j.2095-302X.2026010223
    HTML    PDF 8     3

    With the rapid development of artificial intelligence technology, multimodal robots are playing an increasingly important role in preschool children’s education, entertainment, and daily life. Existing studies have primarily focused on the effects of single sensory cues of robots on children’s perception, while systematic research on multisensory integration effects remains limited. To explore how robots’ multimodal features jointly influence children’s emotional preferences and visual attention, 318 children aged 4-6 years were recruited to participate in an eye-tracking experiment. The experiment adopted a 2 (appearance features: humanoid vs. animal-like) × 3 (voice guidance: male voice, female voice, none) × 2 (gesture guidance: present vs. absent) mixed factorial design, with robot appearance features and behavioral features (voice and gesture guidance) as independent variables, and children’s emotional preferences and eye-tracking indicators as dependent variables, thereby systematically examining the effects of multimodal features on child users. The results showed that, in terms of appearance features, no significant difference was observed in subjective preference ratings between humanoid and animal-like robots. However, humanoid robots attracted longer total fixation duration, more fixation counts, and shorter first-fixation latency, indicating superior attention-related performance compared with animal-like robots. Children were more readily attracted to humanoid robots during the initial stage of visual contact, and anthropomorphic design showed greater advantages in sustaining children’s attention. In terms of behavioral features, robots with gesture guidance received significantly higher subjective preference ratings than those without gestures, and also elicited longer total fixation duration and more fixation counts. 
Robots with female voices received slightly higher subjective preference ratings than those with male voices, and both were significantly preferred over robots without voices. Robots with male voices had slightly longer total fixation duration than those with female voices, and both significantly outperformed robots without voices. The difference in fixation counts between male- and female-voice robots was not significant, but both attracted significantly more fixations than robots without voices. Robots with gesture guidance and voice (especially female voice) performed better in subjective ratings and visual attention allocation, suggesting that behavioral features substantially enhanced children’s emotional preferences and interactive experiences. Furthermore, the effects of appearance and behavioral features on children’s emotional preferences and visual attention were relatively independent, and no significant interaction effects were observed. This study revealed the mechanisms through which robot appearance and behavioral features influenced preschool children’s emotional preferences and visual attention, thereby providing scientific evidence for designing child-oriented robots that align with users’ emotional needs.

    Figures and Tables | References | Related Articles | Metrics