Journal of Graphics

Review of digital technology methods for grottoes cultural relics in China

TAN Bingyu, XUE Yanmin, QIN Song

2025, 46(3): 479-490. DOI: 10.11996/JG.j.2095-302X.2025030479

As a brilliant pearl in China’s historical and cultural heritage, grotto art embodies rich historical and cultural information and possesses immense historical, artistic, and scientific value. China’s digital conservation of grotto artifacts began in the 1980s, and with advances in science and technology, the technical methods for digitizing these relics have been continually updated and improved. However, comprehensive reviews of these methods remain scarce. To address this research gap and support the digitalization of grotto artifacts, relevant research papers and typical cases were collected, organized, and summarized according to the general workflow of grotto cultural relic digitalization. Three-dimensional data collection technologies and corresponding collection cases for grotto cultural relics were systematically categorized; various digital restoration methods and reproduction technologies were outlined; and transformation cases of grotto cultural relics in digital display were introduced. Finally, the future application of artificial intelligence in cultural relic protection was discussed, and its development trends were envisioned.


Data-efficient video retrieval with contrastive learning

LING Fei, YU Jingtao, ZHU Zheyan, LUO Jian, ZHU Jixiang, CHEN Xianke, DONG Jianfeng

2025, 46(3): 491-501. DOI: 10.11996/JG.j.2095-302X.2025030491

The performance of video retrieval systems largely depends on annotated data, and a key challenge is to reduce reliance on expensive manual annotation while enhancing performance. To address this issue, a data-efficient video retrieval method based on contrastive learning was proposed, which incorporated two key optimization strategies. First, to construct more diverse and effective learning data, a content-aware feature-level data augmentation method was introduced, utilizing a frame-based similarity K-nearest neighbor algorithm to capture deep semantic information and reduce dependence on annotated data. Second, by extracting long segments and their internal short segments from videos, a long-short dynamic sampling strategy was designed to construct positive sample pairs with multi-scale information for more effective contrastive learning, while the sampling lengths of long and short segments were dynamically adjusted to improve data utilization. Experimental results on the SVD and UCF101 datasets demonstrated that the proposed method significantly outperformed existing retrieval models. Extensive ablation studies confirmed that content-aware feature-level data augmentation enhanced model adaptability, and that long-short dynamic sampling not only benefited self-supervised learning but also improved the performance of semi-supervised models.


Large scene reconstruction method based on voxel grid feature of NeRF

WANG Daolei, DING Zijian, YANG Jun, ZHENG Shaokai, ZHU Rui, ZHAO Wenbin

2025, 46(3): 502-509. DOI: 10.11996/JG.j.2095-302X.2025030502

To address blurred rendering and missing details in neural radiance fields (NeRF) for large scenes, a rendering method suitable for large scenes was proposed, guided by voxel grid features and driven by ray sampling. The method effectively enhances the accuracy of 3D models, which is particularly crucial for large-scale scene reconstruction, and is applicable to scenarios such as architectural design and urban planning. First, grid processing was performed on the reconstructed scene by allocating scene boundaries based on scene size and refining voxel units. Second, tensor decomposition was conducted on the information contained in the voxels, and gridded scene features were extracted. The neural radiance field then concentrated its sampling according to the extracted features. Finally, the sampling results were fed into a neural network, and a multilayer perceptron (MLP) renderer converted the features into color and density information, synthesizing renderings from novel viewpoints. Multiple datasets were used for validation. The experimental results demonstrated that, compared with other methods, the proposed approach achieved an average improvement of approximately 11% in PSNR, an average increase of about 12% in SSIM, and an average reduction of around 15% in LPIPS, with significantly enhanced visual effects.


DCSplat: Gaussian splatting with depth information constraints under sparse viewpoints

HUANG Zhiyong, SHE Yali, HUA Xifeng, XIANG Mengli, YANG Chenlong, DING Tuojun

2025, 46(3): 510-519. DOI: 10.11996/JG.j.2095-302X.2025030510

To address the challenges in sparse-view 3D reconstruction, particularly reconstruction holes and accuracy degradation caused by insufficient Gaussians, a sparse-view 3D reconstruction method based on 3D Gaussian Splatting (3DGS), named DCSplat, was proposed. This method utilized depth constraints to adaptively complete the point cloud required for 3DGS initialization and designed a random structural similarity loss to achieve fast and high-precision reconstruction of sparse-view images. The core of the method lay in the use of a proposed feedforward neural network to improve the sparse point cloud generated during the structure from motion (SfM) process. First, a pre-trained monocular depth estimation network was used to predict depth information from the images. Second, a projection matrix was constructed using camera parameters to project the sparse point clouds onto the images, thereby establishing a correlation between the point cloud’s z-values and the depth values. Furthermore, a deep neural network was constructed and trained to map the depth values of image pixels to point cloud z-values, which was used to optimize and complete the point cloud information required for 3DGS. Additionally, to overcome the limitations of the point-by-point optimization loss in 3DGS, a random structural similarity loss function was introduced, treating multiple Gaussians corresponding to pixels as a whole. This enabled global consideration of the point cloud structure, thereby promoting more coherent and accurate 3D reconstruction. Tests of DCSplat on the local light field fusion (LLFF), DTU multi-view stereo (DTU), and Mip-NeRF 360 (unbounded anti-aliased neural radiance fields) standard datasets demonstrated that it matched or surpassed existing methods on key evaluation indicators, including peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and learned perceptual image patch similarity (LPIPS), effectively improving the reconstruction quality. In addition, the method performed point cloud completion based on depth constraints, optimized reconstruction quality from global to local scales using depth information, and exhibited significant performance improvements across multiple indicators, thereby demonstrating application potential.
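The projection step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a pinhole camera with intrinsics `K`, rotation `R`, and translation `t`, and the function name `pair_z_with_depth` is hypothetical. It pairs each sparse point's camera-space z-value with the monocular depth prediction at the pixel it projects to; such pairs are the kind of data a depth-to-z mapping could be trained on.

```python
import numpy as np

def pair_z_with_depth(points_world, K, R, t, depth_map):
    """Return (z_values, depth_values) for points that land inside the image.

    Assumes a pinhole model: x_cam = R @ x_world + t, pixel = K @ x_cam / z.
    """
    cam = np.asarray(points_world, dtype=float) @ R.T + t   # world -> camera frame
    cam = cam[cam[:, 2] > 0]                                  # keep points in front of the camera
    z = cam[:, 2]
    uvw = cam @ K.T                                           # apply the intrinsic matrix
    cols = np.rint(uvw[:, 0] / z).astype(int)                 # pixel column (u)
    rows = np.rint(uvw[:, 1] / z).astype(int)                 # pixel row (v)
    h, w = depth_map.shape
    inside = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    return z[inside], depth_map[rows[inside], cols[inside]]
```

Monocular depth networks typically predict depth only up to an unknown scale and shift, which is one reason a learned mapping from predicted depth to point cloud z-values is needed at all.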


TCPColor: a Chinese painting color scheme recommendation system based on text-to-image generation model

ZHANG Di, ZHANG Wenan, JIANG Zhide, WU Aixia, KONG Hao, GUO Xian, CHEN Wei

2025, 46(3): 520-531. DOI: 10.11996/JG.j.2095-302X.2025030520

Traditional Chinese painting (TCP) is a form of painting unique to China. Exploring color schemes based on TCP is of great importance for modern designers seeking to integrate traditional art with contemporary design concepts. However, limited research has been conducted on color recommendation systems based on TCP knowledge, and no effective solutions have yet been provided for color scheme retrieval and recommendation based on multi-dimensional features such as themes, objects, and artistic conceptions. A traditional Chinese painting color scheme recommendation system named TCPColor was proposed. Based on the Taiyi Chinese text-to-image generation model, this system fine-tuned the model using Song Dynasty TCP data annotated by experts. Then, it employed visual saliency algorithms, K-Means clustering, and color-distance-based palette matching on the generated images to produce color schemes that reflected the style of traditional Chinese painting. The effectiveness of the color extraction method was verified through ablation experiments, while objective color analysis was used to evaluate the distinctiveness of the generated color schemes and their similarity to traditional Chinese painting color schemes. In collaboration with TCP experts and volunteers, case studies, expert evaluations, and user research were conducted, which demonstrated the practicality of the system in recommending color schemes.
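The K-Means step mentioned in this abstract can be illustrated with a small palette extractor. This is a sketch under stated assumptions, not the TCPColor pipeline: the saliency weighting and palette matching stages are omitted, clustering runs in plain RGB space, and the function name `kmeans_palette` and its parameters are hypothetical.

```python
import numpy as np

def kmeans_palette(pixels, k, iters=20, seed=0):
    """Cluster N x 3 color values into k palette colors, largest cluster first."""
    pixels = np.asarray(pixels, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct pixels (simple random initialization).
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every pixel to its nearest center (Euclidean color distance).
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; empty clusters stay put.
        for j in range(k):
            members = pixels[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    order = np.argsort(-counts)
    return centers[order], counts[order]
```

Clustering in a perceptually uniform space such as CIELAB is a common alternative to RGB when the palette is meant to match human color judgments.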


Vibration damper defect detection algorithm based on improved YOLOv8

NIU Hang, GE Xinyu, ZHAO Xiaoyu, YANG Ke, WANG Qianming, ZHAI Yongjie

2025, 46(3): 532-541. DOI: 10.11996/JG.j.2095-302X.2025030532

During drone inspections of transmission lines, aerial images of vibration dampers exhibit varying target scales and complex backgrounds, which can easily lead to missed or false detections. To address the limitations of existing object detection algorithms in handling complex backgrounds and multi-scale target detection, an improved YOLOv8-based detection algorithm for identifying defects in vibration dampers was proposed. First, to enhance the model’s ability to extract multi-scale features, a multi-scale feature extraction (MSFE) module was introduced, effectively expanding the model’s receptive field. Second, to suppress interference from complex backgrounds during the multi-scale feature fusion process, a space pyramid kernel attention (SPKA) module was designed to improve the model’s global awareness of the target. Lastly, to improve the detection capability for small target defects, a small target semantic information layer (STSIL) was added to the network, providing rich semantic information for small-scale targets that are easily overlooked in images. In the comparison experiments, the mAP50 of the improved algorithm increased by 5.7% over the baseline model YOLOv8s, with AP50 for normal, tilted, and fallen vibration dampers increasing by 3.4%, 4.5%, and 9.2%, respectively, demonstrating the effectiveness and superiority of the proposed algorithm in detecting defects in vibration dampers. The proposed algorithm is expected to contribute to the safe and reliable operation of the power system.


Line extraction and representation algorithm for RGB-D data

LIU Xin, LI Yang, FENG Shengjie, WU Xiaoqun

2025, 46(3): 542-550. DOI: 10.11996/JG.j.2095-302X.2025030542

To improve the accuracy and quality of feature line extraction, a novel algorithm for RGB-D data was proposed to address the challenges of distinguishing between color and geometric boundaries and resolving the discontinuity and roughness of feature lines represented by straight line segments. The proposed algorithm fully utilized the close coupling and complementary properties between RGB and depth images, integrating color and geometric information to enhance the quality of line extraction results. First, the algorithm extracted a dense set of geometric boundary feature points based on color, depth, normal vectors, curvature, and other geometric information, as well as the corresponding planar geometric features of the RGB-D input. Subsequently, the feature point set was optimized using sparse processing, and corner point information was incorporated to enhance the feature line representation. Lastly, the lines were fitted and represented by cubic B-splines, which offered compactness, continuity, and smoothness. During the curve fitting process, knots of higher multiplicity were set to ensure that the curve passed through the key corner points, thereby accurately representing the trend of the recovered feature line. To verify the effectiveness of the proposed algorithm, experiments were performed on both self-collected and publicly accessible RGB-D datasets. Comparative evaluations against existing algorithms demonstrated that the proposed algorithm achieved an extraction precision of 0.82, a recall of 0.59, and an intersection over union (IoU) of 0.54 on the NYU v2 dataset. The results indicated that the algorithm can effectively extract continuous and smooth geometric lines from low-quality RGB-D inputs affected by depth noise.


The next best view navigation technology based on RGB features

ZHOU Zheng, DAI Yaqiao, YI Renjiao, LAN Long, ZHU Chenyang

2025, 46(3): 551-557. DOI: 10.11996/JG.j.2095-302X.2025030551

Neural radiance field (NeRF) has shown excellent performance in reconstructing 3D scenes from 2D images. Using 2D images as training data, the 3D structure of a scene can be reconstructed and new views can be rendered with high quality. Although NeRF is very effective in reconstructing 3D scenes, it suffers from slow training and long inference times, and the reconstruction quality depends closely on the quality of the samples. To address the challenge of high-quality NeRF reconstruction under low sample quality, two sets of NeRFs with different hash encodings were employed to learn the same scene and to evaluate the gap in information gain among candidate views, which guided view sampling. A new framework of Next Best View navigation technology based on RGB features was proposed. This framework exhibited strong robustness with sparse training data, was capable of capturing the next best view with high information gain through RGB feature evaluation, and optimized NeRF training, thereby improving the quality of new view synthesis with a minimal number of additional views. By optimizing the NeRF training process, the network convergence speed was increased by approximately 10 times, and the memory usage was reduced by 39.8%. Extensive experiments verified the effectiveness and robustness of the proposed model.


Research on multimodal text-visual large model for robotic terrain perception algorithm

SUN Hao, XIE Tao, HE Long, GUO Wenzhong, YU Yongfang, WU Qijun, WANG Jianwei, DONG Hui

2025, 46(3): 558-567. DOI: 10.11996/JG.j.2095-302X.2025030558

A terrain segmentation algorithm based on the fusion of information from multimodal text-visual large models was proposed to enhance the intelligent perception capability of robots in dynamic and complex environments. The algorithm integrated simple linear iterative clustering (SLIC) for image data preprocessing, contrastive language-image pre-training (CLIP) and segment anything model (SAM) for mask generation, and the Dice coefficient for post-processing. Initially, the original input image was preprocessed using SLIC to obtain image segmentation blocks, and the quality of subsequent masks was improved by adding prompt points, which significantly enhanced terrain classification accuracy. Subsequently, the CLIP large model, pre-trained on text-image data, was used to match the input visual images with predefined terrain text information, leveraging its interpretability and zero-shot learning capabilities to generate sets of terrain prompt points. The SAM large model then generated masks with semantic labels based on these sets, and the Dice coefficient was applied in post-processing to select usable masks. Using the Cityscapes dataset as a terrain segmentation sample, the superiority of the proposed algorithm over mainstream segmentation algorithms under both supervised and unsupervised learning frameworks was validated. Without the need for labeled data, the algorithm achieved a mask generation rate of 76.58% and an intersection over union (IoU) of 90.14%. For the terrain perception task of a quadruped robot, a U-Net encoder/decoder network was added as a quantitative validation module. Using the generated masks as a dataset, a lightweight terrain segmentation model was constructed and deployed on the edge computing device of the quadruped robot, and terrain segmentation experiments were conducted in a real-world environment. The experimental results demonstrated that the two mask optimization methods proposed in this paper improved the model’s mean IoU (MIoU) by 2.36% and 2.56%, respectively, with the final lightweight model achieving an MIoU of 96.34%, demonstrating reliable terrain segmentation accuracy. The segmentation algorithm effectively guided the robot to navigate quickly and safely from the starting point to the target location while avoiding non-geometric obstacles such as grasslands.
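The two overlap measures named in this abstract have standard definitions, sketched below for binary masks. The selection rule in `select_masks` is an assumption for illustration only: the abstract says the Dice coefficient is used to pick usable masks but does not state against what reference or with what threshold.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(a, b).sum() / total

def iou(a, b):
    """Intersection over union |A∩B| / |A∪B| for two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else np.logical_and(a, b).sum() / union

def select_masks(masks, reference, threshold=0.5):
    """Hypothetical filter: keep masks whose Dice score against a reference
    region reaches the threshold (reference and threshold are assumptions)."""
    return [m for m in masks if dice(m, reference) >= threshold]
```

For a single pair of masks the two measures are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank candidate masks identically and differ only in scale.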


Detection of apparent defects in a small sample of industrial products with category imbalance

WANG Suqin, DU Yujie, SHI Min, ZHU Dengming

2025, 46(3): 568-577. DOI: 10.11996/JG.j.2095-302X.2025030568

It has been demonstrated that generic target detection networks exhibit reduced overall detection accuracy when the number of defect samples is limited and the distribution of defect categories is uneven. Furthermore, the detection accuracy is markedly diminished for tail categories with particularly scarce defect samples. Based on these observations, an improved method for detecting apparent defects in industrial products using YOLOv8s was developed. Phantom convolution GSConv was employed in the Neck network to reduce network complexity while increasing nonlinearity, thus mitigating the risk of overfitting. Furthermore, the aggregation module VoV-GSCSP was employed to strengthen the extraction and fusion of features at different levels. A reweighted loss function was adopted to balance the training loss contributions across different categories of samples, increasing the share of the loss contributed by tail categories and thereby enhancing their defect detection accuracy. Compared with the baseline model, the improved method achieved an mAP of 93.3% for apparent defect detection in acupuncture needles, a 5.0% improvement, and a 9.1% improvement for broken needle defects, the category with the fewest samples. For medicinal plates, the mAP reached 91.4%, a 2.6% improvement, and the improvement for dirty defects, the category with the fewest samples, was 3.2%. On the steel dataset, which featured a greater number of samples with uneven distribution, the overall defect detection accuracy improved by 2.6% in mAP. The experiments demonstrated that the improved method can markedly raise the overall detection accuracy of apparent defects in industrial products when defect samples are limited and categories are imbalanced. It can also markedly raise the detection accuracy for categories with sparse samples, exhibiting excellent generalization capabilities.
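The idea of a reweighted loss can be sketched with inverse-frequency class weights. This is one common weighting scheme used as an assumption here; the abstract does not state which reweighting formula the authors adopted, and the function names are hypothetical.

```python
import numpy as np

def inverse_frequency_weights(counts):
    """Weight for class c is N / (K * n_c): rare (tail) classes get larger weights."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted cross-entropy, normalized by the sum of sample weights."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    picked = probs[np.arange(len(labels)), labels]   # predicted probability of the true class
    w = np.asarray(weights, dtype=float)[labels]     # per-sample weight from its class
    return float((w * -np.log(picked)).sum() / w.sum())
```

With class counts of 90 and 10, for example, the tail class receives nine times the per-sample weight of the head class, so both classes contribute equally to the total loss when their per-sample errors are equal.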


An edge and semantic-aware segmentation network for defect detection

CUI Lisha, SONG Zhiwen, JIANG Xiaoheng, MA Xin, CHEN Enqing, XU Mingliang

2025, 46(3): 578-587. DOI: 10.11996/JG.j.2095-302X.2025030578

To address challenges such as weak defect features, blurred boundaries, and significant scale variations, an edge and semantic-aware segmentation network for defect detection (ESNet) was proposed. Specifically, a dual-branch network was employed to learn semantic and detailed information of the image separately. To effectively utilize the complementary information from both branches, a bilateral attention guidance module (BAGM) was proposed. This module guided the detailed branch to learn contextual information via the channel attention of the semantic branch, while the spatial attention of the detailed branch guided the semantic branch to capture low-level detailed information. In the semantic branch, a multi-scale pyramid pooling module (MPPM) was designed to thoroughly learn and encode multi-level contextual information. Simultaneously, in the detailed branch, an edge-aware module (EAM) was incorporated, which used the boundary map predicted by the lower layers to guide the higher-level feature maps in learning boundary information. Finally, to effectively fuse high-level and low-level feature maps, a semantic-aware module (SAM) was proposed to alleviate the semantic misalignment problem in cross-scale feature fusion. Extensive experiments on public defect segmentation datasets NEU-Seg, MT-Defect, and MSD demonstrated the effectiveness of the proposed method.


CLIP-based semantic offset transferable attacks on 3D point clouds

MA Yang, HUANG Lujie, PENG Weilong, WU Zhize, TANG Keke, FANG Meie

2025, 46(3): 588-601. DOI: 10.11996/JG.j.2095-302X.2025030588

Deep learning-based 3D point cloud understanding has received increasing attention in applications such as autonomous driving, robotics, and surveillance, and the study of adversarial attacks on point cloud deep learning models helps to evaluate and improve their adversarial robustness. However, most existing attack methods target the white-box setting; the adversarial samples they generate achieve very low success rates in transfer attacks on black-box models with unknown parameters and are easily defended against. These methods only optimize in geometric space to mislead specific classifiers and fail to change the intrinsic semantic structure of the point cloud data, which limits their transferability across classifiers. To address these issues, the proposed algorithm leveraged the rich semantic comprehension capability of large multimodal models to incorporate the semantic information of point clouds into the attack, so that the adversarial samples diverged significantly from the original semantic attributes, thereby enhancing transferability. In addition, considering that current adversarial samples with high transferability often exhibit insufficient imperceptibility, the algorithm integrated the semantic adversarial attack into the spectral domain, balancing transferability and imperceptibility. Extensive evaluations demonstrated that the 3D CLIP-based semantic offset attack (3DCLAT) significantly improved the transferability of adversarial samples and was more robust to defense methods.


BGS-Net: fine-grained classification networks with balanced generalization and specialization for 3D point clouds

LIU Hongshuo, BAI Jing, YAN Hao, LIN Gan

2025, 46(3): 602-613. DOI: 10.11996/JG.j.2095-302X.2025030602

With the rapid advancement of 3D understanding and computer vision technologies, point cloud data have gained significant attention for their precise geometry and rich spatial information. In applications such as intelligent transportation, accurately classifying subtle differences in vehicle models is crucial, rendering fine-grained point cloud classification essential. However, existing methods focus heavily on task-specific networks that enhance classification by extracting local discriminative features, often neglecting the model’s generalization abilities. This results in decreased performance in diverse scenarios and on unseen categories, especially in environments with noise, occlusions, or data distribution changes. To address these challenges, the balanced generalization specialization network (BGS-Net) was proposed as a two-stage framework that balanced generalization and specialization. In the first stage, BGS-Net employed mask distillation self-supervised learning with coupled masks to guide two student models in learning independent feature representations from a teacher model, thereby enhancing generalization. In the second stage, a balanced training strategy was implemented by freezing one encoder to preserve general features while fine-tuning the other encoder to extract locally discriminative features for task specialization. Experimental results demonstrated that BGS-Net significantly outperformed existing methods in fine-grained, meta-category, few-shot, and real-world classification tasks, thereby confirming its effectiveness in maintaining high generalization while achieving task specialization. This approach enhanced the applicability and robustness of point cloud classification in practical applications.


Feedback-based iterative sampling denoising framework for point clouds with high-level noise

WANG Changchang, JIANG Kun, JIANG Kai, ZHANG Peng, SU Zhiyong

2025, 46(3): 614-624. DOI: 10.11996/JG.j.2095-302X.2025030614

During 3D point cloud collection, point cloud data are easily corrupted by noise due to factors such as measurement anomalies, edge scattering, and the material properties of the measured object. However, current deep learning-based point cloud denoising algorithms perform poorly under high-level noise conditions and can easily smooth out sharp features. To address this problem, a feedback-based iterative sampling denoising framework for point clouds with high-level noise was proposed, with the aim of enhancing the performance of existing supervised denoising algorithms under high-level noise conditions. First, the noisy point cloud was denoised using an existing supervised denoising network to obtain a pre-denoised point cloud. Second, the original noisy point cloud and the pre-denoised point cloud were jointly input into the sampling module to obtain a fused point cloud containing geometric details and edge features. Third, the feedback-aware refinement network denoised the fused point cloud under the guidance of feedback from the pre-denoised point cloud to obtain the denoising result for the current iteration. Finally, by using the denoised result from the current iteration as the feedback for the next round and as input to the sampling fusion module, the process was iterated progressively until the final denoising result was obtained. Experimental results demonstrated that this framework enhanced the performance of existing supervised point cloud denoising algorithms under high-level noise conditions, exhibiting excellent denoising effects and feature retention capabilities.


3D human mesh reconstruction based on dual-stream network fusion

YU Bing, CHENG Guang, HUANG Dongjin, DING Youdong

2025, 46(3): 625-634. DOI: 10.11996/JG.j.2095-302X.2025030625

The reconstruction of 3D human body meshes holds significant application value in fields such as computer vision, animation production, and virtual reality. However, while most existing methods primarily focus on 3D human body reconstruction from single images, accurately and smoothly reconstructing 3D human motion from video data remains a challenging problem. To address this issue, a dual-stream network fusion architecture was proposed that utilized 3D human pose as an intermediary to achieve 3D human body mesh reconstruction from video data. Specifically, the proposed method comprised three components. First, a 3D pose estimation stream network was employed to estimate 3D joint points from the input video, providing precise joint information. Second, a temporal feature aggregation stream network was used to extract temporal image features from the video, capturing spatial motion and temporal pose characteristics. Finally, a fusion decoder was designed to regress the 3D mesh vertex coordinates by integrating the 3D joint points, temporal image features, and the mesh structure provided by the SMPL template. Experimental results demonstrated that the proposed method achieved superior prediction accuracy compared to MPS-Net. On the 3DPW dataset, the mean per joint position error (MPJPE) was reduced by 9.3%, and on the MPI-INF-3DHP dataset, the MPJPE was reduced by 9.2%. Moreover, the reconstructed results were more visually plausible, with higher accuracy and smoothness.
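The MPJPE metric quoted in this abstract has a standard definition: the Euclidean distance between predicted and ground-truth joint positions, averaged over joints and frames. A minimal sketch follows; the helper `relative_reduction` is a hypothetical name showing how a percentage reduction against a baseline would be computed. Root alignment, which many evaluation protocols apply before measuring the error, is omitted here.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per joint position error over arrays shaped (..., joints, 3).

    The result is in the same unit as the inputs (commonly millimeters).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def relative_reduction(baseline_error, new_error):
    """Fractional error reduction relative to a baseline, e.g. 0.093 for 9.3%."""
    return (baseline_error - new_error) / baseline_error
```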


Implicit surface animation rendering based on temporal interval inversion

LI Xiaoli, ZHANG Kun, DU Zhenlong, CHEN Dong, SONG Shuang

2025, 46(3): 635-641. DOI: 10.11996/JG.j.2095-302X.2025030635

Animation rendering is an important branch of computer graphics that focuses on generating temporally dynamic image sequences. Common animation rendering methods involve per-frame rendering of geometric scenes along the timeline, which can easily waste computing resources. To enhance animation rendering efficiency, an implicit surface animation rendering method based on temporal interval inversion was proposed. This method exploited a sparse octree mesh to divide the implicit scene space, employed interval arithmetic for recursive subdivision of the implicit scene, and categorized the scenes containing implicit surfaces into interior, exterior, and surface areas. Interval arithmetic was utilized to bound the range of the time derivative, thereby localizing changes within the implicit scene. There were intervals of relative or absolute stillness between multiple consecutive implicit surfaces, and by selectively re-evaluating areas while maintaining the global error, the occlusion between implicit surfaces was achieved. Finally, parallelized threads were utilized to render the implicit surface animation. Experimental results showed that the proposed method, compared with the frame-by-frame rendering method, achieved an acceleration of several tens of times while maintaining the rendering quality.
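The interior/exterior/surface classification by interval arithmetic can be sketched for a single axis-aligned cell. This is an illustration under stated assumptions rather than the authors' method: the implicit function is a unit sphere, the sign convention is f < 0 inside, and the names `Interval`, `sphere`, and `classify_cell` are hypothetical. Octree recursion and the bound on the time derivative are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def square(self):
        a, b = self.lo * self.lo, self.hi * self.hi
        # If the interval straddles zero, the smallest square is exactly 0.
        if self.lo <= 0.0 <= self.hi:
            return Interval(0.0, max(a, b))
        return Interval(min(a, b), max(a, b))

def sphere(x, y, z, radius=1.0):
    """Interval extension of f = x^2 + y^2 + z^2 - r^2 (assumed: f < 0 is inside)."""
    r2 = Interval(radius * radius, radius * radius)
    return x.square() + y.square() + z.square() - r2

def classify_cell(x, y, z, f=sphere):
    """Classify a box cell from the guaranteed range of f over it."""
    value = f(x, y, z)
    if value.hi < 0.0:
        return "interior"   # f is negative everywhere in the cell
    if value.lo > 0.0:
        return "exterior"   # f is positive everywhere in the cell
    return "surface"        # the cell may contain the surface; subdivide further
```

Because interval bounds are conservative, a "surface" cell is only a candidate: recursive subdivision tightens the bound, while "interior" and "exterior" cells can be discarded without further evaluation.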


Performance analysis of GPU-based parallel solvers for rigid body dynamics

LIANG Ruikai, LUO Xukun, GUO Yuzhong, HE Xiaowei

2025, 46(3): 642-654. DOI: 10.11996/JG.j.2095-302X.2025030642

Multi-body dynamic simulation involving rigid bodies and constraints plays a critical role in physical simulation and has widespread applications in engineering analysis, virtual reality, and game animation. Traditional rigid-body physics engines primarily rely on CPUs for computation. However, in modern computer graphics and real-time physics simulation, the parallel computing power of GPUs has been demonstrated to significantly enhance performance. This study explored the implementation of five Jacobian-based constraint solvers on the GPU and analyzed their performance and stability. These solvers included the projected Jacobi (PJ) solver, the combined projected Jacobi and nonlinear Jacobi (PJNJ) solver, the projected Jacobi with soft constraints (PJSoft) solver, the substep-based Jacobi (TJ) solver, and the substep-based Jacobi with soft constraints (TJSoft) solver. Benchmark tests revealed that the soft-constraint method provided smoother constraint impulse responses, while employing a substep strategy resulted in more stable solutions, particularly for high mass ratios and complex scenarios. Overall, this work offered a fresh perspective on evaluating GPU-based constraint solver strategies in multi-body simulations and served as an important reference for real-time physics simulation and interactive computer graphics.
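The projected Jacobi iteration named in this abstract can be illustrated on a small linear complementarity problem (LCP): find λ ≥ 0 such that Aλ + b ≥ 0 and λ·(Aλ + b) = 0. The sketch below is the textbook form running on the CPU with NumPy, not the paper's GPU solver; the paper's constraint formulation is not given in the abstract, and the function name and `omega` relaxation parameter are assumptions.

```python
import numpy as np

def projected_jacobi(A, b, iterations=50, omega=1.0):
    """Approximate the LCP solution: lam >= 0, A @ lam + b >= 0, complementary.

    Every component is updated from the previous iterate only, so all updates
    within one sweep are independent of each other.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    lam = np.zeros_like(b)
    diag = np.diag(A)
    for _ in range(iterations):
        residual = A @ lam + b
        # Jacobi step followed by projection onto the feasible set lam >= 0.
        lam = np.maximum(0.0, lam - omega * residual / diag)
    return lam
```

The independence of the per-component updates is what makes Jacobi-style solvers a natural fit for GPU threads, in contrast to Gauss-Seidel sweeps, which consume updated values sequentially.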

Visual analysis system for UAV path planning

HU Yue, SUN Zhida, HUANG Hui

2025, 46(3): 655-665. DOI: 10.11996/JG.j.2095-302X.2025030655

Currently, real-world image information has been used to reconstruct geometric models and generate high-quality rendering results through image-based rendering methods, which has become a viable way to acquire high-quality art materials. A visual analysis system was designed to support UAV path planning, 3D reconstruction, and image-based rendering. As an engineering contribution, a blueprint editor was adopted as the platform for the system, and the visual analysis functions were implemented as function nodes within the editor, enabling users to customize their visual analysis workflow simply by dragging, connecting, and configuring graphic elements. As an algorithmic contribution, an enhanced and optimized view selection algorithm based on sampling coverage was proposed, which improved both running time and rendering coverage.

Three-dimensional reconstruction method of on-board cables based on two-dimensional drawings

LI Mo, CAI Chenshu, CHEN Junzhao, WANG Ping, ZHAO Bo, ZENG Long, LI Ming, FANG Qiang

2025, 46(3): 666-675. DOI: 10.11996/JG.j.2095-302X.2025030666

Cables are widely used on airplanes and other aircraft, connecting various equipment and power systems on board. At present, large-scale installation of aircraft cables is predominantly performed manually based on two-dimensional drawings, resulting in low efficiency and poor accuracy. Extracting cable data from complex 2D drawings and converting it into clear, intuitive 3D models to optimize the cable routing process has become a key issue requiring urgent resolution. The two-dimensional drawings contain extensive cable graphic structures and annotation information, which are densely arranged, presented in non-intuitive forms, and interfere with each other, posing a major challenge to the three-dimensional reconstruction of cables. To address this practical engineering challenge, a three-dimensional reconstruction method for on-board cables based on two-dimensional drawings was proposed. The method accurately and efficiently extracted, integrated, and reconstructed structural information from two-dimensional drawings to reflect the three-dimensional spatial relationships of cables. Building on the automatic reading of two-dimensional cable installation drawings, a cable connection algorithm employing a bidirectional optimal matching strategy and a cable reconstruction method based on two-dimensional cable installation drawings were proposed. Finally, the three-dimensional cable topology and geometric structure were reconstructed, and the method was tested and validated using practical engineering data.

Voronoi diagram-based algorithm for 3D borehole modeling

HU Xinyang, WANG Pengfei, ZENG Qiong, JIANG Peng, XIN Shiqing, TU Changhe

2025, 46(3): 676-685. DOI: 10.11996/JG.j.2095-302X.2025030676

Geological models derived from three-dimensional geological modeling methods play an indispensable role across various engineering domains. Existing modeling approaches typically partition underground lithological regions through spatial data interpolation, yet they encounter challenges in maintaining topological consistency, which constrains the reliability and practicality of three-dimensional models. To construct discontinuous structures within geological regions, a new method based on Voronoi diagrams was proposed to automatically generate three-dimensional stratigraphic surface models. In this method, drilling data were first discretized into scattered points, Voronoi diagrams were constructed, and interfaces between different lithological regions were extracted. Subsequently, the deformation of the interfaces was determined by establishing and solving a linear system for the vertices on these interfaces. In addition, a spatial deformation control algorithm was incorporated to enhance the model’s accuracy in representing complex structural features, such as geological faults and folds, thereby improving the performance of the 3D model in practical applications. This approach resolved the topological inaccuracies often encountered in traditional modeling methods for complex geological structures and exhibited a high degree of automation and robustness. Notably, the method adapted well to irregular datasets, significantly reducing the need for manual intervention during model adjustments. Experiments on real engineering data confirmed that the resulting model possessed sound geological validity and could reconstruct non-manifold structures that were difficult to model using other methods.
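
The first stage described in this abstract (scattered borehole points, Voronoi partition, interface extraction) can be sketched in a simplified 2D raster form. This is an assumption-laden illustration, not the paper's algorithm: nearest-sample labeling on a grid stands in for an exact Voronoi diagram, and the sample format `(x, z, lithology)` and all function names are hypothetical.

```python
def nearest_label(point, samples):
    """Return the lithology of the sample closest to `point`.

    Labeling every location by its nearest sample is equivalent to
    rasterizing the Voronoi diagram of the samples.
    """
    px, pz = point
    closest = min(samples, key=lambda s: (s[0] - px) ** 2 + (s[1] - pz) ** 2)
    return closest[2]


def rasterize(samples, nx, nz, cell=1.0):
    """Label an nx-by-nz grid of cell centers; rows are indexed by depth z."""
    return [
        [nearest_label(((i + 0.5) * cell, (k + 0.5) * cell), samples)
         for i in range(nx)]
        for k in range(nz)
    ]


def interface_edges(grid):
    """Collect pairs of adjacent cells whose lithology labels differ."""
    edges = []
    for k, row in enumerate(grid):
        for i, label in enumerate(row):
            if i + 1 < len(row) and row[i + 1] != label:
                edges.append(((k, i), (k, i + 1)))
            if k + 1 < len(grid) and grid[k + 1][i] != label:
                edges.append(((k, i), (k + 1, i)))
    return edges
```

In this raster stand-in, the extracted edges approximate the lithological interfaces; the subsequent deformation of those interfaces by solving a linear system is not covered here.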

Model integration technology for landing gear systems based on MBSE

ZHANG Haoxuan, LIANG Zan, WANG Guoxin, WU Shouxuan, LU Jinzhi, YAN Yan, YUAN Yongji, QIAO Jiaxing

2025, 46(3): 686-696. DOI: 10.11996/JG.j.2095-302X.2025030686

To address the interoperability issues between architecture and simulation modeling tools in the design process of the landing gear system, a landing gear system model integration technology based on model-based systems engineering (MBSE) was proposed. This technology enabled the bidirectional transfer of cross-domain information and interface interoperability between architecture modeling and simulation modeling tools. First, the landing gear system architecture model was constructed using the multi-architecture modeling language kombination of architecture model specification (KARMA), and this model was then used to generate the simulation model. Next, semantic mapping rules for the models were established, and an integrated data model for the simulation model was created by parsing its content. Additionally, a simulation tool adapter was developed to unify the compilation of heterogeneous data and to integrate the architecture and simulation models. Experimental results demonstrated that this method effectively enabled the bidirectional transfer of design information between heterogeneous models and ensured interoperability between architecture and simulation modeling tools, thereby supporting the integration of architecture and simulation models in the landing gear system.

Transport-and-packing with buffer via deep reinforcement learning

LEI Yulin, LIU Ligang

2025, 46(3): 697-708. DOI: 10.11996/JG.j.2095-302X.2025030697

To address the limited container space utilization caused by initial object stacking constraints in physical scenarios, a neural optimization model based on a deep reinforcement learning framework was proposed for bufferable object transportation and packing, incorporating a buffer transfer mechanism to enhance container packing efficiency. The state encoder dynamically encoded priority information extracted from a priority graph together with buffer information, effectively managing object stacking relationships and leveraging the transfer capacity of the buffer zone. The sequence decoder perceived the current container state and employed an attention mechanism to calculate selection probabilities for candidate rotation state sequences, adaptively selecting sequences for either transfer or packing. Subsequently, the target decoder took the geometric and buffer information of the selected states as input, integrated the accumulated information from the sequence decoder to construct a conditional query vector, and performed attention aggregation on the encoded feature vectors to efficiently decide whether to buffer or pack objects. The REINFORCE algorithm with a baseline was employed to train the network, yielding optimized strategies for bufferable object packing. Experimental results on 2D and 3D RAND datasets demonstrated an improvement of approximately 4% in container packing utilization over the advanced TAP-Net model, and the method significantly outperformed heuristic methods designed for this newly defined problem. Furthermore, models trained on a fixed number of objects generalized effectively to packing instances involving a larger number of objects.
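
The training rule named in this abstract, REINFORCE with a baseline, can be shown on a toy problem. The sketch below is not the paper's network or reward: it assumes a two-action softmax policy (for instance "pack" versus "buffer") with fixed hypothetical rewards, and uses an exponential moving average of rewards as the baseline.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def train(rewards, steps=2000, lr=0.5, beta=0.1, seed=0):
    """REINFORCE with a moving-average baseline on a toy bandit.

    `rewards[a]` is the (hypothetical, deterministic) reward of action a.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline  # baseline reduces gradient variance
        for i in range(len(theta)):
            # Gradient of log pi(action) w.r.t. theta_i for a softmax policy.
            grad_log = (1.0 if i == action else 0.0) - probs[i]
            theta[i] += lr * advantage * grad_log
        baseline += beta * (reward - baseline)
    return softmax(theta)
```

Subtracting the baseline leaves the expected gradient unchanged while lowering its variance, which is the usual motivation for this variant over plain REINFORCE.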

Published as 3, 2025

2025, 46(3): 709.
