Welcome to the Journal of Graphics
Bimonthly, started in 1980
Administrated by: China Association for Science and Technology
Sponsored by: China Graphics Society
Edited and Published by: Editorial Board of Journal of Graphics
Chief Editor: Guoping Wang
Editorial Director: Xiaohong Hou
ISSN 2095-302X
CN 10-1034/T
Current Issue
31 August 2024, Volume 45, Issue 4
Cover
Cover of issue 4, 2024
2024, 45(4): 1. 
PDF
Related Articles | Metrics
Contents
Table of Contents for Issue 4, 2024
2024, 45(4): 2. 
PDF
Related Articles | Metrics
Review
A review of neural radiance fields for outdoor large scenes
DONG Xiangtao, MA Xin, PAN Chengwei, LU Peng
2024, 45(4): 631-649.  DOI: 10.11996/JG.j.2095-302X.2024040631
HTML | PDF

The 3D modeling of large outdoor scenes not only enables real-time urban mapping and roaming, but also provides technical support for autonomous driving. In recent years, neural implicit representation has advanced rapidly, and the emergence of neural radiance fields (NeRF) has propelled it to new heights. With its high-quality rendering from arbitrary viewpoints, NeRF has been widely applied in controllable editing, digital humans, urban scene reconstruction, and other fields. A neural radiance field uses deep learning to learn an implicit three-dimensional scene from two-dimensional images and their poses, synthesizing novel view images. However, the original NeRF can only yield realistic results in bounded scenes, posing challenges in modeling large outdoor scenes due to problems such as unbounded backgrounds, model capacity constraints, and varying scene appearance. To deploy NeRF in large outdoor scenes, researchers have made improvements from multiple angles and proposed a variety of NeRF variants. Our review begins by introducing the background of neural radiance fields, then delves into the challenges specific to large outdoor scenes, analyzing and discussing the solutions to each, before concluding with a summary of the current progress of NeRF for large outdoor scenes and prospects for the future.
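As background for the survey, the core NeRF rendering step is the standard discrete volume-rendering quadrature: densities and colors sampled along a camera ray are alpha-composited into a pixel. A minimal NumPy sketch (the `render_ray` helper and its toy inputs are illustrative, not code from any surveyed method):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite one camera ray from N samples (standard NeRF quadrature).

    sigmas: (N,) volume densities, colors: (N, 3) RGB, deltas: (N,) sample spacings.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # per-segment opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))   # transmittance to each sample
    weights = alphas * trans                                         # per-sample contribution
    return (weights[:, None] * colors).sum(axis=0)                   # composited pixel color

# A dense (near-opaque) middle sample dominates the rendered color:
pixel = render_ray(np.array([0.0, 50.0, 0.0]),
                   np.eye(3),                                        # red, green, blue samples
                   np.array([0.1, 0.1, 0.1]))
```

The unbounded-scene variants surveyed typically change how samples are drawn and how scene space is parameterized, rather than this compositing step.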

Figures and Tables | References | Related Articles | Metrics
Image Processing and Computer Vision
Improving YOLOv7 remote sensing image target detection algorithm
LI Daxiang, JI Zhan, LIU Ying, TANG Yao
2024, 45(4): 650-658.  DOI: 10.11996/JG.j.2095-302X.2024040650
HTML | PDF

In response to the problem of low detection accuracy caused by significant object scale variations and complex backgrounds in remote sensing images, an improved YOLOv7 object detection algorithm was designed. Firstly, in order to alleviate the interference of complex backgrounds on the detector, an attention-guided efficient layer aggregation network (ALAN) was designed to optimize the multi-path network to focus more on foreground objects, thereby reducing the impact of the background. Secondly, in order to reduce the impact of significant object scale variations on detection accuracy, an attention multi-scale feature enhancement (AMSFE) module was designed to expand the receptive field of the backbone network's output features, enhancing the network's feature representation ability for objects with substantial scale variations. Finally, a rotating bounding box loss function was introduced to obtain precise location information for objects in any orientation. The experimental results on the DIOR-R dataset demonstrated that the proposed algorithm achieved a mean average precision (mAP) of 64.51%, an improvement of 3.43% over the original YOLOv7 baseline. Furthermore, it outperformed other similar algorithms and was capable of handling object detection tasks in remote sensing images with multi-scale objects and complex backgrounds.

Figures and Tables | References | Related Articles | Metrics
ASC-Net: fast segmentation network for surgical instruments and organs in laparoscopic video
ZHANG Xinyu, ZHANG Jiayi, GAO Xin
2024, 45(4): 659-669.  DOI: 10.11996/JG.j.2095-302X.2024040659
HTML | PDF

Laparoscopic surgery automation is an important component of intelligent surgery, premised on the real-time and precise segmentation of surgical instruments and organs under the laparoscope. Hindered by complex factors such as intraoperative blood contamination and smoke interference, the real-time and precise segmentation of surgical instruments and organs faced great challenges, and existing image segmentation methods all performed poorly. Therefore, a fast segmentation network based on attention perceptron and spatial channel (attention spatial channel net, ASC-Net) was proposed to achieve the rapid and precise segmentation of surgical instruments and organs in laparoscopic images. Under the UNet architecture, attention perceptron and spatial channel modules were designed and embedded between the network's encoding and decoding modules through skip connections. This enabled the network to focus on the deep semantic differences between similar targets in the images, while learning multi-dimensional features of each target at multiple scales. In addition, a pre-training fine-tuning strategy was adopted to reduce network computation. Experimental results demonstrated that on the EndoVis2018 (Endovis robotic scene segmentation challenge 2018) dataset, the mean Dice coefficient (mDice), mean intersection-over-union (mIoU), and mean inference time (mIT) of this method were 90.64%, 86.40%, and 16.73 ms (about 60 frames/s), respectively; mDice and mIoU were 26% and 39% higher than existing SOTA methods, with mIT reduced by 56%. On the AutoLaparo (automation in laparoscopic hysterectomy) dataset, the mDice, mIoU, and mIT of this method were 93.72%, 89.43%, and 16.41 ms (about 61 frames/s), respectively, outperforming the comparison methods.
While ensuring segmentation speed, the proposed method effectively enhanced segmentation accuracy, achieving the rapid and precise segmentation of surgical instruments and organs in laparoscopic images and advancing the field of laparoscopic surgery automation.
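The mDice metric reported above is the mean of the standard Dice coefficient over classes. A minimal sketch for binary masks (the `dice_coefficient` helper is a hypothetical illustration, not the paper's evaluation code):

```python
import numpy as np

def dice_coefficient(pred_mask, gt_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * np.logical_and(pred, gt).sum() / denom

a = np.zeros((4, 4), int); a[1:3, 1:3] = 1   # 4-pixel square
b = np.zeros((4, 4), int); b[1:3, 1:4] = 1   # overlapping 6-pixel region
d = dice_coefficient(a, b)                   # 2*4 / (4+6) = 0.8
```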

Figures and Tables | References | Related Articles | Metrics
A network based on the homogeneous middle modality for cross-modality person re-identification
LUO Zhihui, HU Haitao, MA Xiaofeng, CHENG Wengang
2024, 45(4): 670-682.  DOI: 10.11996/JG.j.2095-302X.2024040670
HTML | PDF

Visible-infrared cross-modality person re-identification (VI-ReID) aims to retrieve and match visible and infrared images of the same person captured by different cameras. In addition to addressing the intra-modality discrepancies caused by various factors such as viewpoint, pose, and scale variations in person re-identification, the modality discrepancy between the visible and infrared images represents a significant challenge for VI-ReID. Existing methods usually constrain only the features of the two modalities to reduce modality differences, while ignoring the essential differences in the imaging mechanism of cross-modality images. To address this, this paper attempted to narrow the discrepancy between modalities by jointly generating an intermediate modality from two modalities and optimizing feature learning on a vision Transformer (ViT)-based network through the fusion of local and global features. A feature fusion network based on the homogeneous middle modality (H-modality) was proposed for VI-ReID. Firstly, an H-modality generator was designed, using a parameter-sharing encoder-decoder structure, constrained by distribution consistency loss to bring the generated images closer in feature space. By jointly generating H-modality images from visible and infrared images, the three modal images were projected into a unified feature space for joint constraining, thereby reducing the discrepancy between visible and infrared modalities and achieving image-level alignment. Furthermore, a transformer-based VI-ReID method based on the H-modality was proposed, with an additional local branch to enhance the network’s local perception capability. In global feature extraction, a head enrich module was introduced to push multiple heads in the class token to obtain diverse patterns in the last transformer block. The method combined global features with local features, improving the model’s discriminative ability. 
The effect of each improvement was investigated through ablation experiments, in which different combinations of the sliding window, H-modality, local feature, and global feature enhancements were applied to the baseline ViT model. The results indicated that each improvement led to performance gains, demonstrating the effectiveness of the proposed method. The proposed method achieved rank-1/mAP of 67.68%/64.37% and 86.16%/79.11% on the SYSU-MM01 and RegDB datasets, respectively, outperforming most state-of-the-art methods. The proposed H-modality can effectively reduce the modality discrepancy between visible and infrared images, and the feature fusion network can obtain more discriminative features. Extensive experiments on the SYSU-MM01 and RegDB datasets have demonstrated the superior performance of the proposed network compared with state-of-the-art methods.

Figures and Tables | References | Related Articles | Metrics
Automatic portrait matting model based on semantic guidance
CHENG Yan, YAN Zhihang, LAI Jianming, WANG Guixi, ZHONG Linhui
2024, 45(4): 683-695.  DOI: 10.11996/JG.j.2095-302X.2024040683
HTML | PDF

To address the issues of semantic discrimination errors and unclear details in existing portrait matting methods, an automatic matting model based on semantic guidance was proposed. Firstly, a hybrid CNN-Transformer architecture, EMO, was introduced for feature encoding. Then, the semantic segmentation decoding branch utilized a multi-scale hybrid attention module to process the top-level encoded features, enhancing multi-scale representation and pixel-level discrimination capabilities. Next, a feature enhancement module was employed to merge high-level features, facilitating the flow of high-level semantic information through the shallow network. Simultaneously, the aggregation guidance module in the detail extraction decoding branch aggregated features from different branches, using the aggregated features to better guide the network in extracting shallow features and thereby improving the accuracy of edge and detail extraction. Experiments on three datasets demonstrated that our approach outperformed the compared methods, achieving optimal performance while significantly reducing parameter count and computational complexity, validating the competitiveness of the proposed method.

Figures and Tables | References | Related Articles | Metrics
Two-stage storm entity prediction based on multiscale and attention
WEI Min, YAO Xin
2024, 45(4): 696-704.  DOI: 10.11996/JG.j.2095-302X.2024040696
HTML | PDF

Storms are a natural phenomenon characterized by a short life cycle, sudden occurrence, and small spatial scale. Radar echo extrapolation methods are commonly employed for prediction. However, time series prediction models find it difficult to locate the key information of storms among numerous features, leading to low prediction accuracy, and they cannot fully learn the high-frequency information in images, resulting in missing details and blurry predictions. To enhance prediction performance, we proposed a two-stage framework for single storm forecasting. In the first stage, a multi-scale module extracted multi-scale information, an attention mechanism mined important features impacting prediction, and spatiotemporal long- and short-term memory units were utilized for sequence prediction. The second stage performed bias correction on the results of the first stage, with a frequency-domain loss enriching prediction details. Experimental results showed that on the radar echo dataset, compared with the mainstream PredRNN-V2 model, the mean squared error was reduced by 11.4% and SSIM was improved by 4.3%, showing superior performance in single storm forecasting tasks. On the Moving MNIST dataset, the mean squared error was reduced by 4.95%, the perceptual loss was reduced by 12.67%, and the SSIM was improved to 0.898, demonstrating strong time series prediction capabilities.
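The frequency-domain loss idea above can be illustrated by comparing Fourier amplitude spectra: penalizing spectral differences pushes a model to keep the high-frequency detail that a plain MSE objective tends to blur away. A minimal sketch with a hypothetical `frequency_loss` helper (the paper's actual loss may weight frequencies differently):

```python
import numpy as np

def frequency_loss(pred, target):
    """L1 distance between the 2D Fourier amplitude spectra of two images."""
    amp_pred = np.abs(np.fft.fft2(pred))
    amp_target = np.abs(np.fft.fft2(target))
    return np.abs(amp_pred - amp_target).mean()

rng = np.random.default_rng(0)
img = rng.random((32, 32))
# Crude vertical smoothing removes high-frequency energy, as blurry predictions do:
blurred = (img + np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)) / 3
loss = frequency_loss(blurred, img)  # positive: the blur lost spectral content
```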

Figures and Tables | References | Related Articles | Metrics
Domain generalization based on data representation invariance
NI Yunhao, HUANG Lei
2024, 45(4): 705-713.  DOI: 10.11996/JG.j.2095-302X.2024040705
HTML | PDF

Domain generalization has become a prominent research direction in artificial intelligence, aiming to learn task-related invariant representations from different data distributions. It seeks to remove the impact of varying domains on learning tasks, thereby enhancing the model's domain generalization capabilities. Based on the idea of invariant risk minimization, this paper divided neural networks into feature extractors and invariant classifiers for exploration. For the feature extractor, a group whitening method based on Newton's iteration was utilized to control the distribution of activation values. This allowed different images to shed part of their domain information after passing through the neural network, thus achieving the purpose of domain generalization. For the invariant classifier, the effects of feature and weight normalization on the model's domain generalization were explored, and a snowflake algorithm based on a cosine similarity loss function was proposed. This algorithm improved the accuracy of domain generalization. In addition, extensive theoretical derivations and in-depth analyses of the snowflake algorithm were provided, offering sufficient theoretical support for the experiments.

Figures and Tables | References | Related Articles | Metrics
Binocular ranging method based on improved YOLOv8 and GMM image point set matching
HU Xin, CHANG Yashu, QIN Hao, XIAO Jian, CHENG Hongliang
2024, 45(4): 714-725.  DOI: 10.11996/JG.j.2095-302X.2024040714
HTML | PDF

Addressing the research needs of unmanned tower crane systems, a binocular ranging method based on improved YOLOv8 and GMM image point set matching was proposed to detect and recognize tower crane hooks in the outdoor environment outside the driver's cab and measure their distance. Image acquisition was performed through binocular cameras, and the FasterNet backbone network and Slim-neck connection layer were introduced to improve the YOLOv8 target detection algorithm, thereby effectively detecting the tower crane hooks in the image and obtaining the two-dimensional coordinate information of the detection box. A locality-sensitive hashing method was employed, integrated with a phased matching strategy, to improve the matching efficiency of the GMM image point set matching model, which performed feature point matching for the tower crane hooks in the detection box. Finally, the depth information of the tower crane hook was calculated through the principle of binocular camera triangulation. The experimental results demonstrated that compared to the original algorithm, the improved YOLOv8 algorithm increased precision (P) by 2.9% and average precision (AP50) by 2.2%, reduced model complexity by 10.01 GFLOPs, and reduced the parameter count by 3.37 M, achieving a lightweight model while enhancing detection accuracy. Compared with the original algorithm, the improved image point set matching algorithm exhibited better robustness across various indicators. Finally, the recognition and ranging of tower crane hooks were effectively completed within an acceptable margin of error at the engineering site, verifying the feasibility of this method.
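The final triangulation step follows the standard rectified-stereo relation Z = f·B/d (focal length times baseline over pixel disparity). A minimal sketch with a hypothetical `stereo_depth` helper:

```python
def stereo_depth(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth of a matched point from a rectified stereo pair: Z = f * B / d."""
    disparity = x_left_px - x_right_px  # horizontal pixel offset between the two views
    if disparity <= 0:
        raise ValueError("a valid match in front of the cameras needs positive disparity")
    return focal_px * baseline_m / disparity

# f = 1000 px, baseline = 0.2 m, matched hook feature 8 px apart -> 25 m away
z = stereo_depth(1000.0, 0.2, 640.0, 632.0)
```

Since depth error grows roughly as Z²/(f·B) per pixel of disparity error, robust feature point matching matters most at long range, which motivates the improved GMM matching.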

Figures and Tables | References | Related Articles | Metrics
Rotating target detection algorithm in ship remote sensing images based on YOLOv8
NIU Weihua, GUO Xun
2024, 45(4): 726-735.  DOI: 10.11996/JG.j.2095-302X.2024040726
HTML | PDF

Aiming at the problems of difficulty in detecting small targets in ship remote sensing images, varied ship shapes, and excessive redundant information in traditional horizontal bounding boxes for targets with high aspect ratios, a rotating target detection algorithm for ship remote sensing images based on an improved YOLOv8 was proposed. By improving the convolution structure in the backbone network, the problem of fine-grained information loss caused by strided convolution was alleviated, improving the accuracy of small target detection. By replacing some of the convolution modules in C2f with DCNv3 deformable convolution, the extraction of feature information from irregular objects was enhanced, improving the nonlinear modeling capabilities of the model. Integrating the shallow feature map from the backbone network into the neck alleviated the problem of detailed information loss caused by repeated convolution operations, enhancing the detection capability for small target objects. Experimental results showed that the detection accuracy (mAP50) of the improved algorithm on the ShipRSImageNet dataset reached 84.317%, which was 4.054% higher than the baseline model. The model accuracy reached 93.235% on the HRSC2016 dataset, which was 1.555% higher than the baseline model. The improved algorithm achieved high detection performance with a small increase in the number of model parameters, effectively balancing model efficiency and performance.

Figures and Tables | References | Related Articles | Metrics
A water surface target detection algorithm based on SOE-YOLO lightweight network
ZENG Zhichao, XU Yue, WANG Jingyu, YE Yuanlong, HUANG Zhikai, WANG Huan
2024, 45(4): 736-744.  DOI: 10.11996/JG.j.2095-302X.2024040736
HTML | PDF

A lightweight water surface object detection algorithm, SOE-YOLO, based on YOLOv8 was proposed to address the issues of missed and false detections in complex and ever-changing water surface environments, as well as limited computing resources on the detection platform. Firstly, the Slim-Neck paradigm containing GSConv was employed to lighten the Neck part of the model. Secondly, the Backbone section was reconstructed using the lightweight ODConv (omni-dimensional dynamic convolution) module, reducing the number of parameters to improve the detection speed of the network. Finally, the multi-scale attention mechanism EMA (effective multi-scale attention) was introduced to enhance the network's capability to extract multi-scale features, thereby improving small target detection accuracy. The experimental results on the WSODD (water surface object detection) test set demonstrated that the parameter and computational quantities of the SOE-YOLO model were 2.8 M and 6.6 GFLOPs, respectively, reduced by 12.5% and 18.6% compared to the original model. At the same time, mAP@0.5 and mAP@0.5:0.95 reached 79.9% and 47.2%, respectively, 2.4% and 1.6% higher than the original model, and the missed detection rate decreased significantly, outperforming current popular object detection algorithms. The FPS reached 64.25, meeting the requirements of real-time detection of surface targets. The model achieved better detection performance while remaining lightweight, meeting deployment requirements in computing-resource-constrained environments.

Figures and Tables | References | Related Articles | Metrics
A deep architecture for reciprocal object detection and instance segmentation
GONG Yongchao, SHEN Xukun
2024, 45(4): 745-759.  DOI: 10.11996/JG.j.2095-302X.2024040745
HTML | PDF

Object detection and instance segmentation are two fundamental and closely correlated tasks in computer vision, yet their relations have not been fully explored in most previous works. For this reason, we presented the reciprocal object detection and instance segmentation network (RDSNet), a novel deep architecture. To reciprocate between the two tasks, we designed a two-stream structure to learn feature representations jointly at both the object level (i.e., bounding boxes) and the pixel level (i.e., instance masks), thus encoding object- and pixel-level information respectively. Moreover, three new modules were introduced for the interactions between the two streams, allowing object-level information to assist instance segmentation and pixel-level information to assist object detection. Specifically, a correlation module was used to measure the similarity between object- and pixel-level features, promoting consistency among features belonging to the same object and consequently enhancing the accuracy of instance masks. We proposed a cropping module to better distinguish different instances and reduce background noise, introducing awareness of instance and translation variance to pixel-level perception. To further refine the alignment between bounding boxes and their corresponding objects, a mask-based boundary refinement module (MBRM) was proposed for the fusion of bounding boxes and instance masks, with the potential to correct errors in bounding boxes with the help of instance masks. Extensive experimental analyses and comparisons on the COCO dataset demonstrated the effectiveness and efficiency of RDSNet. In addition, we further improved the performance of RDSNet by integrating the mask scoring strategy into MBRM, which allowed object detection to benefit from instance segmentation in a new way.

Figures and Tables | References | Related Articles | Metrics
Human action recognition based on skeleton dynamic temporal filter
LI Songyang, WANG Xueting, CHEN Xianglong, CHEN Enqing
2024, 45(4): 760-769.  DOI: 10.11996/JG.j.2095-302X.2024040760
HTML | PDF

Human action recognition is one of the key research areas in computer vision, with a wide range of applications such as human-computer interaction and intelligent surveillance. Existing methods for skeleton-based action recognition often combine graph convolutional networks (GCN) with temporal convolutional networks (TCN). However, the limited size of the convolutional kernels restricts the models' global temporal modeling capability, and applying shared convolutional kernels to skeletal data leads to a lack of discriminative power among different skeleton points. Furthermore, using TCN to extract features often entails repeated calculations, so the parameter count of the TCN grows as the network deepens. To address these issues, signal processing methods were utilized, and a skeleton dynamic temporal filtering (SDTF) module was proposed for skeleton action recognition to replace TCN for global modeling. On this basis, lightweight improvements were made to AGCN, reducing its complexity. SDTF modeled temporal features through the Fourier transform: the frequency-domain features obtained from the Fourier transform were multiplied by a filter's frequency response and then passed through the inverse Fourier transform. Extensive experiments conducted on the NTU-RGBD and Kinetics-Skeleton datasets demonstrated that the proposed model significantly reduced network parameters and computational complexity, while achieving comparable or even superior recognition performance compared to the original model.
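The filter-in-frequency pattern behind SDTF (FFT along the time axis, multiply by a frequency response, inverse FFT) can be sketched as follows. Here the response is a fixed vector, whereas SDTF learns it, and the `temporal_filter` helper name is illustrative:

```python
import numpy as np

def temporal_filter(features, response):
    """Filter a (T, C) feature sequence along time in the frequency domain.

    response: (T,) per-frequency gains (learned in SDTF; fixed here).
    Equivalent to a circular convolution over all T frames, so the temporal
    receptive field is global rather than limited by a kernel size.
    """
    spec = np.fft.fft(features, axis=0)         # frequency-domain features
    filtered = spec * response[:, None]         # element-wise filtering
    return np.fft.ifft(filtered, axis=0).real   # back to the time domain

T, C = 8, 4
x = np.random.default_rng(1).random((T, C))
y = temporal_filter(x, np.ones(T))              # all-pass response returns the input
```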

Figures and Tables | References | Related Articles | Metrics
Research on multi-scale road damage detection algorithm based on attention mechanism
WU Bing, TIAN Ying
2024, 45(4): 770-778.  DOI: 10.11996/JG.j.2095-302X.2024040770
HTML | PDF

Road damage detection is an important task in road maintenance and repair. Existing road damage detection primarily relies on traditional manual inspection, which requires significant manpower and material resources, resulting in low detection efficiency and an inability to meet the needs of current road development. To address these problems, an improved multi-scale road damage detection algorithm, YOLOv8-RDD, was proposed. Firstly, the YOLOv8-RDD algorithm employed deformable convolutional networks (DCN) in the C2f module to build a new C2f_DCN module. This expanded the effective range of the receptive field and located the boundaries and positions of target objects more accurately, thus enhancing the ability to identify and locate targets. At the end of the backbone network, a new SPPF_GS module was designed, introducing the self-attention (SA) mechanism and the ghost convolution (GhostConv) module into the SPPF module, with the size of the pooling kernel re-optimized to better handle long-distance dependencies and capture global information. Finally, coordinate attention (CA) was introduced into the Neck to strengthen the feature extraction ability of the model and reduce redundant information. Experimental results demonstrated that the improved algorithm achieved a precision of 61.1%, a recall of 55.5%, and a mean average precision (mAP) of 56.2% on the RDD2022 dataset. Compared with the YOLOv8n algorithm, these results were improved by 4.6%, 4.7%, and 5.2%, respectively, achieving excellent performance in road damage target detection.

Figures and Tables | References | Related Articles | Metrics
Improved YOLO object detection algorithm for traffic signs
ZHAO Lei, LI Dong, FANG Jiandong, CAO Qi
2024, 45(4): 779-790.  DOI: 10.11996/JG.j.2095-302X.2024040779
HTML | PDF

To address existing problems such as low recognition accuracy and numerous detection errors in current traffic sign detection algorithms, a detection method based on an optimized YOLOv5 was proposed. In the Backbone section, the reparameterization module DBB was employed in place of the Conv convolution: its feature extraction branches with diverse scales and complexities yielded receptive fields of various sizes and enriched the feature space. Simultaneously, the SE attention mechanism was introduced to enhance the significant features of the feature maps and suppress redundant ones, thereby improving the detection performance of the network. In the Neck section, a new SLA Neck was designed to aggregate feature maps from different layers, effectively preventing the loss of small-target feature information. It reduced the number of parameters and the amount of computation while fusing feature information from different levels, capturing more context and detail, separating out background information, enabling the model to focus on the target feature areas, and enhancing performance on objects of different sizes for precise positioning. The fused features were upsampled, and a small-object detection layer was added to enhance shallow feature information. In the Head section, IoU-aware query selection was introduced, incorporating the IoU score into the objective function of the classification branch and using the IoU between the predicted box and the ground truth (GT) box as the label for category prediction.
This imposed a consistent constraint on the classification and localization of positive samples, improved the model's matching mechanism, and reduced incorrect and missed detections. Meanwhile, SIoU was introduced as the loss function in place of CIoU, taking the direction between the ground-truth box and the predicted box into account in the loss to improve convergence speed and inference capability. The experimental results indicated that on the TT100K dataset, the proposed method, compared with YOLOv5m, reduced the amount of computation by 3.3% and the number of parameters by 34.8%, while mAP and mAP@50:95 improved by 13.8% and 10.4%, respectively. The experiments demonstrated that this model enhanced detection accuracy while reducing the number of parameters and the model size, making it valuable for practical applications.
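The IoU-aware classification target can be illustrated with plain axis-aligned IoU: the soft label of a positive sample is its box's IoU with the ground truth, so well-localized predictions receive high classification targets and poorly localized ones low targets. An illustrative sketch, not the paper's implementation (the paper additionally uses SIoU for the box regression loss itself):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

gt = (10, 10, 50, 50)
good = iou(gt, (12, 12, 52, 52))   # tight prediction -> high soft label
bad = iou(gt, (40, 40, 80, 80))    # loose prediction -> low soft label
```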

Figures and Tables | References | Related Articles | Metrics
Temporal dynamic frame selection and spatio-temporal graph convolution for interpretable skeleton-based action recognition
LIANG Chengwu, YANG Jie, HU Wei, JIANG Songqi, QIAN Qiyang, HOU Ning
2024, 45(4): 791-803.  DOI: 10.11996/JG.j.2095-302X.2024040791
HTML | PDF

Skeleton-based action recognition is a prominent research topic in computer vision and machine learning. Existing data-driven neural networks often overlook the temporal dynamic frame selection of skeleton sequences and lack the understandable decision logic inherent in the model, resulting in insufficient interpretability. To this end, we proposed an interpretable skeleton-based action recognition method based on temporal dynamic frame selection and spatio-temporal graph convolution, thereby enhancing the interpretability and recognition performance. Firstly, the quality of skeleton frames was estimated using the joint confidence to remove low-quality skeleton frames, addressing the skeleton noise problem. Secondly, based on the domain knowledge of human activity, an adaptive temporal dynamic frame selection module was proposed for calculating the motion salient regions to capture the dynamic patterns of key skeleton frames in human motion. To represent the intrinsic topology of human joints, an improved spatiotemporal graph convolutional network was used for interpretable skeleton-based action recognition. Experiments were conducted on three large public datasets, including NTU RGB+D, NTU RGB+D 120, and FineGym, and the results demonstrated that the recognition accuracy of this method outperformed comparative methods and possessed interpretability.

Figures and Tables | References | Related Articles | Metrics
Computer Graphics and Virtual Reality
High-quality texture reconstruction method for architectural painted patterns
GONG Chenchen, CAO Li, ZHANG Tengteng, WU Yize
2024, 45(4): 804-813.  DOI: 10.11996/JG.j.2095-302X.2024040804
HTML | PDF

Architectural painted patterns refer to the exquisite patterns painted on wooden structures. When digitizing ancient architecture, the general solution involves using a mesh combined with a single texture map for rendering. However, due to the limited resolution of a single texture map, not all details can be adequately displayed. Moreover, common textures are stored pixel by pixel, and using multiple high-resolution texture maps can lead to excessive graphics processing unit memory usage, resulting in lower data exchange efficiency. To address the aforementioned challenges, a method for high-quality texture map reconstruction was proposed. This method employed the self-similarity and symmetry of the painted patterns, thereby extracting the smallest unduplicated pattern elements and layout information of the painted patterns. Vectorized data was utilized to represent the smallest pattern elements, and a library of pattern elements was constructed. When editing the painted patterns of a 3D model, these pattern elements were reused and configured with corresponding transformation parameters, which were encoded into descriptive files to complete the rendering of the painted content. Experimental results demonstrated that the proposed method could effectively reduce the storage of redundant information and provide a better presentation of details, thus enhancing realistic rendering.
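The "smallest element plus transform parameters" representation described above can be sketched as a tiny description file: each vectorized pattern element is stored once in a library and instanced with per-placement transforms that exploit the pattern's symmetry. All names and fields here are hypothetical, not the paper's file format:

```python
import json

# Hypothetical pattern-element library: each unduplicated motif stored once
# as vector data (an SVG-like path string, elided here).
library = {"lotus": "<vector path data>"}

# Layout: reuse the element with per-instance transform parameters,
# exploiting the self-similarity and symmetry of the painted pattern.
layout = [
    {"element": "lotus", "translate": [120, 40], "rotate": 0,   "scale": 1.0},
    {"element": "lotus", "translate": [240, 40], "rotate": 180, "scale": 1.0},
]

description = json.dumps({"library": library, "layout": layout})
restored = json.loads(description)
```

Storing one vector element plus a list of transforms, rather than repeated pixels, is what cuts the redundant texture memory the abstract mentions.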

Figures and Tables | References | Related Articles | Metrics
Full process generation method of high-resolution face texture map
ZHU Baoxu, LIU Mandan, ZHANG Wenting, XIE Lizhi
2024, 45(4): 814-826.  DOI: 10.11996/JG.j.2095-302X.2024040814
HTML    PDF 8     12

Most research on face texture generation focuses on low-resolution generation. To address this, image translation was applied to the generation of high-resolution texture maps, and a whole-process method for generating 1024×1024 texture maps was proposed with an image translation network as its main component. This method effectively alleviated the low resolution of generated UV textures while ensuring rapid and efficient generation. In the image translation network, convolutional neural networks served as the backbone, combined with the statistical texture learning network (STLNet) and the soft adaptive layer-instance normalization (Soft-AdaLIN) method to form the generator. Meanwhile, multi-scale discrimination was employed to guide the generation of high-resolution texture images, and finally color conversion and Poisson fusion were performed to complete texture correction. Images randomly sampled from the FFHQ dataset were face-normalized and used for testing. Through quantitative evaluation on a series of indexes and qualitative and quantitative comparisons with recent related methods, the advantages of this whole-process method in generating 1024×1024 face UV texture images were verified.
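The abstract's "color conversion" step is not specified; one common form of such a correction is Reinhard-style per-channel mean/std transfer, sketched below as an assumed stand-in (function name and toy data are illustrative, and the Poisson fusion step is omitted):

```python
import numpy as np

def match_color_stats(generated, reference):
    """Shift the generated texture's per-channel mean/std to match a
    reference face crop (Reinhard-style color transfer; one plausible
    form of the abstract's color-conversion step)."""
    g = generated.astype(np.float64)
    r = reference.astype(np.float64)
    g_mean, g_std = g.mean(axis=(0, 1)), g.std(axis=(0, 1)) + 1e-8
    r_mean, r_std = r.mean(axis=(0, 1)), r.std(axis=(0, 1))
    out = (g - g_mean) / g_std * r_std + r_mean
    return np.clip(out, 0, 255).astype(np.uint8)

# Toy check: matching against a uniform reference collapses the texture
# to the reference color.
texture = np.random.default_rng(0).integers(0, 256, (8, 8, 3)).astype(np.uint8)
reference = np.full((8, 8, 3), 100, dtype=np.uint8)
corrected = match_color_stats(texture, reference)
```

In a real pipeline the corrected texture would then be blended into the UV map, e.g. by Poisson fusion, to hide seams.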

Figures and Tables | References | Related Articles | Metrics
Research on information interactive application of holographic map in large public places
HOU Wenjun, GUO Yuyang, LI Tong
2024, 45(4): 827-833.  DOI: 10.11996/JG.j.2095-302X.2024040827
HTML    PDF 14     13

Holographic technology possesses the unique capability of true three-dimensional display, helping to provide vivid and accurate visual perception and natural interactive experiences. In the future, it will have extensive practical applications in image and information presentation. In addition, with the development of geographic information and the universal application of maps, maps have become a very important tool for the public. Among them, holographic maps of large public places show a wide range of demands and high application value, and their inherent characteristics of information fusion and information bearing make their presentation form a subject of significant research interest. Firstly, the information display features of holographic maps in large public places were analyzed from the perspectives of map content, spatial characteristics, and interactive characteristics, and the mapping methods and information hierarchy were studied through experiments. The experimental results revealed that the basic scene layer was optimally presented through spatial perspective locking, while the associated ubiquitous information layer was best presented via user perspective locking, with its information hierarchy following the spatial-depth architectural scheme. Based on these conclusions, combined with the scene characteristics and user demands of large public spaces, a design practice for holographic map systems in large public places was undertaken, offering a reference and inspiration for the adoption and application of holographic technology.

Figures and Tables | References | Related Articles | Metrics
A text-driven 3D scene editing method based on key views
ZHANG Ji, CUI Wenshuai, ZHANG Ronghua, WANG Wenbin, LI Yaqi
2024, 45(4): 834-844.  DOI: 10.11996/JG.j.2095-302X.2024040834
HTML    PDF 8     10

The zero-shot image editing method based on the denoising diffusion model has made remarkable achievements, and applying it to 3D scenes enables zero-shot text-driven 3D scene editing. However, its 3D editing results are easily degraded by the limited 3D consistency of the diffusion model and by over-editing, leading to erroneous editing results. To address these problems, a new text-driven 3D editing method was proposed, which approached the problem from the data side by introducing key view-based data iteration and a pixel-based abnormal data masking module. The key view data could guide the editing of a 3D area to minimize the effect of 3D-inconsistent data, while the data masking module could filter out anomalies in the 2D input data. With this method, vivid, photo-quality text-driven 3D scene editing effects could be realized. Experiments demonstrated that, compared with current advanced text-driven 3D scene editing methods, the proposed method greatly reduced erroneous editing in 3D scenes, yielding more vivid and realistic 3D editing effects. In addition, the editing results generated by this method were more diversified, and the method was more efficient.
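The abstract gives no criterion for the pixel-based abnormal data masking; the sketch below is only a guess at what such a step could look like, flagging pixels whose edit deviates far from the current render (the statistical cutoff and all names are assumptions):

```python
import numpy as np

def abnormal_pixel_mask(edited, rendered, k=2.0):
    """Flag pixels whose edit deviates abnormally from the current render.
    This is a guess at one plausible form of 'pixel-based abnormal data
    masking'; the abstract does not give the actual criterion."""
    err = np.abs(edited.astype(float) - rendered.astype(float))
    if err.ndim == 3:               # reduce over color channels
        err = err.mean(axis=-1)
    cutoff = err.mean() + k * err.std()
    return err <= cutoff            # True = keep this pixel for supervision

rendered = np.zeros((4, 4))
edited = rendered.copy()
edited[0, 0] = 100.0                # one wildly over-edited pixel
mask = abnormal_pixel_mask(edited, rendered)
```

Masked pixels would simply be excluded when the edited 2D views supervise the 3D scene update.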

Figures and Tables | References | Related Articles | Metrics
BIM/CIM
Automated detection of truss geometric quality based on BIM and 3D laser scanning
ZOU Yakun, CHEN Xianchuan, TAN Yi, LIN Yongfeng, ZHANG Yafei
2024, 45(4): 845-855.  DOI: 10.11996/JG.j.2095-302X.2024040845
HTML    PDF 12     13

The truss structure, widely employed in large-span public buildings for its light weight and high load-bearing capacity, requires periodic inspections of its geometric quality to ensure safety over time. However, conventional methods for inspecting the geometric quality of truss structures rely mainly on manual processes, which are inefficient and costly. This paper proposed an automated detection algorithm to perform geometric quality inspection of truss structures. Firstly, the truss structure was separated from the background in the acquired raw point cloud data using the building information model (BIM). Subsequently, an algorithm based on key point detection automatically extracted geometric features of the truss structure and calculated node coordinates. Finally, by comparing the computed results with the BIM design information, geometric quality inspection results were obtained. The proposed method was validated in the auditorium of a campus in Shenzhen, China. The experimental results demonstrated that the computational outcomes of the proposed algorithm exhibited an error within 2 mm compared to measurements obtained from a total station. When the computational results were contrasted with the BIM model data, variations in the truss structure nodes were detected, indicating different degrees of settlement. Consequently, the proposed method enabled accurate and rapid spatial positioning of nodes, thereby improving the efficiency of geometric quality inspection for truss structures.
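The node-comparison step can be illustrated as below. Note the 2 mm in the abstract is the algorithm's error against the total station, not an acceptance criterion, so the tolerance here is only an illustrative assumption:

```python
import numpy as np

def node_deviations(measured, design, tolerance_mm=2.0):
    """Compare extracted truss node coordinates against BIM design positions.

    measured, design : (N, 3) arrays of node coordinates in millimetres
    Returns per-node Euclidean deviation and a within-tolerance mask.
    """
    dev = np.linalg.norm(measured - design, axis=1)  # mm per node
    return dev, dev <= tolerance_mm

# Toy check: one node settled 3 mm vertically, one within tolerance
design = np.array([[0.0, 0.0, 5000.0], [1000.0, 0.0, 5000.0]])
measured = np.array([[0.0, 0.0, 4997.0], [1000.5, 0.0, 5000.0]])
dev, ok = node_deviations(measured, design)
print(dev)  # [3.  0.5]
print(ok)   # [False  True]
```

Per-node deviation vectors, rather than just magnitudes, would additionally reveal the settlement direction reported in the paper.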

Figures and Tables | References | Related Articles | Metrics
Industrial Design
The effect of spatial location of HUD’s road guidance on novice drivers
WANG Fenghong, CHEN Dailin, GAO Ziting, WEN Zhaocheng
2024, 45(4): 856-867.  DOI: 10.11996/JG.j.2095-302X.2024040856
HTML    PDF 16     14

This study investigated the effects of the spatial location of road guidance, road types, intersection turns, traffic flow, and time periods on the driving stability and mental load of novice drivers in steering scenarios. Using a virtual driving simulation platform, mixed orthogonal tests were conducted over these five factors. The test indexes included the standard deviation of the steering wheel angle and the pupil coefficient of variation. Using range analysis and interaction comparison of the test results, the optimal factor-level combination was determined and analyzed. The results showed that the factors affecting driving stability were, in order of importance: intersection turns, time periods, road guidance spatial locations, and traffic flow; for mental load, the order was: road guidance spatial locations, time periods, intersection turns, and traffic flow. The spatial location of road guidance, as a design element of head-up display (HUD) road guidance, either promoted or interfered with driving stability while consistently reducing mental load. These findings provide a reference for the design of HUD road guidance information, contributing to HUD information design research and urban road driving research.
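Range analysis, as used in orthogonal tests, averages the response at each level of a factor and ranks factors by the spread R of those level means. A minimal sketch (toy data, not the study's measurements):

```python
import numpy as np

def range_analysis(levels, response):
    """Range analysis for one factor of an orthogonal test: average the
    response at each factor level, then take R = max - min of the level
    means. Factors are ranked by R (larger R = more influential)."""
    levels = np.asarray(levels)
    response = np.asarray(response, dtype=float)
    means = {int(lv): float(response[levels == lv].mean())
             for lv in np.unique(levels)}
    r = max(means.values()) - min(means.values())
    return means, r

# Toy data: a two-level factor over four trials of some stability index
means, r = range_analysis([1, 1, 2, 2], [3.0, 5.0, 8.0, 10.0])
print(means, r)  # {1: 4.0, 2: 9.0} 5.0
```

Running this per factor, on each test index, yields importance orderings like those reported in the study.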

Figures and Tables | References | Related Articles | Metrics
Full-contact orthopedic insole design for plantar pressure optimization
HUANG Yuzhe, WANG Xupeng, CHEN Wenhui, ZHOU Zhongze, ZHAO Jiaxin, WANG Yunqian
2024, 45(4): 868-878.  DOI: 10.11996/JG.j.2095-302X.2024040868
HTML    PDF 17     12

To optimize the plantar pressure distribution and reduce peak plantar pressure, a parametric design method for full-contact orthopedic insoles based on a three-dimensional foot model was proposed. Based on plantar pressure data and combined with 3D lattice structure analysis, the offloading structure of the insole was optimized. First, to realize the personalized design of the full-contact insole, a 3D foot scanning model was used to develop a parametric design process for the full-contact insole model on the Grasshopper plug-in platform. Subsequently, the energy absorption efficiency of six common 3D cubic lattice structures made of TPU was analyzed in the Abaqus environment, and the effective elastic modulus of diamond lattice structures was calculated under different filling rates. The performance of the different lattice structures in absorbing plantar pressure was evaluated to provide a scientific basis for selecting pressure-reducing structures. Next, an image sampling algorithm was used to extract the areas with higher plantar pressure, and the corresponding effective elastic modulus was chosen to fill different parts, completing the optimization design of the full-contact orthopedic insole. The pressure reduction performance of the orthopedic insole before and after optimization was then compared through simulation analysis. Finally, static and dynamic plantar pressure measurement experiments demonstrated that personalized orthopedic insoles based on foot shape and three-dimensional pressure-reducing structures could effectively optimize the distribution of plantar pressure, reduce peak plantar pressure, and improve foot stability.
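The mapping from plantar pressure regions to lattice fill rates might be sketched as a simple thresholding step; the threshold values, units, and fill rates below are illustrative assumptions, not the paper's figures:

```python
import numpy as np

def assign_fill_rate(pressure_map, thresholds=(100.0, 200.0), fills=(0.2, 0.3, 0.4)):
    """Map each region of a plantar pressure map (assumed kPa) to a lattice
    fill rate: higher-pressure areas get a denser, stiffer lattice. All
    numeric values are illustrative assumptions."""
    idx = np.digitize(pressure_map, thresholds)   # bin index 0, 1, or 2
    return np.take(fills, idx)

pressure = np.array([[50.0, 150.0],
                     [250.0, 90.0]])
fill = assign_fill_rate(pressure)
print(fill)
# [[0.2 0.3]
#  [0.4 0.2]]
```

In practice, each fill rate would correspond to an effective elastic modulus measured in the Abaqus lattice simulations, and the resulting map would drive the Grasshopper parametric model.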

Figures and Tables | References | Related Articles | Metrics
Published as
Published as 4, 2024
2024, 45(4): 879. 
PDF 14     16
Related Articles | Metrics