Journal of Graphics

Table of Contents

    Cover
    Cover of issue 1, 2023
    2023, 44(1): 0-0. 
    Abstract ( 288 )   PDF (1586KB) ( 296 )  
    Contents
    Table of Contents for Issue 1, 2023
    2023, 44(1): 1. 
    Abstract ( 149 )   PDF (214KB) ( 107 )  
    Review
    Review of image super-resolution based on deep learning
    LI Hong-an, ZHENG Qiao-xue, TAO Ruo-lin, ZHANG Min, LI Zhan-li, KANG Bao-sheng
    2023, 44(1): 1-15.  DOI: 10.11996/JG.j.2095-302X.2023010001
    Abstract ( 793 )   HTML ( 33 )   PDF (3973KB) ( 432 )  

    Super-resolution (SR) is an essential technology in digital image processing that reconstructs and produces a matching high-resolution (HR) image based on the low-resolution (LR) image obtained by an observer, thereby improving the resolution of modern digital images. This technology is of significant research and practical value in fields such as digital image compression and transmission, medical imaging, remote sensing imaging, and video perception and monitoring. With the rapid growth of deep learning, novel solutions for SR challenges can be obtained by combining the latest deep learning algorithms. First, discussions were made on the background, development, and technological value of applying deep learning to SR. Second, a brief overview was provided concerning the fundamental methodology, categorization, and advantages and disadvantages of the classic SR methods. Deep learning-based SR methods were categorized and introduced based on distinct implementation strategies and network types. The application of convolutional neural networks (CNN), residual networks (ResNet), and generative adversarial networks (GAN) in SR was investigated and contrasted. The major evaluation indices and solution methodologies were then presented, and the performance of several SR methods on typical data sets was compared. Finally, the deep learning-based SR method was summarized, and the future development trend was forecasted.
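
    Among the evaluation indices commonly used for SR (the abstract does not name them individually), peak signal-to-noise ratio (PSNR) is the most widespread. The sketch below is a hedged, generic illustration of how PSNR is computed between a reconstructed image and its high-resolution ground truth; it is not code from any of the reviewed methods, and the random arrays merely stand in for real images.

```python
import numpy as np

def psnr(hr: np.ndarray, sr: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a ground-truth HR image and an SR result."""
    hr = hr.astype(np.float64)
    sr = sr.astype(np.float64)
    mse = np.mean((hr - sr) ** 2)          # mean squared error over all pixels/channels
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example usage with placeholder data standing in for real images.
rng = np.random.default_rng(0)
hr = rng.integers(0, 256, size=(64, 64, 3))
sr = np.clip(hr + rng.normal(0, 5, size=hr.shape), 0, 255)
print(f"PSNR: {psnr(hr, sr):.2f} dB")
```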

    Image Processing and Computer Vision
    Mask detection algorithm based on YOLOv5 integrating attention mechanism
    LI Xiao-bo, LI Yang-gui, GUO Ning, FAN Zhen
    2023, 44(1): 16-25.  DOI: 10.11996/JG.j.2095-302X.2023010016
    Abstract ( 493 )   HTML ( 12 )   PDF (10477KB) ( 459 )  

    Wearing masks correctly during the COVID-19 pandemic can effectively prevent the spread of the virus. In response to the detection challenge posed by dense crowds and small detection targets in public places, a mask wearing detection algorithm based on the YOLOv5s model and integrating an attention mechanism was proposed. Four attention mechanisms were introduced into the backbone network of the YOLOv5s model to respectively suppress irrelevant information, enhance the ability of the feature map to express information, and improve the model's detection ability for small-scale targets. Experimental results show that the introduction of the convolutional block attention module could increase the mAP value by 6.9 percentage points compared with the original network, the greatest improvement among the four attention mechanisms. The normalization-based attention module also showed excellent performance, with the fewest parameters at the cost of only a small loss in mAP. Through comparative experiments, the GIoU loss function was selected to calculate the bounding box regression loss, further improving the positioning accuracy and raising the mAP value by 8.5 percentage points over the original network. The detection results of the improved model in different scenarios prove the accuracy and practicability of the algorithm for small target detection.
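
    For reference, GIoU extends IoU with a penalty based on the smallest enclosing box of the two boxes. The minimal sketch below illustrates the bounding box regression loss mentioned above; it is not the paper's code, and the (x1, y1, x2, y2) box format is an assumption.

```python
import torch

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """GIoU loss for boxes given as (x1, y1, x2, y2), shape (N, 4)."""
    # Intersection of predicted and target boxes
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-7)

    # Smallest box enclosing both boxes
    cx1 = torch.min(pred[:, 0], target[:, 0])
    cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2])
    cy2 = torch.max(pred[:, 3], target[:, 3])
    c_area = (cx2 - cx1) * (cy2 - cy1)

    giou = iou - (c_area - union) / c_area.clamp(min=1e-7)
    return (1.0 - giou).mean()
```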

    Research on lightweight forest fire detection algorithm based on YOLOv5s
    PI Jun, LIU Yu-heng, LI Jiu-hao
    2023, 44(1): 26-32.  DOI: 10.11996/JG.j.2095-302X.2023010026
    Abstract ( 584 )   HTML ( 8 )   PDF (1809KB) ( 266 )  

    A lightweight forest fire object detection algorithm based on YOLOv5s was proposed to address the low accuracy, poor flexibility, and tight software and hardware constraints of previous UAV-embedded equipment for forest fire inspection. The proposed algorithm replaced the backbone of YOLOv5s with the lightweight network ShuffleNetV2, employing the idea of channel recombination to speed up image feature extraction in the backbone while maintaining both high accuracy and fast detection. Then, a coordinate attention (CA) module specially designed for lightweight networks was added to the connection between Backbone and Neck, aggregating positional information from different locations of the image into the channels and thus improving attention on the detected object. Finally, the CIoU loss function was utilized in the prediction part to better optimize the aspect ratio of the bounding box and accelerate model convergence. The results of the algorithm deployed on Jetson Xavier NX show that compared with Faster-RCNN, SSD, YOLOv4, and YOLOv5s, the improved network model size was reduced by up to 98%, while the precision reached 92.6%, the accuracy rate 95.3%, and the frame rate 132 frames/s. It can effectively achieve real-time prevention and detection of forest fire in daylight, darkness, or good visibility, exhibiting good accuracy and robustness.
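
    The coordinate attention (CA) module referred to above factorizes global pooling into two 1D pools along height and width, so positional information is preserved in the attention maps. The following PyTorch sketch is an illustration under common assumptions about the block's structure, not the authors' implementation; the reduction ratio and layer choices are assumed.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Lightweight coordinate attention: pool along H and W separately,
    then produce per-direction channel attention maps."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        xh = self.pool_h(x)                              # (B, C, H, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)          # (B, C, W, 1)
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                       # attention along height
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))   # attention along width
        return x * ah * aw

# x = torch.randn(1, 64, 80, 80); CoordinateAttention(64)(x).shape -> (1, 64, 80, 80)
```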

    Cross modality person re-identification based on residual enhanced attention
    SHAO Wen-bin, LIU Yu-jie, SUN Xiao-rui, LI Zong-min
    2023, 44(1): 33-40.  DOI: 10.11996/JG.j.2095-302X.2023010033
    Abstract ( 166 )   HTML ( 3 )   PDF (2773KB) ( 100 )  

    Cross modality person re-identification mainly faces two problems: ① modality discrepancies between infrared and visible images caused by different imaging mechanisms; ② intra-class discrepancies caused by the insufficient identity discrimination of features. To address these two problems, a cross modality person re-identification method based on residual enhanced attention was proposed to improve the modality invariance and identity discrimination of pedestrian features. First, a dual-stream convolutional neural network with non-shared parameters in the shallow layers and shared parameters in the deep layers was designed as the backbone. Then, the problem of global weakening in existing attention mechanisms was analyzed, and a residual enhancement method was designed to solve this problem and improve the performance of the attention mechanism; it was applied to the channel dimension in the shallow layers and to the spatial locations in the deep layers of the network. Extensive experiments on the SYSU-MM01 and RegDB datasets have proved the effectiveness of the method.
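
    A minimal sketch of the dual-stream backbone described above, with modality-specific shallow layers and shared deep layers. This is a simplified illustration built from plain convolution blocks, not the authors' network; layer widths and strides are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, stride=1):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class DualStreamBackbone(nn.Module):
    """Non-shared shallow branches per modality, shared deep trunk."""
    def __init__(self):
        super().__init__()
        self.shallow_vis = conv_block(3, 64, stride=2)   # visible branch (not shared)
        self.shallow_ir = conv_block(3, 64, stride=2)    # infrared branch (not shared)
        self.deep = nn.Sequential(                        # shared parameters
            conv_block(64, 128, stride=2),
            conv_block(128, 256, stride=2),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        feat = self.shallow_vis(x) if modality == "visible" else self.shallow_ir(x)
        return self.deep(feat)                            # modality-shared embedding

# vis = torch.randn(2, 3, 256, 128); emb = DualStreamBackbone()(vis, "visible")
```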

    Cell counting method based on multi-scale feature fusion
    ZHANG Qian, WANG Xia-li, WANG Wei-hao, WU Li-zhan, LI Chao
    2023, 44(1): 41-49.  DOI: 10.11996/JG.j.2095-302X.2023010041
    Abstract ( 127 )   HTML ( 3 )   PDF (3866KB) ( 96 )  

    To address the low cell counting accuracy caused by factors such as cell size variation, the highly crowded target recognition network CSRNet was introduced and improved, and a cell counting method based on multi-scale feature fusion was constructed. First, the first 10 layers of VGG16 were employed to extract cell features, avoiding the loss of small target information caused by a deep network. Then, the spatial pyramid pooling structure was introduced to extract the multi-scale features of cells and perform feature fusion, reducing counting errors caused by differences in cell shape and size and by cell occlusion. Next, the feature map was decoded using hybrid dilated convolution to obtain the density map, solving the problem of missing pixels in the decoding process of CSRNet. Finally, the density map was regressed pixel by pixel to obtain the total number of cells. In addition, a new combined loss function was introduced in the training process to replace the Euclidean loss function; it considers not only the relationship between individual pixels of the ground truth and predicted density maps, but also the global and local density levels. Experiments show that the optimized CSRNet could yield better results on the VGG Cells and MBM Cells datasets, effectively improving the counting accuracy degraded by factors such as cell size variation.
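
    A hedged sketch of a spatial-pyramid-pooling fusion step of the kind described above: pool the feature map at several bin sizes, project, upsample, and concatenate. This is a generic PSP-style illustration under assumed bin sizes, not CSRNet's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingFusion(nn.Module):
    """Extract multi-scale context by pooling at several bin sizes and fusing."""
    def __init__(self, channels: int, bins=(1, 2, 3, 6)):
        super().__init__()
        self.bins = bins
        self.reduce = nn.ModuleList(
            [nn.Conv2d(channels, channels // len(bins), kernel_size=1) for _ in bins]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        feats = [x]
        for bin_size, conv in zip(self.bins, self.reduce):
            y = F.adaptive_avg_pool2d(x, bin_size)            # coarse context
            y = conv(y)                                       # channel reduction
            feats.append(F.interpolate(y, size=(h, w), mode="bilinear",
                                       align_corners=False))  # back to input size
        return torch.cat(feats, dim=1)                        # multi-scale fusion

# Downstream, the predicted density map is integrated to get the cell count:
# count = density_map.sum()
```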

    Multi-scale convolutional neural network incorporating attention mechanism for intestinal polyp segmentation
    SHAN Fang-mei, WANG Meng-wen, LI Min
    2023, 44(1): 50-58.  DOI: 10.11996/JG.j.2095-302X.2023010050
    Abstract ( 148 )   HTML ( 3 )   PDF (1022KB) ( 116 )  

    Intestinal polyp segmentation provides the location and morphology of polyps in the colon, allowing doctors to infer the possibility of canceration according to the degree of structural deformation, which facilitates the early diagnosis and treatment of colon cancer. Many existing convolutional neural networks (CNN) extract limited multi-scale features and often produce redundant and interfering features, making it difficult to segment targets that are complex and variable. To address this challenge, a multi-scale convolutional neural network incorporating an attention mechanism was proposed for intestinal polyp segmentation. Specifically, a pyramid strategy based on pooling at different scales was designed to capture rich multi-scale context information. Then a channel attention mechanism was incorporated into the network so that the model could adaptively select appropriate local and global contextual information for feature integration based on the region of interest. By combining the pyramid pooling strategy and the channel attention mechanism, a multi-scale effective semantic fusion decoder network was constructed to improve the model robustness for segmentation of intestinal polyps with complex and variable shapes and sizes. The experimental results show that the Dice coefficient, IoU, and sensitivity produced by the proposed model reach 90.6%, 84.4%, and 91.1% on the CVC-ClinicDB dataset, and 80.6%, 72.6%, and 79.0% on the ETIS-Larib dataset, indicating that the proposed model could accurately and effectively segment polyps in colonoscopy images.
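
    The evaluation metrics reported above (Dice coefficient, IoU, and sensitivity) can be computed from binary masks as in the following sketch; this is an illustration of the standard definitions, not the paper's evaluation code.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    """Dice coefficient, IoU, and sensitivity for binary masks (values in {0, 1})."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    sensitivity = tp / (tp + fn + eps)      # recall of polyp pixels
    return dice, iou, sensitivity
```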

    Learning attention for Dongba paintings emotion classification
    PAN Sen-lei, QIAN Wen-hua, CAO Jin-de, XU Dan
    2023, 44(1): 59-66.  DOI: 10.11996/JG.j.2095-302X.2023010059
    Abstract ( 102 )   HTML ( 1 )   PDF (3412KB) ( 100 )  

    Rich emotions and limited samples are the artistic characteristics of Dongba paintings. A classification algorithm with learned attention can effectively assist the emotional classification of Dongba paintings and alleviate the problem of limited samples. Firstly, Dongba paintings were divided into four themes: figures, ghosts, animals, and plants. According to the emotions they convey, Dongba paintings were divided into 12 kinds of emotions, such as industriousness, simplicity, grace, and beauty. Secondly, an encoder-decoder architecture was employed to extract their emotional features, while a pre-training model was used to improve the generalization performance of the classification model and accelerate the convergence of emotion classification on small samples of Dongba paintings. Finally, blank attention was set in the decoder and fused with the output sequence of the encoder; through the decoder, the semantics of Dongba paintings were learned, guiding the model to classify more accurately and reasonably. Experiments show that the proposed algorithm could attain a classification accuracy of 80.7%, higher than that of existing methods, addressing the problem that emotions in Dongba paintings are rich and difficult to distinguish.

    A sketch-guided facial image completion network via selective recurrent inference
    SHAO Ying-jie, YIN Hui, XIE Ying, HUANG Hua
    2023, 44(1): 67-76.  DOI: 10.11996/JG.j.2095-302X.2023010067
    Abstract ( 98 )   HTML ( 2 )   PDF (10398KB) ( 89 )  

    Image inpainting plays a key role in applications such as restoring old photos and removing face mosaics. Existing deep learning-based inpainting models suffer from problems such as interference information erroneously affecting the encoder and decoder when generating results, which weakens inpainting quality, and probabilistic diversity leading to deviation from users' expectations. To address these problems, a sketch-guided facial image completion network via selective recurrent inference was proposed. First, a selective recurrent inference strategy was designed, introducing a selection mechanism to suppress the influence of erroneous interference on the inference of the encoder and decoder. Then a sketch-based structural information correction module was added to the skip connection between the encoder and decoder, thereby limiting the deviation of the inpainting results from the structure expected by users. Experimental results on the CelebA-HQ dataset show that the proposed method could outperform other classical network models in terms of evaluation indicators and guidance toward generating user-expected content. Experimental results on manually drawn sketches show that user-specified content could be generated through simple hand drawing, which is of practical significance.

    Multi-focus image fusion method based on fractional wavelet combined with guided filtering
    ZHANG Chen-yang, CAO Yan-hua, YANG Xiao-zhong
    2023, 44(1): 77-87.  DOI: 10.11996/JG.j.2095-302X.2023010077
    Abstract ( 80 )   HTML ( 2 )   PDF (3990KB) ( 60 )  

    To address the problems of losing details and producing artifacts at image edges in multi-focus image fusion, a new multi-focus image fusion method was proposed based on the discrete fractional wavelet transform (DFRWT) combined with guided filtering. First, DFRWT was utilized to decompose the source images at multiple scales into a low frequency part and a high frequency part. Then, in light of the energy distribution traits of the wavelet modulus coefficients at different orders, the most suitable fractional order was selected. In the low frequency part, an initial decision map was obtained by applying the sum of Laplacian energy, and the fusion rule was formed by refining the decision map with guided filtering; for the high frequency part, a fractional spatial frequency fusion rule was adopted. These rules improve how effectively the image information is exploited in fusion. Lastly, the inverse DFRWT was applied to obtain the composite image. The new method was compared with five existing algorithms in visual comparison experiments and quantitative evaluation. The simulation experiments show that the proposed method could effectively suppress Gibbs effects and edge artifacts. Its validity and advantages were verified in terms of visual effect and objective evaluation, outperforming several classical algorithms in the quality of image fusion.
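
    Guided filtering of a fusion decision map, as used in the low-frequency rule above, can be sketched as a standard grayscale guided filter built on box filtering. This is an illustration of the general technique, not the paper's implementation; the radius and regularization values are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide: np.ndarray, src: np.ndarray, radius: int = 7,
                  eps: float = 1e-3) -> np.ndarray:
    """Refine `src` (e.g., a fusion decision map) so its edges follow `guide`."""
    size = 2 * radius + 1
    mean_i = uniform_filter(guide, size)
    mean_p = uniform_filter(src, size)
    corr_ip = uniform_filter(guide * src, size)
    corr_ii = uniform_filter(guide * guide, size)

    cov_ip = corr_ip - mean_i * mean_p
    var_i = corr_ii - mean_i * mean_i

    a = cov_ip / (var_i + eps)               # local linear coefficients
    b = mean_p - a * mean_i
    mean_a = uniform_filter(a, size)
    mean_b = uniform_filter(b, size)
    return mean_a * guide + mean_b

# refined = guided_filter(source_image, initial_decision_map)  # then threshold
```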

    Research on image detection algorithm of freight train brake shoe bolt and brake shoe fault
    GU Yu, ZHAO Jun
    2023, 44(1): 88-94.  DOI: 10.11996/JG.j.2095-302X.2023010088
    Abstract ( 50 )   HTML ( 1 )   PDF (1720KB) ( 48 )  

    The state of the brake shoe bolt and brake shoe is of great importance to the safe operation of freight trains. Therefore, an improved SSD (single shot multi-box detector) target detection algorithm was proposed to detect missing brake shoe bolts and brake shoes on freight trains. Firstly, the depthwise separable convolution module was introduced into the ResNet50 network model to reduce the number of parameters by about 50%, thereby improving detection efficiency. Secondly, the improved ResNet50 network model was employed to replace the VGG16 network in SSD to improve the feature extraction capability of the SSD network model. Then, Conv5_3 and Conv7_2 were fused with Conv4_6 and Conv6_2 respectively by combining high-level and low-level features to improve detection accuracy. Finally, the network was trained on a self-built dataset of missing freight train brake components to obtain more accurate weights. The experimental results show that the improved SSD algorithm could attain an accuracy of 96.85% and a recall of 89.50% in detecting missing brake shoe bolts, and an accuracy of 97.01% and a recall of 97.01% in detecting missing brake shoes, thus meeting the requirements of missing brake shoe bolt and brake shoe detection.
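
    The depthwise separable convolution mentioned above splits a standard convolution into a per-channel depthwise step and a 1×1 pointwise step, which is where most of the parameter savings come from. The following minimal PyTorch sketch illustrates the building block; it is not the authors' code.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, cin: int, cout: int, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(cin, cin, kernel_size=3, stride=stride,
                                   padding=1, groups=cin, bias=False)  # per-channel
        self.pointwise = nn.Conv2d(cin, cout, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(cout)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Parameters: 3*3*cin + cin*cout, versus 3*3*cin*cout for a standard 3x3 convolution.
```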

    Video anomaly detection combining pedestrian spatiotemporal information
    YAN Shan-wu, XIAO Hong-bing, WANG Yu, SUN Mei
    2023, 44(1): 95-103.  DOI: 10.11996/JG.j.2095-302X.2023010095
    Abstract ( 98 )   HTML ( 5 )   PDF (1567KB) ( 72 )  

    To address the problems that current video anomaly detection cannot make full use of temporal information and ignores the diversity of normal behaviors, an anomaly detection method incorporating pedestrian spatiotemporal information was proposed. Based on a convolutional auto-encoder, the input frames were compressed and restored by its encoder and decoder, and anomalies were detected according to the difference between the output frames and the ground truth. In order to strengthen the feature connections between consecutive video frames, a residual time shift module and a residual channel attention module were introduced to enhance the network's ability to model temporal and channel information, respectively. Considering the over-generalization of convolutional neural networks (CNN), a memory-augmented module was added to the skip connections between each layer of the encoder and decoder to limit the overly powerful representation of anomalous frames by the auto-encoder and improve the anomaly detection accuracy of the network. In addition, the objective function was extended with a feature separateness loss to effectively distinguish different normal behavior patterns. Experimental results on the CUHK Avenue and ShanghaiTech datasets show that the proposed method outperforms current mainstream video anomaly detection methods while meeting real-time requirements.
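
    A common way in this literature to turn the reconstruction difference described above into a frame-level score is a PSNR-based regularity score, sketched below. This is a generic convention, not necessarily the paper's exact scoring; the normalization scheme is an assumption.

```python
import numpy as np

def frame_scores(pred_frames: np.ndarray, true_frames: np.ndarray) -> np.ndarray:
    """Frame-level normality scores in [0, 1] from reconstructed vs. real frames.

    pred_frames, true_frames: float arrays of shape (T, H, W, C) in [0, 1].
    Lower scores indicate more anomalous frames.
    """
    mse = ((pred_frames - true_frames) ** 2).reshape(len(pred_frames), -1).mean(axis=1)
    psnr = 10.0 * np.log10(1.0 / np.maximum(mse, 1e-10))
    # Min-max normalize PSNR over the video to get a per-frame regularity score.
    return (psnr - psnr.min()) / (psnr.max() - psnr.min() + 1e-10)
```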

    An imitation U-shaped network for video object segmentation
    HUANG Zhi-yong, HAN Sha-sha, CHEN Zhi-jun, YAO Yu, XIONG Biao, MA Kai
    2023, 44(1): 104-111.  DOI: 10.11996/JG.j.2095-302X.2023010104
    Abstract ( 92 )   HTML ( 2 )   PDF (2955KB) ( 47 )  

    Among semi-supervised video object segmentation methods, the one-shot video object segmentation (OSVOS) method uses the object mask of the first frame to guide the separation of foreground objects from the background in subsequent frames of a video. Despite its impressive segmentation results, this method is not applicable to cases where the appearance of foreground objects changes significantly or where the appearances of foreground objects and background are similar. To solve these problems, an imitation U-shaped network structure for video object segmentation was proposed. An attention mechanism was added between the encoder and decoder of this network, establishing associations between feature maps to generate global semantic information. At the same time, the loss function was optimized to further address the imbalance between categories and improve the robustness of the model. In addition, multi-scale prediction was combined with a fully connected conditional random field (FC/Dense CRF) to improve the smoothness of the edges of the segmentation results. Extensive experiments were carried out on the challenging DAVIS 2016 dataset, and the proposed method obtained segmentation results that are more competitive than those of state-of-the-art methods.
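
    One standard way to handle the foreground/background imbalance mentioned above is a class-balanced binary cross-entropy. The sketch below illustrates the general idea, not the paper's exact loss; the weighting scheme is an assumption.

```python
import torch
import torch.nn.functional as F

def balanced_bce(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy weighted by the inverse foreground/background frequency.

    logits, target: (B, 1, H, W); target contains 0/1 object masks (float).
    """
    pos = target.sum()
    neg = target.numel() - pos
    # Weight positives more when the object occupies few pixels.
    pos_weight = (neg / pos.clamp(min=1.0)).detach()
    return F.binary_cross_entropy_with_logits(logits, target, pos_weight=pos_weight)
```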

    Computer Graphics and Virtual Reality
    PointMLP-FD: a point cloud classification model based on multi-level adaptive downsampling
    LIANG Ao, LI Zhi-han, HUA Hai-yang
    2023, 44(1): 112-119.  DOI: 10.11996/JG.j.2095-302X.2023010112
    Abstract ( 156 )   HTML ( 3 )   PDF (2280KB) ( 112 )  

    Due to objective factors such as hardware limitations, object occlusion, and background clutter, the target point clouds collected by sensors exhibit strong sparsity and density inhomogeneity, resulting in low learning efficiency of point cloud features by classification models and poor classification generalization. To address these challenges, a point cloud classification model, PointMLP-FD (feature-driven), was proposed based on multi-level adaptive downsampling. Multiple MLP modules were designed as network branches in the model; taking the shallow features of the point cloud as input, they produce feature expressions in each point cloud category dimension. The points with stronger semantic features were then selected to form the downsampled point set according to the ranking of these feature expressions. Information reflecting the essential features of the target could thus be adaptively retained by filtering out the background and information with low relevance to the target. Finally, the losses of the branch networks were calculated separately and trained in parallel with the backbone network to optimize the point cloud features and reduce the model parameters. The proposed method was tested on the ScanObjectNN dataset, and the results show that, compared with PointMLP-elite, the classification accuracy is higher, with a 1% improvement in mAcc and a 0.8% improvement in OA, approaching the performance of the SOTA model with fewer parameters.
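
    The feature-driven downsampling step can be sketched as scoring each point by the strength of its per-point category response and keeping the top-k points. This is a simplified illustration under assumed tensor shapes, not the PointMLP-FD code.

```python
import torch

def feature_driven_downsample(points: torch.Tensor, class_logits: torch.Tensor,
                              ratio: float = 0.5):
    """Keep the points whose per-point category response is strongest.

    points:       (B, N, 3)  point coordinates
    class_logits: (B, N, K)  per-point responses from a branch MLP (K categories)
    """
    scores = class_logits.max(dim=-1).values              # (B, N) semantic strength
    k = max(1, int(points.shape[1] * ratio))
    idx = scores.topk(k, dim=1).indices                   # indices of retained points
    idx_expanded = idx.unsqueeze(-1).expand(-1, -1, points.shape[-1])
    return torch.gather(points, 1, idx_expanded), idx

# pts = torch.randn(2, 1024, 3); logits = torch.randn(2, 1024, 15)
# sampled, idx = feature_driven_downsample(pts, logits, ratio=0.25)  # -> (2, 256, 3)
```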

    CTH-Net: CNN-Transformer hybrid network for garment image generation from sketches and color points
    PAN Dong-hui, JIN Ying-han, SUN Xu, LIU Yu-sheng, ZHANG Dong-liang
    2023, 44(1): 120-130.  DOI: 10.11996/JG.j.2095-302X.2023010120
    Abstract ( 125 )   HTML ( 5 )   PDF (5602KB) ( 77 )  

    Drawing garment images is an important part of garment design. To address the problems such as low intelligence and high requirements for users' drawing skills and imagination, a CNN-Transformer hybrid network (CTH-Net) was proposed to generate garment images from sketches and color points. CTH-Net combined the advantages of convolutional neural networks (CNN) in extracting local information and Transformer in processing long-range dependencies, efficiently fusing the architectures of these two models. The ToPatch and ToFeatureMap modules were also designed to reduce the amount and dimension of data input into Transformer, thus reducing the consumption of computing resources. CTH-Net consisted of three phases: the first drafting phase, which aimed to predict the color distribution of garments and obtain watercolor images without gradients and shadows; the second refinement phase, which refined the watercolor image into a realistic garment image; the third tuning phase, which combined the outputs of the above two phases to further optimize the generation quality. The experimental results show that CTH-Net could generate high-quality garment images by simply inputting sketches and some color points. The proposed network could outperform the existing methods in the realism and accuracy of the generated images.

    Adaptive bilateral filtering point cloud smoothing and IMLS evaluation method considering normal outliers
    CHEN Ya-chao, FAN Yan-guo, YU Ding-feng, FAN Bo-wen
    2023, 44(1): 131-138.  DOI: 10.11996/JG.j.2095-302X.2023010131
    Abstract ( 99 )   HTML ( 5 )   PDF (3718KB) ( 54 )  

    In order to address the poor smoothing and volume shrinkage caused by unreasonable bilateral filtering parameters, as well as the limitations of existing quality evaluation methods, a bilateral filtering algorithm with adaptive parameters and a quality evaluation method based on implicit moving least squares (IMLS) were proposed. Firstly, a KD-tree data structure was constructed to organize the point cloud topology, the neighborhood of each point was searched, and the normal of each point was calculated by singular value decomposition (SVD); a normal outlier factor was introduced into the bilateral filter to remove outliers from the neighborhood. Additionally, the spatial and normal characteristic parameters were calculated according to the Gaussian kernel function extended by the neighborhood norm. Finally, the constructed bilateral filtering model was applied to smooth the point cloud, and the implicit moving least squares method was introduced to evaluate the quality of smoothing. Experimental results on noisy point clouds show that the adaptive bilateral filtering algorithm considering normal outliers could attain a good smoothing effect with smaller volume shrinkage compared with other algorithms, and that the IMLS evaluation method is objective and effective.
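
    A minimal sketch of the normal-estimation-plus-bilateral-update pipeline described above: k-nearest neighbors via a KD-tree, normals from the SVD of the local neighborhood, and a bilateral displacement along the normal. The fixed kernel widths and weighting are simplified assumptions, not the paper's adaptive scheme or its normal outlier factor.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points: np.ndarray, k: int = 16) -> np.ndarray:
    """Per-point normals from the smallest right singular vector of the centered neighborhood."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    normals = np.empty_like(points)
    for i, nb in enumerate(idx):
        q = points[nb] - points[nb].mean(axis=0)
        _, _, vt = np.linalg.svd(q, full_matrices=False)
        normals[i] = vt[-1]                      # direction of least variance
    return normals

def bilateral_smooth(points: np.ndarray, k: int = 16,
                     sigma_s: float = 0.05, sigma_n: float = 0.05) -> np.ndarray:
    """Move each point along its normal by a bilaterally weighted offset."""
    normals = estimate_normals(points, k)
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    out = points.copy()
    for i, nb in enumerate(idx):
        d = points[nb] - points[i]               # offsets to neighbors
        dist = np.linalg.norm(d, axis=1)
        dn = d @ normals[i]                      # offset component along the normal
        w = np.exp(-dist**2 / (2 * sigma_s**2)) * np.exp(-dn**2 / (2 * sigma_n**2))
        out[i] = points[i] + normals[i] * (np.sum(w * dn) / (np.sum(w) + 1e-12))
    return out
```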

    A Transformer-based 3D human pose estimation method
    WANG Yu-ping, ZENG Yi, LI Sheng-hui, ZHANG Lei
    2023, 44(1): 139-145.  DOI: 10.11996/JG.j.2095-302X.2023010139
    Abstract ( 243 )   HTML ( 5 )   PDF (1089KB) ( 165 )  

    3D human pose estimation is the foundation of human behavior understanding, but predicting reasonable 3D human pose sequences remains a challenging problem. To solve this problem, a Transformer-based 3D human pose estimation method was proposed, utilizing multi-layer long short-term memory (LSTM) units and a multi-scale Transformer structure to enhance the accuracy of human pose sequence prediction. First, a generator based on time series was designed to extract image features through a ResNet pre-trained neural network. Secondly, multi-layer LSTM units were used to learn the relationships between human poses in temporally continuous image sequences, thereby outputting a reasonable skinned multi-person linear (SMPL) human parameter model sequence. Finally, a multi-scale Transformer-based discriminator was constructed, and the multi-scale Transformer structure was employed to learn detailed features at multiple segmentation granularities; in particular, the Transformer block encoding relative positions enhanced the local feature learning ability. Experimental results show that the proposed method could yield better prediction accuracy than the VIBE method: its mean per joint position error (MPJPE) is 7.5% lower than that of VIBE on the 3DPW dataset and 1.8% lower than that of VIBE on the MPI-INF-3DHP dataset.
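
    MPJPE, the metric quoted above, is the mean Euclidean distance between predicted and ground-truth joint positions. A small sketch of the standard definition (not the paper's evaluation code):

```python
import numpy as np

def mpjpe(pred_joints: np.ndarray, gt_joints: np.ndarray) -> float:
    """Mean per joint position error for arrays of shape (frames, joints, 3)."""
    return float(np.linalg.norm(pred_joints - gt_joints, axis=-1).mean())

# pred = np.random.rand(100, 24, 3); gt = np.random.rand(100, 24, 3)
# print(f"MPJPE: {mpjpe(pred, gt) * 1000:.1f} mm")  # often reported in millimetres
```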

    Feature-preserving skeleton extraction algorithm for point clouds
    WANG Jia-dong, CAO Juan, CHEN Zhong-gui
    2023, 44(1): 146-157.  DOI: 10.11996/JG.j.2095-302X.2023010146
    Abstract ( 121 )   HTML ( 5 )   PDF (2657KB) ( 96 )  

    The skeleton extraction of 3D models is one of the most important research topics in computer graphics. For point clouds with noise, the difficulty of curve skeleton extraction lies in maintaining the correct topology and good centrality; for point clouds without noise, the difficulty lies in preserving the detail features of the model. Current mainstream point cloud skeleton extraction methods usually cannot resolve these two difficulties at the same time. The proposed algorithm combined the idea of clustering with optimal transport theory, and transformed point cloud skeleton extraction into an optimization problem. Firstly, the optimal transport plan between the original point cloud and the sampled point cloud was computed; the original point cloud was segmented by clustering, with the sampling points serving as cluster centers. Then the number of clusters was reduced and the clustering results were optimized by adjusting and merging clusters. Finally, the rough skeleton obtained by this iterative process was optimized through interpolation. A large number of experimental results show that the proposed algorithm can extract good-quality curve skeletons and retain the features of the model on both noisy and noise-free 3D point clouds.

    Edge length based 3D shape interpolation
    LIU Zhen-ye, CHEN Ren-jie, LIU Li-gang
    2023, 44(1): 158-165.  DOI: 10.11996/JG.j.2095-302X.2023010158
    Abstract ( 76 )   HTML ( 5 )   PDF (2890KB) ( 83 )  

    Shape interpolation is of fundamental importance to computer graphics and geometry processing, and is widely employed in computer animation and other fields. It is noted that for planar triangular meshes and 3D tetrahedral meshes, interpolating squared edge lengths is equivalent to interpolating the pullback metric; therefore, it has the good property that isometric distortion and conformal distortion are bounded simultaneously. A triangular mesh interpolation algorithm based on edge lengths was proposed by extending this property to surface triangular meshes. Given the target edge lengths, the edge length error energy was optimized using Newton's method in the mesh reconstruction stage; a costly eigenvalue decomposition could be avoided by giving an analytic positive definite form of its Hessian matrix. It was also noted that interpolating the squared edge lengths of a tetrahedral mesh yields very low curvature, meaning that the mesh can be flattened and embedded in 3D space with only small modifications. Therefore, the triangular meshes were first converted into tetrahedral meshes, and the surface was then extracted from the interpolation result of the tetrahedral meshes. This surface served as the initialization of the Newton iteration on the edge length error energy, bringing the convergence result closer to the global optimum. Experiments on a series of triangular meshes show that the proposed method leads to smaller edge length error than previous edge length-based methods, and that the results obtained have bounded distortion.
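
    The core quantity being interpolated above is the squared edge length. Given source and target meshes sharing connectivity, the per-edge targets at time t can be computed as in this sketch; the subsequent mesh reconstruction via Newton's method is omitted, and the array layout is an assumption.

```python
import numpy as np

def interpolated_edge_lengths(v0: np.ndarray, v1: np.ndarray,
                              edges: np.ndarray, t: float) -> np.ndarray:
    """Target edge lengths at time t from linear interpolation of squared lengths.

    v0, v1:  (n, 3) vertex positions of the source and target meshes (same connectivity)
    edges:   (m, 2) vertex index pairs
    """
    l0_sq = np.sum((v0[edges[:, 0]] - v0[edges[:, 1]]) ** 2, axis=1)
    l1_sq = np.sum((v1[edges[:, 0]] - v1[edges[:, 1]]) ** 2, axis=1)
    lt_sq = (1.0 - t) * l0_sq + t * l1_sq     # interpolate the pullback metric
    return np.sqrt(lt_sq)                      # edge length targets for reconstruction
```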

    A homography estimation method robust to illumination and occlusion
    FAN Zhen, LIU Xiao-jing, LI Xiao-bo, CUI Ya-chao
    2023, 44(1): 166-176.  DOI: 10.11996/JG.j.2095-302X.2023010166
    Abstract ( 69 )   HTML ( 1 )   PDF (3274KB) ( 71 )  

    Homography estimation is a basic task in the field of computer vision. In order to improve the robustness of homography estimation to illumination and occlusion, a homography estimation model based on unsupervised learning was proposed. The model takes two stacked images as input and outputs the estimated homography matrix. A bidirectional homography estimation scheme was proposed to compute the average photometric loss. Then, in order to enlarge the receptive field and improve the resistance of the network model to deformation and position change, the spatial transformer networks (STN) module and deformable convolution were introduced into the network. Finally, by inserting random occlusion shapes, occlusion factors were introduced for the first time into the synthetic dataset for the homography estimation task, making the trained model robust to occlusion. Compared with traditional methods, the proposed method achieved the same or better accuracy and performed better at estimating the homography of image pairs with low texture or large illumination changes. Compared with learning-based homography estimation methods, the proposed method is robust to occlusion and performs better on real datasets.
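
    One direction of the average photometric loss can be sketched as warping one image toward the other with the estimated homography and averaging the absolute intensity difference. The bilinear warp below uses SciPy for illustration only; the grayscale input, normalization, and masking of out-of-view pixels are simplified assumptions, not the paper's differentiable training pipeline.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def photometric_loss(img_a: np.ndarray, img_b: np.ndarray, H: np.ndarray) -> float:
    """Mean absolute intensity difference between img_a and img_b warped by H.

    img_a, img_b: (h, w) float grayscale images; H maps pixel coords of A to B.
    """
    h, w = img_a.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ones = np.ones_like(xs)
    pts = np.stack([xs.ravel(), ys.ravel(), ones.ravel()], axis=0).astype(np.float64)
    warped = H @ pts
    wx = warped[0] / warped[2]
    wy = warped[1] / warped[2]
    # Bilinear sampling of img_b at the warped coordinates of every pixel of img_a.
    b_at_a = map_coordinates(img_b, [wy, wx], order=1, mode="constant", cval=0.0)
    valid = (wx >= 0) & (wx <= w - 1) & (wy >= 0) & (wy <= h - 1)
    diff = np.abs(img_a.ravel() - b_at_a)
    return float(diff[valid].mean())

# The bidirectional loss averages photometric_loss(A, B, H) and photometric_loss(B, A, inv(H)).
```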

    BIM/CIM
    Intelligent detection method of tunnel cracks based on improved Mask R-CNN deep learning algorithm
    ZHU Lei, LI Dong-biao, YAN Xing-zhi, LIU Xiang-yang, SHEN Cai-hua
    2023, 44(1): 177-183.  DOI: 10.11996/JG.j.2095-302X.2023010177
    Abstract ( 241 )   HTML ( 4 )   PDF (1361KB) ( 206 )  

    Tunnel crack detection is an important task for the prevention of major disasters and the daily maintenance of tunnels, but traditional manual detection incurs a huge workload and cannot meet practical needs. The deep learning Mask R-CNN model was used for intelligent automatic detection of cracks, avoiding time-consuming and labor-intensive manual inspection. By adjusting the algorithm parameters and optimizing the detection results, a Mask R-CNN model suitable for tunnel crack detection was obtained. For the automatically identified cracks, geometric characteristic parameters were further calculated. In order to make full use of the long, narrow, and bending characteristics of cracks and reflect their trend and basic shape, a calculation method for crack geometric characteristics based on skeleton extraction and function fitting was proposed. From the crack skeleton, the crack trend could be obtained and the crack length could be calculated; through function fitting, a function running through the narrow and long crack region could be obtained, and the width could be calculated along the normal vector of the fitted function. Based on the calculated geometric parameters, combined with the crack width repair thresholds specified in the relevant code, automatic early warning of tunnel cracks could be realized, providing technical support for the automatic detection of tunnel cracks.
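
    A simplified sketch of the skeleton-based geometry step described above: skeletonize the binary crack mask, estimate length from the skeleton pixel count, and estimate width from the distance transform at skeleton pixels. The paper fits a function and measures width along its normal; the distance-transform shortcut here, the pixel-to-millimetre scale, and the threshold check are assumptions for illustration only.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def crack_geometry(mask: np.ndarray, pixel_size_mm: float = 1.0):
    """Approximate crack length and maximum width from a binary crack mask."""
    mask = mask.astype(bool)
    skeleton = skeletonize(mask)                      # 1-pixel-wide centerline
    length_mm = skeleton.sum() * pixel_size_mm        # rough length along the centerline
    dist = distance_transform_edt(mask)               # distance to the crack boundary
    width_mm = 2.0 * dist[skeleton].max() * pixel_size_mm if skeleton.any() else 0.0
    return length_mm, width_mm

# length, width = crack_geometry(crack_mask, pixel_size_mm=0.5)
# if width > repair_threshold_mm: trigger an early warning
```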

    Industrial Design
    Prediction model of cultural image based on DE-GWO and SVR
    PEI Hui-ning, SHAO Xing-chen, TAN Zhao-yun, HUANG Xue-qin, BAI Zhong-hang
    2023, 44(1): 184-193.  DOI: 10.11996/JG.j.2095-302X.2023010184
    Abstract ( 71 )   HTML ( 1 )   PDF (799KB) ( 67 )  

    In order to quantify the relationship between cultural characteristics and imagery more objectively and accurately, a cultural image prediction model integrating the hybrid gray wolf optimization algorithm (DE-GWO) and support vector regression (SVR) was proposed. First, the image space of the cultural characteristics of the Xiangtangshan Grottoes statues was constructed from multiple sets of image vocabulary, and cultural image cognition experiments were conducted using eye tracking technology. With the subjects' physiological cognition data obtained, one-way analysis of variance was carried out, yielding the eye movement index parameter dataset for the cultural image prediction model. Secondly, a differential evolution strategy based on the DE algorithm was introduced to compensate for stagnation in the GWO search process. Thirdly, the improved GWO algorithm was used to optimize the parameters C and g of the SVR model. Finally, the constructed DE-GWO-SVR model was used to predict cultural image cognition. In order to further demonstrate the generalization of the constructed model, five models, including BP, ABC-SVR, and DT, were used in comparative experiments. The results show that the proposed model could achieve a better prediction effect on cultural image cognition.
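
    A compact sketch of a GWO-style search over the SVR parameters C and g (gamma), using cross-validation error as the fitness. The DE mutation step and the eye-movement features are omitted, and all search ranges, wolf counts, and iteration settings are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

def fitness(params, X, y):
    """Negative CV R^2 of an SVR with the candidate (C, gamma); lower is better."""
    C, gamma = np.clip(params, [1e-2, 1e-4], [1e3, 1e1])
    model = SVR(C=C, gamma=gamma)
    return -cross_val_score(model, X, y, cv=5, scoring="r2").mean()

def gwo_optimize(X, y, n_wolves=10, n_iter=30, seed=0):
    """Grey wolf search for (C, gamma); returns the best parameters found."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array([1e-2, 1e-4]), np.array([1e3, 1e1])
    wolves = rng.uniform(lo, hi, size=(n_wolves, 2))
    scores = np.array([fitness(w, X, y) for w in wolves])
    for it in range(n_iter):
        order = np.argsort(scores)
        alpha, beta, delta = wolves[order[:3]]          # three best wolves lead the pack
        a = 2.0 - 2.0 * it / n_iter                     # linearly decreasing coefficient
        for i in range(n_wolves):
            new = np.zeros(2)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(2), rng.random(2)
                A, Cc = 2 * a * r1 - a, 2 * r2
                new += leader - A * np.abs(Cc * leader - wolves[i])
            wolves[i] = np.clip(new / 3.0, lo, hi)      # average of the three pulls
            scores[i] = fitness(wolves[i], X, y)
    best = wolves[np.argmin(scores)]
    return {"C": best[0], "gamma": best[1]}
```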

    Engineering Graphics
    Reflections on the orientation and development of engineering graphics in the era of computing
    YU Hai-yan, LIU Yan-cong, HE Yuan-jun
    2023, 44(1): 194-198.  DOI: 10.11996/JG.j.2095-302X.2023010194
    Abstract ( 116 )   HTML ( 4 )   PDF (1766KB) ( 149 )  

    On the basis of the two development reports on graphics, the inheritance of the science and discipline of engineering graphics was particularly discussed, and its orientation and development in the computing era were also explored. The theories, methods, and technologies of engineering graphics, as well as descriptive geometry, were reviewed, revealing the rigorous theories and scientific values implied in its unambiguous expression and engineering computing. In the computing era, discussions were conducted on some new forms that have found their way from engineering graphics to computer graphics. The commonalities of these two major branches of graphics were analyzed, especially their supporting role for CAD regarding graphic thinking and computational thinking. With these common foundations as the core, the engineering application as the guide, and scientific research and talent training as the goal, explorations were pursued on the establishment of a disciplinary development framework for engineering graphics integrating with multiple thinking modes.

    Format of references in this issue
    22 references in Issue 1, 2023
    2023, 44(1): 200. 
    Abstract ( 14 )   PDF (155KB) ( 5 )  