Journal of Graphics

Table of Contents for Issue 2, 2022

2022, 43(2): 1.

Literature review of audio-driven cross-modal visual generation algorithms

JIANG Lai, YU Zhen, WANG Peng-fei, ZHOU Dong-sheng, HOU Ya-qing

2022, 43(2): 181-188. DOI: 10.11996/JG.j.2095-302X.2022020181

Audio-driven cross-modal visual generation algorithms have been widely employed in many fields and have attracted attention from both industry and academia in recent years. Audio and vision are the most important and common modalities in people’s daily life, yet creatively generating a visual scene that corresponds to a given audio signal remains a great challenge. The existing literature has not studied audio-driven cross-modal visual generation systematically and comprehensively. This paper summarized the existing algorithms for audio-driven cross-modal visual generation and divided them into three categories: audio to image, audio to body motion video, and audio to talking face video. For each category, we first described its specific application fields and the pipelines of mainstream algorithms, and analyzed the underlying framework technologies. Then the core ideas, advantages, and disadvantages of related algorithms were described in the order of technological advancement, and their generation results and performance were explained. Finally, the opportunities and challenges in the current field were discussed and suggestions for future research were provided.

Deep learning based pixel-level public architectural floor plan space recognition

GAO Ming, ZHANG He-hua, ZHANG Ting-rui, ZHANG Xuan-ming

2022, 43(2): 189-196. DOI: 10.11996/JG.j.2095-302X.2022020189

Pixel-level floor plan space recognition plays an important role in applications such as floor plan review and model reconstruction from drawings. Existing methods, which target residential floor plans, recognize spaces directly through semantic segmentation. Public architectural floor plans, however, contain more noisy lines and elements, have higher resolution, and include more space types. The higher resolution makes it hard to capture global information in a floor plan, while the variety of spaces makes it impossible to define a clear set of room types; both features render existing space recognition approaches impractical. To recognize spaces in public architectural floor plans, a dataset named Public Architectural Floor Plan Dataset was proposed, comprising 20 floor plans labeled with walls at the pixel level and 100 floor plans labeled with elements at the bounding box level. A deep learning-based space boundary recognition approach was proposed, which improves wall recognition accuracy through the proposed center line extraction and key line minimum square error loss function, and recognizes spaces as the regions enclosed by the recognized boundaries. A space contour optimization algorithm was also proposed, which in experiments reduced the number of contour points while preserving the shape of spaces. Experimental results show that this method overcomes the limitations of resolution and room type range, attains satisfactory space recognition performance, and offers a solution for recognizing spaces in public architectural floor plans. Compared with existing methods, the proposed method achieves a higher recall while maintaining precision.
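
The abstract does not describe how the contour optimization works. As a hedged illustration of the general goal (fewer contour points, same shape), the sketch below uses the classic Ramer-Douglas-Peucker simplification, a standard technique swapped in here and not necessarily the authors' algorithm; the tolerance `epsilon` (in pixels) and the sample contour are assumptions.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    # |2D cross product| / segment length.
    return abs(dx * (ay - py) - dy * (ax - px)) / length


def simplify_polyline(points, epsilon):
    """Ramer-Douglas-Peucker: keep only points deviating more than epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        return [first, last]
    left = simplify_polyline(points[: index + 1], epsilon)
    right = simplify_polyline(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


# Hypothetical wall outline from segmentation: straight runs with pixel jitter.
contour = [(0, 0), (5, 0.4), (10, -0.3), (15, 0.2), (20, 0), (20, 5), (20.3, 10), (20, 15)]
print(simplify_polyline(contour, epsilon=1.0))  # [(0, 0), (20, 0), (20, 15)]
```

This operates on an open polyline; a closed room contour can be handled by splitting it into two open halves first.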

Multimodal small target detection based on remote sensing image

HU Jun, GU Jing-jing, WANG Qiu-hong

2022, 43(2): 197-204. DOI: 10.11996/JG.j.2095-302X.2022020197

Since targets in remote sensing images are relatively small and easily affected by illumination, weather, and other factors, deep learning-based target detection methods using single-modality remote sensing images suffer from low accuracy. However, information from different modalities is complementary and can be combined to improve target detection performance. Therefore, based on the fusion of RGB and infrared images, we proposed a balanced multimodal depth model (BMDM) for multimodal small target detection in remote sensing images. Instead of fusing the feature information of the two modalities by simple element-wise summation, element-wise multiplication, or concatenation, we designed a balanced multimodal feature fusion method that enhances target features to compensate for the shortcomings of single-modality information. First, we extracted low-level features from the RGB and infrared images separately. Second, we fused the feature information of the two modalities and extracted deep-level features. Third, we constructed a multimodal small target detection model based on the one-stage method. Finally, the effectiveness of the proposed method was verified by multimodal small target detection experiments on the public remote sensing dataset VEDAI.
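
The abstract contrasts its balanced fusion with three simple baselines. A minimal sketch of those baselines on per-position feature values (plain lists standing in for feature maps, an assumption for brevity) shows how they behave; the balanced fusion method itself is not reproduced here.

```python
def _check_same_length(rgb, ir):
    # Summation and multiplication pair features one-to-one.
    if len(rgb) != len(ir):
        raise ValueError("both modalities must have the same number of features")


def fuse_sum(rgb, ir):
    """Element-wise summation: output has the same length as each input."""
    _check_same_length(rgb, ir)
    return [a + b for a, b in zip(rgb, ir)]


def fuse_mul(rgb, ir):
    """Element-wise multiplication: a feature survives only if both modalities respond."""
    _check_same_length(rgb, ir)
    return [a * b for a, b in zip(rgb, ir)]


def fuse_concat(rgb, ir):
    """Concatenation: output length doubles; later layers must learn the mixing."""
    return list(rgb) + list(ir)


rgb_feat = [0.9, 0.0, 0.5]  # hypothetical: strong visible response, dark region, mixed
ir_feat = [0.1, 0.8, 0.5]   # hypothetical: a warm target that the RGB branch misses
print(fuse_sum(rgb_feat, ir_feat))
print(fuse_mul(rgb_feat, ir_feat))
print(fuse_concat(rgb_feat, ir_feat))
```

In the second position, multiplication erases the infrared response because the RGB value is zero, one example of how a naive fusion rule can discard information from one modality.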

Image segmentation algorithm based on improved pixel correlation model

ZHANG Yan, GAO Xin, LIU Yi, ZHANG Xiao-feng, ZHANG Cai-ming

2022, 43(2): 205-213. DOI: 10.11996/JG.j.2095-302X.2022020205

Image segmentation is a research hotspot and a difficult problem in computer vision. By exploiting local information, the fuzzy local information C-means (FLICM) clustering algorithm improves robustness to a certain extent, but it cannot achieve the expected segmentation results under high-intensity noise. To address the low segmentation accuracy of traditional fuzzy clustering algorithms, an improved image segmentation algorithm based on a pixel correlation model was proposed. First, a new pixel correlation model was designed by analyzing the local statistical characteristics of pixels. On this basis, non-local information was effectively employed to mine image details and improve segmentation. In the experiments, the segmentation results were evaluated with a variety of evaluation metrics and compared with those of several common fuzzy clustering algorithms. Experimental results show that the fuzzy clustering algorithm based on improved pixel correlation can effectively balance noise resistance and detail preservation on synthetic, natural, medical, and remote sensing images, and that its segmentation performance and robustness are superior to those of the related algorithms.
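
FLICM and the proposed method both build on fuzzy C-means (FCM). As background only, a minimal sketch of plain FCM on scalar gray levels follows; it omits FLICM's local fuzzy factor and the paper's pixel correlation model, and the fuzzifier m = 2, the iteration count, and the sample pixels are assumptions.

```python
def fuzzy_c_means(pixels, centers, m=2.0, iterations=30, eps=1e-12):
    """Plain fuzzy C-means on scalar gray levels (no spatial term).

    Returns (centers, memberships); memberships[i][k] is the degree to which
    pixel i belongs to cluster k, computed from the centers of the last
    iteration before they were updated. Each row sums to 1.
    """
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        memberships = []
        for x in pixels:
            dists = [abs(x - v) + eps for v in centers]  # eps avoids division by zero
            memberships.append([1.0 / sum((d_k / d_j) ** exponent for d_j in dists) for d_k in dists])
        # Center update: mean of the pixels weighted by u_ik ** m.
        for k in range(len(centers)):
            weights = [row[k] ** m for row in memberships]
            centers[k] = sum(w * x for w, x in zip(weights, pixels)) / sum(weights)
    return centers, memberships


gray = [10, 12, 11, 200, 205, 198]  # hypothetical two-region image, flattened
centers, u = fuzzy_c_means(gray, centers=[0.0, 255.0])
print([round(c, 1) for c in centers])
```

Because this update uses only each pixel's own intensity, an isolated noisy pixel is assigned purely by its gray level; spatial terms such as FLICM's are what add noise robustness.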

Monocular depth estimation of ASPP networks based on hierarchical compress excitation

LIAO Zhi-wei, JIN Jing, ZHANG Chao-fan, YANG Xue-zhi

2022, 43(2): 214-222. DOI: 10.11996/JG.j.2095-302X.2022020214

Scene depth estimation is a basic task of scene understanding, and its accuracy reflects how well a computer understands a scene. Conventional depth estimation employs the atrous spatial pyramid pooling (ASPP) module to process different pixel features without changing the image resolution. However, this module does not consider the relationships between different pixel features, leading to inaccurate scene feature extraction. To overcome these disadvantages of the ASPP module in depth estimation, an improved ASPP module was proposed to solve its distortion problem in image processing. First, the proposed module was added after the convolution kernel. By exploiting the relationships between the features of individual pixels, the module enables the network to adaptively learn the regions of interest and thus extract features accurately from a given image. Then the problem of network hierarchy optimization was solved by constructing a difference matrix. Finally, the depth estimation network model was built on the public indoor dataset NYU-Depthv2. Compared with current mainstream algorithms, the algorithm achieves good performance on both qualitative and quantitative indexes. Under the same evaluation indexes and compared with the most advanced algorithm, the accuracy under the δ1 threshold is improved by nearly 3%, the root mean square error and absolute error are decreased by 1.7%, and the log10 error (lg) is decreased by about 0.3%. The improved ASPP network model proposed in this paper addresses the failure of traditional ASPP modules to take into account the relationships between different pixel features. It helps the model converge, significantly improves feature extraction, and produces more accurate scene depth estimates.
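
The abstract does not define its metrics. The sketch below uses the definitions commonly reported on NYU-Depth v2 (δ1 with threshold 1.25, RMSE, mean absolute relative error, and mean log10 error); treating these as the paper's exact protocol is an assumption, and the depth values are hypothetical.

```python
import math


def depth_metrics(pred, gt):
    """Common monocular-depth error metrics over pixels with valid (> 0) depth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    n = len(pairs)
    ratios = [max(p / g, g / p) for p, g in pairs]
    return {
        # delta1: share of pixels whose ratio to ground truth is strictly below 1.25.
        "delta1": sum(r < 1.25 for r in ratios) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "log10": sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n,
    }


pred = [1.0, 2.0, 3.0, 4.0]  # hypothetical predicted depths (m)
gt = [1.0, 2.5, 3.0, 2.0]    # hypothetical ground truth (m)
print(depth_metrics(pred, gt))
```

Here the second pixel's ratio is exactly 1.25, so it falls outside δ1 because the test is strict; δ1 is therefore 0.5.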

Sequential multi-scale autoencoder for video anomaly detection

LYU Hao, YI Peng-fei, LIU Rui, ZHOU Dong-sheng, ZHANG Qiang, WEI Xiao-peng

2022, 43(2): 223-229. DOI: 10.11996/JG.j.2095-302X.2022020223

Video anomaly detection refers to identifying events inconsistent with expected behaviors. Many current methods detect abnormalities through reconstruction errors. However, owing to the powerful capabilities of deep neural networks, abnormal behaviors may also be reconstructed well, which contradicts the assumption that abnormal behaviors produce large reconstruction errors. Methods that predict future frames for anomaly detection have achieved good results, but most of them neither consider the diversity of normal samples nor establish the association between consecutive video frames. To solve this problem, we proposed a sequential multi-scale autoencoder network to predict future frames, and performed video anomaly detection based on the difference between the predicted frame and the ground-truth frame. The network not only explicitly considers the diversity of normal events, but also constructs long-range spatial dependencies through a powerful encoder, thereby enhancing the diversity of output features. In addition, for complex datasets containing more noise, we proposed a denoising network to further improve the accuracy of the model. While meeting real-time requirements, this method has achieved the best accuracy to date on the Avenue dataset.
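
The abstract scores frames by the difference between predicted and ground-truth frames. A common convention in future-frame-prediction work turns that difference into PSNR and then min-max normalizes it within each video; the sketch assumes this convention (the paper's exact scoring is not stated), with frames flattened to intensities in [0, 1] and hypothetical values.

```python
import math


def psnr(pred, truth, max_value=1.0):
    """Peak signal-to-noise ratio between a predicted and a ground-truth frame."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)
    mse = max(mse, 1e-10)  # avoid division by zero for a perfect prediction
    return 10.0 * math.log10(max_value ** 2 / mse)


def regularity_scores(frame_psnrs):
    """Min-max normalise PSNRs within one video; low scores flag likely anomalies."""
    lo, hi = min(frame_psnrs), max(frame_psnrs)
    if hi == lo:
        return [1.0] * len(frame_psnrs)
    return [(s - lo) / (hi - lo) for s in frame_psnrs]


truth = [0.5, 0.5, 0.5, 0.5]  # hypothetical ground-truth frame, reused for brevity
predictions = [
    [0.52, 0.49, 0.50, 0.51],  # accurate prediction: normal motion
    [0.55, 0.45, 0.50, 0.52],  # slightly worse
    [0.90, 0.10, 0.50, 0.50],  # large error: unexpected event
]
scores = regularity_scores([psnr(p, truth) for p in predictions])
print([round(s, 3) for s in scores])
```

A threshold on these scores, or a frame-level AUC over them, then yields the detection result.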

Efficient pedestrian detector combining depthwise separable convolution and standard convolution

ZHANG Yun-bo, YI Peng-fei, ZHOU Dong-sheng, ZHANG Qiang, WEI Xiao-peng

2022, 43(2): 230-238. DOI: 10.11996/JG.j.2095-302X.2022020230

Pedestrian detection requires algorithms that are both fast and accurate. Although pedestrian detectors based on deep convolutional neural networks (DCNN) achieve high detection accuracy, they are computationally expensive. Therefore, such detectors cannot be deployed well on lightweight systems such as mobile devices, embedded devices, and autonomous driving systems. Considering these problems, a lightweight and effective pedestrian detector (EPDNet) was proposed, which better balances speed and accuracy. First, the shallow convolution layers of the backbone network employed depthwise separable convolution to compress the model parameters, while the deeper convolution layers utilized standard convolution to extract high-level semantic features. In addition, to further improve the performance of the model, the backbone network adopted a feature fusion method to enhance the expressiveness of its output features. In comparative experiments, EPDNet showed superior performance on two challenging pedestrian datasets, Caltech and CityPersons. Compared with the benchmark model, EPDNet obtains a better trade-off between speed and accuracy, improving both at the same time.
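
To make the parameter saving of depthwise separable convolution concrete, the sketch counts weights for a single 3×3 layer with 64 input and 128 output channels. These channel counts are illustrative assumptions, not EPDNet's configuration; biases are excluded by default.

```python
def standard_conv_params(k, c_in, c_out, bias=False):
    """Weights of a k x k standard convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=False):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution mixing channels
    return depthwise + pointwise + ((c_in + c_out) if bias else 0)


k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)        # 73728
sep = depthwise_separable_params(k, c_in, c_out)  # 576 + 8192 = 8768
print(std, sep, round(std / sep, 2))              # 73728 8768 8.41
```

Without biases, the ratio equals 1 / (1/C_out + 1/k²), so for 3×3 kernels the saving approaches 9× as the number of output channels grows.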

Face detection and embedded implementation of lightweight network

ZHANG Ming, ZHANG Fang-hui, ZONG Jia-ping, SONG Zhi, CEN Yi-gang, ZHANG Lin-na

2022, 43(2): 239-246. DOI: 10.11996/JG.j.2095-302X.2022020239

In recent years, face detection based on convolutional neural networks (CNN) has dominated this field, and detection results on public benchmarks have improved significantly. However, computational cost and model complexity are also on the rise, and it remains a challenge to apply face detection models to embedded devices with limited computing power and memory capacity. Aiming at face detection on 320×240 input images in embedded systems, a low-resolution face detection algorithm based on a lightweight network was proposed. The backbone network employed an attention module, combined Distance-IoU (DIoU) with Non-Maximum Suppression (NMS), and adopted the Mish activation function. Meanwhile, appropriate prior boxes were set according to the aspect ratio of faces. In doing so, a balance between precision and speed was achieved, and the model could be deployed on an embedded platform. Specifically, depthwise separable convolution was used to replace ordinary convolution, and a convolutional block attention module (CBAM) was added after the convolution block to keep the network focused on the target objects to be recognized. The Mish activation function was used instead of ReLU to improve the model inference speed. By combining DIoU and NMS, the detection accuracy for small faces was enhanced. Experiments on the WIDER FACE dataset show that the proposed method not only detects human faces with high accuracy in real time, but also achieves higher accuracy than traditional algorithms on low-resolution input. After the dataset was expanded, the proposed model also improved detection accuracy under complex illumination.
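
DIoU-NMS replaces the IoU test of standard NMS with IoU minus a normalized center-distance penalty, so overlapping boxes whose centers are clearly apart (for example, neighboring small faces) are less likely to be suppressed. The minimal sketch below follows the published DIoU definition; the box format (x1, y1, x2, y2), the threshold 0.3, and the boxes are assumptions, and this is not the paper's implementation.

```python
def diou(a, b):
    """Distance-IoU of boxes (x1, y1, x2, y2): IoU minus rho^2 / c^2."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared distance between box centers.
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - rho2 / c2


def diou_nms(boxes, scores, threshold=0.3):
    """Greedy NMS that suppresses a box when its DIoU with a kept box reaches the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (5, 0, 15, 10)]  # hypothetical face boxes
scores = [0.9, 0.8, 0.7]
print(diou_nms(boxes, scores))  # [0, 2]
```

Box 1 is a near-duplicate of box 0 and is removed. Box 2 has a plain IoU of 1/3 with box 0, so standard NMS at 0.3 would drop it, but its center-distance penalty lowers the DIoU to about 0.26 and it is kept.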

Fully automatic matting algorithm for portraits based on deep learning

SU Chang-bao, GONG Shi-cai

2022, 43(2): 247-253. DOI: 10.11996/JG.j.2095-302X.2022020247

To address the low completeness of the extracted portrait, insufficiently refined edges, and the cumbersome matting process in matting tasks, an automatic portrait matting algorithm based on deep learning was proposed. The algorithm employed a three-branch network: the semantic segmentation branch (SSB) learns the semantic information of the alpha matte, the detail branch (DB) learns its detail information, and the combination branch (COM) combines the learning results of the two branches. First, the encoding network of the algorithm utilized the lightweight convolutional neural network MobileNetV2 to accelerate feature extraction. Second, an attention mechanism was added to the SSB to weight the importance of image feature channels, and an atrous spatial pyramid pooling module was added to the DB to achieve multi-scale fusion of features extracted from different receptive fields of the image. Then, the two branches of the decoding network merged the features extracted by the encoding network at different stages through skip connections to perform decoding. Finally, the features learned by the two branches were fused to obtain the alpha matte of the image. Experimental results show that, on a public dataset, this algorithm outperforms semi-automatic and fully automatic deep learning-based matting algorithms, and that its real-time streaming video matting is superior to that of MODNet.
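
An alpha matte is typically applied through the standard compositing equation I = αF + (1 − α)B, for example to place the extracted portrait on a new background. The per-pixel sketch below uses hypothetical grayscale values and does not model the matting network itself.

```python
def composite(alpha, foreground, background):
    """Per-pixel alpha compositing: I = alpha * F + (1 - alpha) * B."""
    return [a * f + (1.0 - a) * b for a, f, b in zip(alpha, foreground, background)]


alpha = [1.0, 0.5, 0.0]         # hypothetical: inside portrait, hair edge, background
fg = [200.0, 180.0, 160.0]      # hypothetical foreground intensities
new_bg = [0.0, 0.0, 0.0]        # replace the background with black
print(composite(alpha, fg, new_bg))  # [200.0, 90.0, 0.0]
```

Fractional α values at hair and edge pixels blend foreground and background, which is why refined edge detail in the matte matters for visual quality.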

Stereo image zero watermarking algorithm based on DOCT and SURF

HAN Shao-cheng, ZHANG Peng

2022, 43(2): 254-262. DOI: 10.11996/JG.j.2095-302X.2022020254

To address the poor resistance of most current stereo image zero-watermarking schemes to geometric attacks, a blind-detection stereo image zero-watermarking algorithm based on the discrete octonion cosine transform (DOCT) and speeded up robust features (SURF) was proposed. First, the stationary wavelet transform (SWT) was performed on the six color components of the left and right views of the original stereo image in the CIEXYZ color space. Second, the resulting six low-frequency subbands were divided into non-overlapping blocks to construct octonion image blocks at corresponding positions, and the DC coefficients of all image blocks after DOCT were calculated directly in the spatial domain. Next, a robust feature matrix was constructed by comparing the modulus of each octonion DC coefficient with their overall mean value. Finally, the zero watermark used for authentication was generated by an XOR operation between the feature matrix and the watermark, which had been scrambled with a quantum key and encrypted with the 2D-LALM system. In addition, the SURF method was employed to geometrically correct the stereo image to be authenticated before zero-watermark detection. Experimental results show that the proposed algorithm is more robust against both conventional and geometric attacks.
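
The feature-matrix and XOR steps can be sketched as below, assuming the DOCT DC-coefficient moduli are already computed and the watermark is already scrambled and encrypted (neither transform is implemented here). Whether the comparison is strict (>) or not is an assumption. No data is embedded in the image; detection only recomputes the feature matrix and applies XOR again.

```python
def feature_matrix(moduli):
    """Binary features: 1 where a block's DC modulus exceeds the mean of all moduli."""
    flat = [v for row in moduli for v in row]
    mean = sum(flat) / len(flat)
    return [[1 if v > mean else 0 for v in row] for row in moduli]


def xor_bits(a, b):
    """Element-wise XOR of two equally sized binary matrices."""
    return [[x ^ y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical inputs: |DC| moduli of 2 x 2 octonion blocks and an already-encrypted watermark.
moduli = [[4.0, 9.0], [7.5, 1.0]]
encrypted_wm = [[1, 0], [1, 1]]

# Registration: store the zero watermark for later authentication; the image is untouched.
zero_wm = xor_bits(feature_matrix(moduli), encrypted_wm)

# Detection: recompute features from the image under test, then XOR with the stored zero watermark.
attacked = [[0.9 * v for v in row] for row in moduli]  # hypothetical uniform scaling of the moduli
recovered = xor_bits(feature_matrix(attacked), zero_wm)
print(recovered == encrypted_wm)  # True
```

Because each feature bit compares a modulus with the global mean, a uniform scaling of all moduli leaves the feature matrix, and therefore the recovered watermark, unchanged.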

Ultrasound image segmentation model based on edge entropy and local FT distribution

CUI Wen-chao, XU De-wei, SUN Shui-fa, PAN Zhi-hong, WANG Xi-dong

2022, 43(2): 263-272. DOI: 10.11996/JG.j.2095-302X.2022020263

Local Gaussian distribution fitting (LGDF) and local Rayleigh distribution fitting (LRDF) models often perform relatively poorly in segmenting ultrasound images, because neither the Gaussian nor the Rayleigh distribution describes ultrasound images accurately, and because edge information is not used to guide the segmentation. To deal with these problems, an edge entropy weighted local Fisher-Tippett (FT) distribution fitting model was presented in this paper. Based on the fact that the object and background in local regions of ultrasound images follow different FT distributions, the proposed model adopted the maximum a posteriori (MAP) probability to derive an energy function to be minimized, which was solved by the level set method. Meanwhile, the edge entropy was included in the length regularization term as a weight function to guide the active contour to better capture obscure and weak object edges. Extensive experiments on synthetic and real ultrasound images demonstrate that both the local FT distribution fitting and the inclusion of the edge entropy improve segmentation, and that the proposed model outperforms many existing methods qualitatively and quantitatively.
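
If an amplitude A is Rayleigh-distributed with parameter σ², its log-compressed value I = ln A follows the FT density p(I) = 2 exp((2I − ln 2σ²) − exp(2I − ln 2σ²)), whose maximum-likelihood estimate is σ² = mean(exp(2I)) / 2. A local FT-fitting model scores each region by this likelihood. The sketch evaluates the density and the estimate only; the edge-entropy weighting and level-set evolution are not included, and the sample values are hypothetical.

```python
import math


def ft_log_pdf(i, sigma2):
    """Log-density of the Fisher-Tippett law for a log-compressed intensity i."""
    z = 2.0 * i - math.log(2.0 * sigma2)
    return math.log(2.0) + z - math.exp(z)


def ft_sigma2_mle(samples):
    """Closed-form maximum-likelihood estimate: sigma^2 = mean(exp(2 i)) / 2."""
    return sum(math.exp(2.0 * i) for i in samples) / (2.0 * len(samples))


def ft_log_likelihood(samples, sigma2):
    return sum(ft_log_pdf(i, sigma2) for i in samples)


region = [0.2, 0.5, 0.9, 1.1, 0.4]  # hypothetical log-compressed pixels of one local region
s2 = ft_sigma2_mle(region)
print(round(s2, 4), round(ft_log_likelihood(region, s2), 4))
```

Setting the derivative of the summed log-density with respect to σ² to zero gives −n/σ² + Σ exp(2I)/(2σ⁴) = 0, which is the closed form used above.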

A U-Net based contour enhanced attention for medical image segmentation

LI Cui-yun, BAI Jing, ZHENG Liang

2022, 43(2): 273-278. DOI: 10.11996/JG.j.2095-302X.2022020273

Medical image segmentation is vital for medical image processing. With the development of deep learning, image segmentation techniques have made remarkable progress. However, the discrimination of contour pixels of lesions remains fuzzy and inaccurate. To address this problem, we proposed a contour enhanced attention (CEA) module. It obtains rich location information by encoding features along two different directions and strengthens contours by calculating the offset between location features and input features. Furthermore, we constructed a U-Net for medical image segmentation based on the proposed module, which can overcome the spatial limitation of the convolution kernel, thus capturing position-aware cross-channel information and clearer edge contour information and improving segmentation accuracy. Experiments on the public Kvasir-SEG dataset demonstrate that the network with the CEA module achieves better results in Dice, precision, recall, and other evaluation indexes for medical image segmentation.
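
The reported Dice, precision, and recall can be computed from binary masks as below. The standard pixel-wise definitions are assumed (the abstract does not spell them out), and the flattened masks are hypothetical.

```python
def segmentation_scores(pred, truth):
    """Dice, precision, and recall for flattened binary masks (1 = lesion)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0  # both masks empty
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return dice, precision, recall


pred = [0, 1, 1, 1, 0, 0]   # hypothetical predicted mask
truth = [0, 1, 1, 0, 1, 0]  # hypothetical ground-truth mask
print(segmentation_scores(pred, truth))
```

Here two lesion pixels are found, one background pixel is wrongly included, and one lesion pixel is missed, so all three scores equal 2/3. Contour errors show up directly as such false positives and false negatives along the lesion boundary.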

Finger-knuckle-print recognition based on multi-dimensional matching distances fusion

HUANG Jie, WEI Xin, YANG Zi-yuan, MIN Wei-dong

2022, 43(2): 279-287. DOI: 10.11996/JG.j.2095-302X.2022020279

As a novel biometric modality, finger-knuckle-print (FKP) recognition has gained much attention for its security and stability. Coding-based methods are considered among the most effective methods in this field. In the template matching stage, such methods distinguish samples by a single matching distance between two images, computed from the extracted features. However, some ambiguous samples cannot be effectively distinguished by a single matching distance, leading to false acceptance and false rejection. To address this problem, a lightweight and effective method based on multi-dimensional matching distances fusion was proposed in this paper. The proposed method exploits the differences and complementarity between the matching distances of multiple coding-based methods, and applies a support vector machine (SVM) to classify the multi-dimensional feature vectors constructed from these matching distances. Moreover, the proposed method is general and can easily be embedded into existing coding-based methods. Extensive experiments using two- to four-dimensional matching distances were conducted on the public FKP database PolyU-FKP. The results show that the proposed method generally improves the performance of these methods, reducing the EER by up to 22.19%.
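
The EER is the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). A minimal sketch computes it from genuine and impostor matching distances by sweeping thresholds; this is a discrete approximation, the distances are hypothetical, and the SVM fusion step is not reproduced.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER for distances, where a smaller distance means a better match."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(d > t for d in genuine) / len(genuine)      # genuine pairs rejected
        far = sum(d <= t for d in impostor) / len(impostor)   # impostor pairs accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]


genuine = [0.10, 0.15, 0.20, 0.35]   # hypothetical same-finger distances
impostor = [0.30, 0.40, 0.45, 0.50]  # hypothetical different-finger distances
print(equal_error_rate(genuine, impostor))  # 0.25
```

The single overlapping pair (a genuine 0.35 against an impostor 0.30) is what keeps the EER above zero; this is the kind of ambiguous case a second, complementary distance could help separate.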

Landmark detection based on perspective down-sampling and neural network

LI Yu-zhen, CHEN Hui, WANG Jie, RONG Wen

2022, 43(2): 288-295. DOI: 10.11996/JG.j.2095-302X.2022020288

In the field of intelligent driving, a landmark detection method based on neural networks and perspective down-sampling was proposed to detect road guide signs accurately in real time. The proposed method effectively addresses the poor real-time performance of traditional detection methods and their low detection accuracy for complex scenes and distant small targets. First, the region of interest of the image was selected for perspective down-sampling, which reduces the resolution of the near part of the road image, reduces the image size, and eliminates the perspective projection error. Second, the YOLOv3-tiny target detection network was enhanced. The bounding boxes of the self-built dataset were clustered by k-means++. Convolution layers were added to strengthen the shallow features and enhance the representation of small targets. By changing the fusion scale of the feature pyramid, the prediction outputs were adjusted to 26×26 and 52×52. As a result, the accuracy rose from 78% to 99% on the self-built multi-scene dataset, and the model size was reduced from 33.8 MB to 8.3 MB. The results show that the proposed landmark detection method is robust, achieves higher detection accuracy for small targets, and is readily deployable on low-end embedded devices.
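
The abstract clusters bounding boxes with k-means++. In YOLO-style detectors this is commonly done on box widths and heights with a 1 − IoU distance to obtain anchor sizes; the sketch assumes that convention (the paper does not state its distance), uses Python's `random` module for seeding, and the box sizes are hypothetical.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def distance(box, center):
    # 1 - IoU compares boxes by overlap rather than by absolute pixel size.
    return 1.0 - wh_iou(box, center)


def kmeans_pp_anchors(boxes, k, iterations=50, seed=0):
    """Cluster (w, h) pairs into at most k anchor sizes with k-means++ seeding."""
    rng = random.Random(seed)
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        # k-means++: sample the next seed with probability proportional to D(x)^2.
        weights = [min(distance(b, c) for c in centers) ** 2 for b in boxes]
        if sum(weights) == 0:
            break  # every box already coincides with a seed
        centers.append(rng.choices(boxes, weights=weights, k=1)[0])
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for b in boxes:
            nearest = min(range(len(centers)), key=lambda j: distance(b, centers[j]))
            clusters[nearest].append(b)
        new_centers = []
        for j, members in enumerate(clusters):
            if members:  # mean width and height of the cluster
                new_centers.append((sum(w for w, _ in members) / len(members),
                                    sum(h for _, h in members) / len(members)))
            else:  # keep an empty cluster's previous center
                new_centers.append(centers[j])
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


# Hypothetical sign sizes in pixels: several distant (small) and near (large) signs.
boxes = [(10, 12), (12, 10), (11, 11), (9, 10), (60, 40), (64, 42), (58, 38)]
print(kmeans_pp_anchors(boxes, k=2))
```

The k-means++ seeding favors far-apart initial centers, which makes it less likely that two seeds land in the same size group than with purely random initialization.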

A traffic police object detection method based on optimized YOLO model

LI Ni-ni, WANG Xia-li, FU Yang-yang, ZHENG Feng-xian, HE Dan-dan, YUAN Shao-xin

2022, 43(2): 296-305. DOI: 10.11996/JG.j.2095-302X.2022020296

To tackle the low accuracy of detecting and localizing traffic police objects in complex traffic scenes, a traffic police object detection method based on an optimized YOLOv4 model was proposed in this study. First, four random transformation methods were employed to expand the self-built traffic police dataset, so as to mitigate model over-fitting and improve the generalization ability of the network model. Second, the YOLOv4 backbone network was replaced with the lightweight MobileNet, and the Inception-Resnet-v1 structure was introduced to effectively reduce the number of parameters while deepening the network. Then, the K-means++ clustering algorithm was adopted to perform clustering analysis on the self-built dataset. In doing so, the initial anchor boxes of the network were redefined, and the efficiency of learning deep features of traffic police objects was improved. Finally, to address the imbalance of positive and negative samples during network training, the focal loss function was introduced to optimize the classification loss. Experimental results demonstrate that the optimized YOLOv4 model is only 50 MB in size and its AP reaches 98.01%. Compared with Faster R-CNN, YOLOv3, and the original YOLOv4 model, the optimized network shows significant improvement. The proposed method can effectively solve the problems of missed detection, false detection, and low accuracy for traffic police objects in complex traffic scenes.
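
Focal loss down-weights easy, well-classified examples so that the many easy background anchors do not dominate training. A minimal binary sketch follows, using the commonly cited defaults α = 0.25 and γ = 2; these values are assumptions, since the paper's settings are not given.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for predicted positive-class probability p and label y."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, eps))


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss: -alpha_t * (1 - p_t) ** gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


# An easy negative (background anchor, p = 0.05) vs a hard positive (p = 0.3).
for p, y in [(0.05, 0), (0.3, 1)]:
    print(p, y, round(cross_entropy(p, y), 4), round(focal_loss(p, y), 4))
```

For the easy negative, the modulating factor (1 − p_t)^γ = 0.05² cuts the loss by a factor of several hundred relative to cross-entropy, while the hard positive keeps a much larger share of its loss.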

Mixed reality simulation system for emergency escape design of civil aircraft flight crew

WU Cheng-cheng, LYU Yi, YUAN Xin-hao, XU Shu-hong

2022, 43(2): 306-315. DOI: 10.11996/JG.j.2095-302X.2022020306

Emergency escape simulation for civil aircraft crew helps to identify potential problems in crew escape hatch design during the early development of aircraft, and ensures the safety of crew members. This paper presented a mixed-reality simulation system for the emergency escape of civil aircraft flight crew. To solve the key problem of human body virtual-physical matching, an optical-inertial hybrid whole-body human motion capture method was proposed. Working together with the Kinect2-based technique for matching key human body dimensions, the method can effectively improve the efficiency and robustness of human body virtual-physical matching. The proposed mixed-reality simulation system has been successfully applied to the development of large domestic aircraft. Experimental results show its efficiency in evaluating crew escape hatch design.

Two-stage adjustable perceptual distillation network for virtual try-on

CHEN Bao-yu, ZHANG Yi, YU Bing-bing, LIU Xiu-ping

2022, 43(2): 316-323. DOI: 10.11996/JG.j.2095-302X.2022020316

Image-based virtual try-on fits a target garment image to a person image, and the task has gained much attention in recent years for its wide applications in e-commerce and fashion image editing. In response to the characteristics of this task and the shortcomings of existing approaches, a two-stage adjustable perceptual distillation (TS-APD) method was proposed in this paper. The method consists of three steps. First, two semantic segmentation networks were pre-trained on garment images and person images respectively, generating more accurate garment foreground segmentation and upper garment segmentation. Then, these two semantic segmentations and other parsing information were employed to train a parser-based “tutor” network. Finally, a parser-free “student” network was trained through a two-stage adjustable perceptual distillation scheme, taking the fake images generated by the “tutor” network as input and the original real person images as supervision. With distillation, the “student” model is able to produce high-quality try-on images without human parsing. Experimental results on the VITON dataset show that this algorithm achieves an FID of 9.10, an L1 score of 0.0153, and a PCKh score of 0.9856, outperforming existing methods. A user survey also shows that, compared with other methods, the images generated by the proposed method are more photo-realistic, with all preference scores exceeding 77%.
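
PCKh counts a keypoint as correct when it lies within a fraction (commonly 0.5) of the head-segment length from its reference position. The sketch assumes keypoints have already been detected on both the generated and the reference image; the coordinates and head size are hypothetical, and the ratio 0.5 is an assumed default.

```python
import math


def pckh(pred, truth, head_size, ratio=0.5):
    """Fraction of keypoints within ratio * head_size of their reference positions."""
    if not truth:
        raise ValueError("no keypoints")
    correct = sum(math.dist(p, t) <= ratio * head_size for p, t in zip(pred, truth))
    return correct / len(truth)


generated = [(10, 10), (20, 22), (35, 30)]  # hypothetical keypoints on the try-on image
reference = [(10, 11), (20, 20), (30, 30)]  # hypothetical keypoints on the real image
print(pckh(generated, reference, head_size=6))
```

With a head size of 6 px the tolerance is 3 px; the offsets are 1, 2, and 5 px, so two of three keypoints count as correct.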

Track fastener localization algorithm based on geometric features and the spike center point localization

CAO Yi-qin, YI Hu, QIU Yi, ZHOU Yi-wei

2022, 43(2): 324-332. DOI: 10.11996/JG.j.2095-302X.2022020324

To solve the positioning failure and accuracy reduction caused by skew and nonstandard image sizes in track images, a fastener positioning algorithm based on spike center point location and geometric structure features was proposed. The new method first locates the center point of the spike and then locates the fasteners using geometric features. In the edge image obtained by image preprocessing, the edges of the track spike become approximately circular after erosion and dilation. Then, using the Hough transform circle detection algorithm, the rough area of the spike was located and expanded, so that the spike area could be roughly extracted from the original image. The edges of the spike area image were then detected, and the contour extraction and polygon detection algorithms of OpenCV were employed to accurately fit the spike hexagon and calculate the spike center point. Finally, the coordinates of each vertex of the fastener bounding box were obtained using the proposed fastener location algorithm based on geometric structure features. Experimental results show that the new algorithm achieves a positioning accuracy of 99.33%, a precision of 0.997, and a speed of 29.8 fps, superior to the compared algorithms. Meanwhile, under different circumstances such as weather conditions, spike corrosion, or occlusion, the new algorithm shows better robustness and anti-interference ability.
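
Once the spike outline has been fitted as a hexagon, one way to obtain its center is the polygon's area centroid (shoelace formula), which, unlike a plain vertex average, does not depend on how vertices are distributed along the outline. The paper does not say how it computes the center; this library-free stand-in for the OpenCV-based step is an assumption, with hypothetical vertex coordinates.

```python
import math


def polygon_centroid(vertices):
    """Area centroid of a simple polygon given as ordered [(x, y), ...]."""
    area2 = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # shoelace term
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    # Centroid = sum / (6 * area), and area2 is twice the signed area.
    return cx / (3.0 * area2), cy / (3.0 * area2)


# Hypothetical fitted hexagon: radius 10 px around (50, 40).
hexagon = [(50 + 10 * math.cos(k * math.pi / 3), 40 + 10 * math.sin(k * math.pi / 3)) for k in range(6)]
print(polygon_centroid(hexagon))
```

The signed area cancels in the division, so clockwise and counter-clockwise vertex orders give the same center.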

Lightweight human pose estimation with global pose perception

LIU Yu-jie, ZHANG Min-jie, LI Zong-min, LI Hua

2022, 43(2): 333-341. DOI: 10.11996/JG.j.2095-302X.2022020333

Human pose estimation has been a hot topic in the field of human-computer interaction in recent years. At present, common methods for human pose estimation focus on improving accuracy by increasing network complexity. However, they ignore the cost-effectiveness of the model, so models that are highly accurate in practice consume huge computational resources. In this paper, a lightweight human pose estimation model with global pose perception was designed. It achieves 68.2% AP on the MSCOCO dataset at a speed of 255 fps, and its parameter count and FLOPs are 10% and 0.9% of those of the OpenPose method, respectively. In the human pose estimation task, the number of output channels of the network is set according to the number of predicted key joints, so each key joint is detected independently. Global information, such as the relative positions of key points and the overall layout, is of great significance for pose estimation on difficult samples, but was absent from previous studies. To utilize the global pose information, a global pose perception module was designed to extract global pose features, and a two-branch network was employed to fuse the global and local pose features. Experiments show that the lightweight human pose estimation network with global pose perception increases accuracy by 1.5% and 1.3% on the MPII and MSCOCO datasets, respectively.
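
When each output channel predicts a heatmap for one joint, decoding is typically done channel by channel. A minimal sketch (plain arg-max without sub-pixel refinement, on hypothetical heatmaps) shows why each joint is located independently of the others, which is the limitation the global pose perception module targets.

```python
def decode_heatmaps(heatmaps):
    """Return (x, y, score) for each joint channel by taking its arg-max cell."""
    joints = []
    for channel in heatmaps:
        best = (0, 0, float("-inf"))
        for y, row in enumerate(channel):
            for x, value in enumerate(row):
                if value > best[2]:
                    best = (x, y, value)
        joints.append(best)  # no information from other channels is used
    return joints


heatmaps = [
    [[0.0, 0.1, 0.0], [0.2, 0.9, 0.1], [0.0, 0.1, 0.0]],  # hypothetical joint 0
    [[0.0, 0.0, 0.7], [0.0, 0.1, 0.2], [0.0, 0.0, 0.0]],  # hypothetical joint 1
]
print(decode_heatmaps(heatmaps))  # [(1, 1, 0.9), (2, 0, 0.7)]
```

Nothing in this per-channel step checks whether the decoded joints form a plausible body layout, so an occluded joint can be placed anywhere its own heatmap peaks.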

FFF of a three-dimensional continuous weaving filling pattern

WU Huan-xiao, YAO Yuan, YANG Jin-xiu, DING Cheng

2022, 43(2): 342-347. DOI: 10.11996/JG.j.2095-302X.2022020342

In order to improve the mechanical strength and reduce the anisotropy of fused filament fabrication (FFF)
workpieces, a three-dimensional continuous braiding path planning method was proposed. The method used
continuous fiber-reinforced material as the printing material, designed an eight-loop structure, and used the 3D printer
nozzle to extrude filament forming the warp and weft yarns. By controlling the movement of the FFF platform in the z
direction, a continuous deposition route resembling 3D woven fiber was produced, in which different layers penetrate
and embed into each other so that adjacent section planes interlock, thus improving the connection strength within and
between layers. This cyclic structure supports continuous path planning, so parts can be manufactured on a
conventional three-axis FFF platform, making the method widely applicable. The rationality and feasibility of the
braided structure were verified by comparison with standard samples. Experiments show that the 3D continuous fiber
braided printing path can support the filling of different structures and can effectively reduce the anisotropy of
mechanical properties caused by the layered deposition of materials, thereby enhancing the reliability of printed parts
with complex structures.
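
The idea of a continuous, interlacing warp/weft route can be sketched geometrically. The Python below is a hypothetical illustration, not the authors' eight-loop structure: it assumes a square grid with an odd number of yarns, lays warp yarns along x and weft yarns along y in serpentine order, and alternates the nozzle height above and below a mid-plane at every crossing, with opposite phase for warp and weft so they pass on opposite sides of each other. Because the warp pass ends where the reversed weft pass begins, and the weft ends where the next cycle starts, the route needs no travel moves. Extrusion (E values), nozzle collision and fiber cutting are ignored, and all parameter names and values are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def yarn_pass(n: int, spacing: float, z_mid: float, amp: float,
              along_x: bool, phase: int) -> List[Point]:
    """Serpentine set of n parallel yarns on an n x n grid.

    At crossing k of yarn i the nozzle sits `amp` above or below z_mid,
    alternating with (i + k + phase); yarns laid with the opposite phase
    are therefore on the other side at every crossing (interlacing).
    """
    path: List[Point] = []
    for i in range(n):
        ks = range(n) if i % 2 == 0 else range(n - 1, -1, -1)  # boustrophedon
        for k in ks:
            along, across = k * spacing, i * spacing
            z = z_mid + (amp if (i + k + phase) % 2 == 0 else -amp)
            path.append((along, across, z) if along_x else (across, along, z))
    return path


def woven_path(n: int, spacing: float, layer_h: float, amp: float,
               cycles: int) -> List[Point]:
    """Continuous warp + reversed weft path, repeated `cycles` times in z.

    n must be odd: the warp then ends at the far corner, where the reversed
    weft begins, and the weft ends at the origin, where the next cycle's
    warp begins, so consecutive points are always grid neighbours.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd for a closed, continuous cycle")
    path: List[Point] = []
    for c in range(cycles):
        z_mid = (c + 1) * layer_h
        warp = yarn_pass(n, spacing, z_mid, amp, along_x=True, phase=0)
        weft = yarn_pass(n, spacing, z_mid, amp, along_x=False, phase=1)
        path += warp + weft[::-1]
    return path


def to_gcode(path: List[Point]) -> List[str]:
    """Plain G1 moves; extrusion (E) is deliberately omitted in this sketch."""
    return [f"G1 X{x:.2f} Y{y:.2f} Z{z:.2f}" for x, y, z in path]


for line in to_gcode(woven_path(n=3, spacing=1.0, layer_h=0.3, amp=0.1, cycles=1))[:3]:
    print(line)
# G1 X0.00 Y0.00 Z0.40
# G1 X1.00 Y0.00 Z0.20
# G1 X2.00 Y0.00 Z0.40
```

Setting `amp` larger than half of `layer_h` makes successive cycles overlap in z, which is one way to picture the interlock between adjacent section planes; whether such a path is printable depends on the material and nozzle, which this sketch does not model.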

Relationship model between Chinese character font stroke shape and emotional image

OUYANG Jin-yan, GAO Xuan-han, ZHANG Shu-tao, WANG Xu-hong, ZHOU Ai-min

2022, 43(2): 348-355. DOI: 10.11996/JG.j.2095-302X.2022020348

In order to reveal the internal relationship between the morphological features of Chinese characters and the
emotional images of the audience, a relationship model between the morphological features of Chinese characters and
the emotional image was proposed from the perspective of visual cognition. First, the design elements of Chinese
character font stroke shape were analyzed with the morphological analysis method to construct an item and category
table. Then, the K-means clustering algorithm was employed to select representative emotional image words,
and a semantic differential (SD) questionnaire was issued to obtain the emotional image scores for each font sample.
Finally, multiple linear regression was used to establish the relationship model between the design
elements of font stroke shape and emotional images. The influence of each morphological feature element on the
emotional images can be analyzed from the coefficients of the regression equation. The model can provide technical
support for the image positioning of Chinese character font design, and offers a new idea and method for relevant
research. Its application in practice shows that the method is highly feasible and reliable.
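
The final modelling step can be illustrated with a small sketch. Because stroke-shape design elements are categorical (an item such as the stroke terminal takes one of several categories), a common approach, assumed here rather than taken from the paper, is to dummy-code each item against a reference category and fit ordinary least squares to the SD scores of one emotional image word. The item and category names and the scores below are synthetic, generated from known effects so that the fit can be checked; they are not the paper's data.

```python
from typing import Dict, List, Sequence


def dummy_encode(samples: List[Dict[str, str]],
                 levels: Dict[str, List[str]]) -> List[List[float]]:
    """Intercept column plus one 0/1 column per non-reference category.
    The first listed category of each item is its reference level."""
    rows = []
    for s in samples:
        row = [1.0]
        for item, cats in levels.items():
            row += [1.0 if s[item] == c else 0.0 for c in cats[1:]]
        rows.append(row)
    return rows


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting for a square system."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(x: List[List[float]], y: Sequence[float]) -> List[float]:
    """Least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical items/categories; synthetic scores built from known effects.
levels = {"terminal": ["round", "square", "sharp"], "weight": ["thin", "bold"]}
samples = [{"terminal": t, "weight": w}
           for t in levels["terminal"] for w in levels["weight"]]
effect = {"round": 0.0, "square": 1.0, "sharp": -2.0, "thin": 0.0, "bold": 0.5}
scores = [3.0 + effect[s["terminal"]] + effect[s["weight"]] for s in samples]
beta = fit_ols(dummy_encode(samples, levels), scores)
print([round(b, 6) for b in beta])  # [3.0, 1.0, -2.0, 0.5]
```

Each coefficient reads as the change in the SD score when a font uses that category instead of the item's reference category, which is how regression coefficients expose the influence of each morphological element.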

Complexity analysis method of human-machine interaction task in intelligent vehicle cockpit

MA Ning, WANG Ya-hui

2022, 43(2): 356-360. DOI: 10.11996/JG.j.2095-302X.2022020356

The tasks and behaviors of human-machine interaction (HMI) in the intelligent vehicle cockpit directly
affect users’ experience in the cockpit. To help automobile interior and exterior designers and car UI designers avoid
the risk of poor interface usability, the HMI behaviors in intelligent vehicles were studied quantitatively, and the
complexity indexes of HMI tasks were summarized. Then the specific task indexes affecting HMI complexity in
the intelligent cockpit and their weight distribution were extracted, and an entropy-based method for measuring the
HMI task complexity of intelligent vehicles was proposed. Finally, the algorithm was verified with an example of an
intelligent car cockpit. The results show that the complexity of HMI tasks in the cockpit is affected by many
factors, such as the logical structure of the HMI task, the knowledge level and cognitive quantity required by HMI, and
the complexity of the HMI digital interface layout in the cockpit. These factors warrant more attention from designers.
The proposed method can help designers avoid the risks of high design complexity and high user learning cost, and help
them intervene early in design problems related to the above indicators.
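
The abstract does not spell out its entropy-based measure, so the sketch below uses one widely used entropy-based scheme, the entropy weight method, as an assumed stand-in: indexes whose values vary more across tasks carry more information and receive larger weights, and a task's complexity is the weighted sum of its normalised index values. The three index columns in the example (for instance, number of logical steps, required knowledge items, and on-screen interface elements) and their values are hypothetical.

```python
import math
from typing import List, Tuple


def entropy_weights(x: List[List[float]]) -> Tuple[List[float], List[List[float]]]:
    """Entropy weight method over an m x n matrix (m tasks, n indexes).

    Assumes every index is oriented so that a larger value means a more
    complex task. Returns (weights, min-max normalised matrix).
    """
    m, n = len(x), len(x[0])
    if m < 2:
        raise ValueError("need at least two tasks to compare")
    r = [[0.0] * n for _ in range(m)]
    for j, col in enumerate(zip(*x)):
        lo, hi = min(col), max(col)
        for i in range(m):
            r[i][j] = (x[i][j] - lo) / (hi - lo) if hi > lo else 0.0
    k = 1.0 / math.log(m)
    d = []  # degree of divergence 1 - e_j for each index
    for j in range(n):
        total = sum(r[i][j] for i in range(m))
        if total == 0.0:
            d.append(0.0)  # constant index: no information (entropy 1)
            continue
        probs = (r[i][j] / total for i in range(m))
        e = -k * sum(p * math.log(p) for p in probs if p > 0)  # 0*ln0 := 0
        d.append(1.0 - e)
    s = sum(d)
    if s == 0.0:
        raise ValueError("no index discriminates between the tasks")
    return [dj / s for dj in d], r


def complexity_scores(x: List[List[float]]) -> List[float]:
    """Weighted sum of normalised index values for each task."""
    w, r = entropy_weights(x)
    return [sum(wj * rij for wj, rij in zip(w, row)) for row in r]


# Hypothetical tasks A, B, C; columns: steps, knowledge items, UI elements.
tasks = [[3, 2, 12], [5, 2, 20], [8, 2, 15]]
weights, _ = entropy_weights(tasks)
print(weights)
print(complexity_scores(tasks))
```

Because of the min-max normalisation the scores are relative: the least complex task in the set scores 0, and an index that is constant across all tasks (the second column here) receives weight 0.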

Published as 2, 2022

2022, 43(2): 361-361.
