Welcome to Journal of Graphics
Bimonthly, founded in 1980
Administered by: China Association for Science and Technology
Sponsored by: China Graphics Society
Edited and Published by: Editorial Board of Journal of Graphics
Editor-in-Chief: Guoping Wang
Editorial Director: Xiaohong Hou
ISSN 2095-302X
CN 10-1034/T
Current Issue
31 December 2024, Volume 45, Issue 6
Cover
Cover of issue 6, 2024
2024, 45(6): 1. 
Contents
Table of Contents for Issue 6, 2024
2024, 45(6): 2. 
Special Topic on “Large Models and Graphics Technology and Applications”
Research progress and trends in large model technologies for virtual reality
YANG Haozhong, KONG Xiaoyu, GU Ruikun, WANG Miao
2024, 45(6): 1117-1131.  DOI: 10.11996/JG.j.2095-302X.2024061117

With the advancement of computer technology, virtual reality (VR) has matured, offering users immersive and high-quality experiences across various applications. VR has become a pivotal research direction in computer graphics and human-computer interaction. Large model technologies, a prominent research focus, have provided novel solutions and ideas for classic problems across multiple fields. However, there remains a notable shortage of comprehensive reviews on the application of large model technologies within the VR domain. To address this research gap and stimulate future work, this paper systematically collected, organized, and synthesized recent studies on the utilization of large models in VR environments. It outlined the fundamental principles and representative categories of large models, followed by a detailed analysis of their research progress and applications in two main areas: content generation and human-computer interaction. Finally, the challenges associated with integrating large models into VR environments were discussed, along with prospects for future development trends in this field.

Prospects for the application of large models technology in the power industry
LIU Jichen, LI Jinxing, WU Jia, ZHANG Wei, QI Yunuo, ZHOU Guoliang
2024, 45(6): 1132-1144.  DOI: 10.11996/JG.j.2095-302X.2024061132

Artificial intelligence (AI) technologies have been widely applied across various specialized domains within the electric power industry, driving it towards the development of intelligent and automated systems. Particularly within the field of graph studies, the utilization of AI large models has become a prominent research focus, demonstrating substantial potential in image recognition, pattern identification, and the analysis of graph data. The application of AI large models addresses specialized issues in the power industry, such as image recognition, natural language processing, and business content analysis. This has significantly enhanced the efficiency and accuracy across various operational domains in the power sector. Focusing on the application prospects of AI large models in electric power dispatching, transmission, and marketing, this study first introduced the research background, development process, and technical characteristics of AI large models. Then, it reviewed the current application status of AI technology in electric power dispatching fault handling, transmission drone inspection, and electric power marketing customer service. It also analyzed the existing problems and challenges in applying AI large models in the electric power industry. Finally, the development trends and technical application analysis of large model technology in the power industry were reviewed, along with a prospect of future application scenarios.

Review on object detection in UAV aerial images
LI Qiong, KAO Yueying, ZHANG Ying, XU Pei
2024, 45(6): 1145-1164.  DOI: 10.11996/JG.j.2095-302X.2024061145

With the rapid development and deep integration of unmanned aerial vehicle (UAV) and computer vision technologies, research on object detection in UAV aerial images has gained increasing attention and has been widely applied in precision agriculture, animal monitoring, urban management, emergency rescue, and other fields. Compared to images captured from conventional perspectives, images acquired by UAVs feature a wider field of view, significantly reduced object size, and variations in viewpoint and scale, rendering conventional object detection methods inadequate. Accordingly, a detailed review of progress in object detection methods from a conventional perspective was first provided, including traditional methods, deep learning methods, and large-model-based methods. Subsequently, the innovative strategies and optimization methods proposed by existing object detection methods were summarized, specifically addressing six challenging issues specific to UAV aerial image object detection, i.e., image quality degradation, scale and viewpoint variation, small-object detection difficulty, complex background and occlusion, imbalance in large fields of view, and high real-time requirements. Additionally, UAV aerial image object detection datasets were consolidated and analyzed, with an evaluation of the performance of existing methods on two representative datasets. Finally, potential research directions for the future were outlined based on the unresolved issues in the field of UAV aerial image object detection, providing reference for the development and application of object detection in UAV aerial images.

An efficient reinforcement learning method based on large language model
XU Pei, HUANG Kaiqi
2024, 45(6): 1165-1177.  DOI: 10.11996/JG.j.2095-302X.2024061165

Deep reinforcement learning, as a key technology supporting breakthrough works such as AlphaGo and ChatGPT, has become a research hotspot in frontier science. In practical applications, deep reinforcement learning, as an important intelligent decision-making technology, is widely used in a variety of planning and decision-making tasks, such as obstacle avoidance in visual scenes, optimal generation of virtual scenes, robotic arm control, digital design and manufacturing, and industrial design decision-making. However, deep reinforcement learning faces the challenge of low sample efficiency in practical applications, which greatly limits its application effectiveness. In order to improve the sample efficiency, this paper proposes an efficient exploration method based on large model guidance, which combines the large model with the mainstream exploration techniques. Specifically, we utilize the semantic extraction capability of a large language model to obtain semantic information of states, which is then used to guide the exploration behavior of agents. Then, we introduce the semantic information into the classical methods in single-policy exploration and population exploration, respectively. By using the large model to guide the exploration behavior of deep reinforcement learning agents, our method shows significant performance improvement in popular environments. This research not only demonstrates the potential of large model techniques in deep reinforcement learning exploration problems, but also provides a new idea to alleviate the low sample efficiency problem in practical applications.
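
The paper's exact integration is not reproduced above, but the core idea — letting a language model compress raw states into semantic categories and rewarding visits to rare categories — can be sketched as a count-based bonus. This is an illustrative reading, not the published method; the `llm_semantic_label` helper is a hypothetical stand-in for the actual LLM call:

```python
from collections import defaultdict


def llm_semantic_label(state_text: str) -> str:
    """Stand-in for a large-language-model call that maps a textual state
    description to a coarse semantic category (hypothetical interface)."""
    # A real system would prompt an LLM; here we fake it with keywords.
    for key in ("door", "key", "goal"):
        if key in state_text:
            return key
    return "other"


class SemanticCountBonus:
    """Count-based exploration bonus over semantic labels rather than raw
    states, so semantically similar states share visit counts."""

    def __init__(self, scale: float = 0.1):
        self.counts = defaultdict(int)
        self.scale = scale

    def __call__(self, state_text: str) -> float:
        label = llm_semantic_label(state_text)
        self.counts[label] += 1
        return self.scale / self.counts[label] ** 0.5  # decays with visits


bonus = SemanticCountBonus()
# During training: shaped_reward = env_reward + bonus(describe(state))
```

The same bonus can be added to a single exploring policy or to each member of a population, which is roughly where the abstract's two integration points sit.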

Large language model powered UI evaluation system
CHEN Xiaojiao, SHU Yunfeng, WANG Ruihan, ZHOU Jiahuan, CHEN Wei
2024, 45(6): 1178-1187.  DOI: 10.11996/JG.j.2095-302X.2024061178

The quality of user interface (UI) design directly impacts product usability and user experience. Designers often face challenges related to consistency and accessibility during the UI design process, which increase cognitive load for users and reduce efficiency. Despite awareness of these issues, designers currently lack comprehensive knowledge and tools for automatic identification and resolution. To address this challenge, a comprehensive set of UI design evaluation criteria was proposed, covering five key aspects: color, text, layout, control, and icon, specifically targeting consistency and accessibility issues in UI design. Based on these evaluation criteria, a prompt template for evaluating UI consistency and accessibility was developed to enhance the accuracy of large language models (LLMs) like GPT-4 in UI evaluation tasks. Furthermore, a UI evaluation system based on the GPT-4 model was developed. This system deeply understood UI design content, automatically detected UI design issues according to the evaluation criteria, and provided targeted improvement suggestions to help designers optimize their UI designs. Experimental results demonstrated that using the prompt template significantly improved the accuracy of GPT-4 in UI evaluations. User studies indicated that employing this UI evaluation system in design practice can significantly enhance the quality of UI designs, thereby boosting product usability and user experience. This system provided designers with an automated UI evaluation tool, offering a new approach to enhancing UI design quality.
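
The published template itself is not shown above; the sketch below illustrates one plausible way to assemble the five criteria into an evaluation prompt for an LLM such as GPT-4. The criterion wordings are illustrative assumptions, not the paper's:

```python
CRITERIA = {
    "color":   "Palette is consistent across screens; contrast meets WCAG AA.",
    "text":    "Font families and sizes follow one type scale; labels are legible.",
    "layout":  "Spacing and alignment follow a consistent grid.",
    "control": "Interactive controls look and behave consistently.",
    "icon":    "Icons share one visual style and are unambiguous.",
}


def build_ui_eval_prompt(ui_description: str) -> str:
    """Assemble an evaluation prompt covering the five criteria; the result
    would be sent to an LLM such as GPT-4 alongside the UI screenshot."""
    rules = "\n".join(f"- {name}: {rule}" for name, rule in CRITERIA.items())
    return (
        "You are a senior UI reviewer. Evaluate the interface below against "
        "each criterion, flag consistency or accessibility violations, and "
        "suggest a concrete fix for each issue.\n\n"
        f"Criteria:\n{rules}\n\nInterface:\n{ui_description}"
    )


print(build_ui_eval_prompt("Login screen: 3 button styles, 2 fonts, low-contrast links"))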

Intelligent MBSE design approach based on retrieval augmented large language model
YU Han, CHEN Zhiyuan, XIONG Xirui, DAI Yuanxing, CAI Hongming
2024, 45(6): 1188-1199.  DOI: 10.11996/JG.j.2095-302X.2024061188

Model-based systems engineering (MBSE) is one of the most important methods for today’s digital design of products. However, due to the high specialization of systems engineering and the complex interrelationships within products, the application of MBSE to complex products has proven challenging. To address this problem, an intelligent design method based on a retrieval-augmented large language model was proposed for the first time. The method first established an object-oriented multi-modal vector representation for models, leveraging retrieval-augmented generation techniques that incorporate domain knowledge and modeling rules to guide the model in more accurately generating MBSE model diagrams. Secondly, a diagram optimization method based on the MBSE model relations was proposed, cross-validating the model accuracy through the results of contextual interaction. Thirdly, the large language model was employed to call modeling APIs and to select the proper materials to generate design models and eBOM. Finally, a dataset containing 24 scenario models was constructed for method validation. Experimental results showed that the approach possessed high accuracy and usability. A case study with water-jet propulsion as the modeling object further demonstrated that the approach can effectively enhance modeling efficiency while maintaining usability, marking an important step toward intelligent application of model-based systems engineering.
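
As a rough illustration of the retrieval-augmented step, the sketch below performs top-k cosine retrieval over a toy rule base before prompting. The embeddings are random placeholders standing in for a real text encoder, and `embed`/`retrieve` are hypothetical helpers, not the paper's implementation:

```python
import numpy as np

# Toy knowledge base of modeling rules; a real system would embed SysML
# domain documents and modeling conventions with a sentence encoder.
rng = np.random.default_rng(0)
RULES = ["A block owns its parts", "Requirements trace to blocks",
         "Activities refine use cases"]
RULE_VECS = rng.normal(size=(len(RULES), 64))


def embed(text: str) -> np.ndarray:
    """Placeholder embedding; substitute a real sentence encoder."""
    rng_t = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng_t.normal(size=64)


def retrieve(query: str, k: int = 2) -> list[str]:
    """Cosine-similarity top-k retrieval over the rule vectors."""
    q = embed(query)
    sims = RULE_VECS @ q / (np.linalg.norm(RULE_VECS, axis=1) * np.linalg.norm(q))
    return [RULES[i] for i in np.argsort(sims)[::-1][:k]]


context = "\n".join(retrieve("decompose a water-jet propulsion system"))
prompt = f"Modeling rules:\n{context}\n\nGenerate the MBSE block diagram for ..."
```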

An intelligent maintenance system for public buildings integrating digital twin and large language model
XU Jinglin, PENG Yang, OU Jinwu, TAN Junjie, SHU Jiangpeng, YU Fangqiang
2024, 45(6): 1200-1206.  DOI: 10.11996/JG.j.2095-302X.2024061200

To address the challenges encountered in smart building operations and maintenance based on digital twins, such as complex system operations, difficulties in accessing a vast amount of construction documentation, and limited decision support in complex scenarios, a smart building operations and maintenance system integrating large models and digital twins was constructed. Innovations include efficient retrieval technology for massive information based on Retrieval Augmented Generation, efficient invocation technology for building operations and maintenance services based on large models, and intelligent building adaptation technology based on swarm intelligence. The system was applied and verified in three typical operations and maintenance scenarios. The results demonstrated that integrating large models and digital twins in constructing a public building smart operations and maintenance system aided in providing personalized building operations and maintenance services, enhanced user experience, offered complex decision support, and enabled more convenient, comfortable, safe, and green smart operations and maintenance management for public buildings.

The computational paradigm and software framework for mechanism and data-driven physical simulation
HE Xiaowei, SHI Jian, LIU Shusen, REN Lixin, GUO Yuzhong, CAI Yong, WANG Hu, ZHU Fei, WANG Guoping
2024, 45(6): 1207-1221.  DOI: 10.11996/JG.j.2095-302X.2024061207

As the cornerstone of modern industrial software, physical simulation encompasses various computational paradigms, including mechanism-driven, data-driven, and hybrid-driven models. Meeting the demands of diverse physical simulation requires the construction of a general framework capable of flexibly adapting to various physical simulation computational paradigms while achieving efficient coupling across them, presenting a critical challenge in software design and development. To address this, the Data field—Node—Module—Scene graph (FNMS) architecture was proposed, targeting multi-physics simulation computational paradigms. Its core lies in the design of a four-layer structure: Data field, Node, Module, and Scene graph. Specifically, the Data field layer provided a unified data management and access interface for the simulation process, enhancing the convenience and efficiency of data sharing in physical simulation computations. The Module layer encapsulated various physical simulation algorithms, realizing algorithm modularization and reusability while solving the asynchronous coordination of simulation computation, rendering, and interaction. Through data and algorithm decoupling, the Node layer enabled algorithm reuse across different physical simulation computational paradigms and facilitated data exchange and sharing within multi-physics coupling processes. The Scene graph layer supported efficient coupled computations of various physical simulation computational paradigms by organizing nodes into a directed acyclic graph. Through the combination of these four layers, the FNMS architecture not only enhanced the computational efficiency and flexibility of physical simulations but also provided strong technical support for interdisciplinary and cross-domain physical simulation research.
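
A minimal sketch of how the four layers might compose, with layer names taken from the abstract; the method signatures and the depth-first scheduler are illustrative assumptions rather than the published framework:

```python
from typing import Callable


class DataField(dict):
    """Data field layer: shared, named simulation state (positions, velocities...)."""


class Module:
    """Module layer: one reusable simulation algorithm step acting on fields."""
    def __init__(self, fn: Callable[[DataField], None]):
        self.fn = fn

    def run(self, fields: DataField) -> None:
        self.fn(fields)


class Node:
    """Node layer: couples a data field with the modules that evolve it."""
    def __init__(self, name: str, fields: DataField, modules: list[Module]):
        self.name, self.fields, self.modules = name, fields, modules

    def step(self) -> None:
        for m in self.modules:
            m.run(self.fields)


class SceneGraph:
    """Scene-graph layer: nodes plus dependency edges, stepped in topological
    order so coupled paradigms exchange data in a well-defined sequence."""
    def __init__(self):
        self.nodes: dict[str, Node] = {}
        self.deps: dict[str, list[str]] = {}

    def add(self, node: Node, after=()) -> None:
        self.nodes[node.name] = node
        self.deps[node.name] = list(after)

    def step(self) -> None:
        done: set[str] = set()

        def visit(name: str) -> None:  # DFS topological order (assumes a DAG)
            if name in done:
                return
            for dep in self.deps[name]:
                visit(dep)
            self.nodes[name].step()
            done.add(name)

        for name in self.nodes:
            visit(name)
```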

Adversarial example generation method for open-vocabulary detection large models based on visually-textual fusion loss
SHI Hao, WANG Shu, HAN Jianhong, LUO Zhaoyi, WANG Yupei
2024, 45(6): 1222-1230.  DOI: 10.11996/JG.j.2095-302X.2024061222

Recently, open-vocabulary detection (OVD) has become a research focus in the field of computer vision due to its potential to recognize objects from unknown categories. As a representative approach in this domain, YOLO-World possesses powerful real-time detection capabilities; however, security issues stemming from the vulnerabilities of deep learning networks cannot be overlooked. Against this backdrop, a white-box adversarial example generation method was proposed, targeting the YOLO-World algorithm, providing insights into identifying and quantifying vulnerabilities in large models. The method utilized gradient data generated during backpropagation in the YOLO-World network to optimize predefined perturbations, which were then added to original examples to form adversarial examples. Initially, confidence scores and bounding box information from model outputs served as a basis for preliminary optimization, resulting in adversarial examples with a certain level of attack effectiveness. This was further enhanced by a visually-textual fusion loss designed according to the RepVL-PAN structure in the YOLO-World model, to increase the destructiveness of adversarial examples against the model. Finally, perturbation magnitude loss was integrated to constrain the total amount of perturbation, generating adversarial examples with limited disturbance. The adversarial examples generated by this method were capable of achieving attack objectives such as confidence reduction and bounding box displacement according to practical needs. Experimental results demonstrated that the proposed method significantly impaired the YOLO-World model, with mean average precision dropping below 5% after testing on the LVIS dataset.
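
The specific losses belong to the paper; the sketch below only illustrates the general projected-gradient loop with a simplified fused objective. The `model` interface (returning confidences, boxes, and visual/text features per forward pass) is an assumption for illustration:

```python
import torch


def fused_attack_loss(conf, vis_feat, txt_feat, delta, w_fuse=1.0, w_mag=0.01):
    """Simplified stand-in for the paper's objective: drive confidences down,
    break the visual-textual alignment, and keep the perturbation small."""
    conf_loss = conf.sum()
    fuse_loss = torch.cosine_similarity(vis_feat.flatten(1),
                                        txt_feat.flatten(1)).sum()
    mag_loss = delta.abs().mean()
    return conf_loss + w_fuse * fuse_loss + w_mag * mag_loss


def pgd_attack(model, image, steps=10, eps=8 / 255, alpha=2 / 255):
    """White-box projected-gradient loop; a bounding-box displacement term
    could be added to the loss as the abstract suggests."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        conf, boxes, vis_feat, txt_feat = model(image + delta)  # assumed API
        loss = fused_attack_loss(conf, vis_feat, txt_feat, delta)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend: lower loss = stronger attack
            delta.clamp_(-eps, eps)             # L_inf budget projection
        delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()
```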

Research on KB-VQA knowledge retrieval strategy based on implicit knowledge enhancement
ZHENG Hongyan, WANG Hui, LIU Hao, ZHANG Zhiping, YANG Xiaojuan, SUN Tao
2024, 45(6): 1231-1242.  DOI: 10.11996/JG.j.2095-302X.2024061231

Knowledge-based visual question answering (KB-VQA) requires not only image and question information but also relevant knowledge from external sources to answer questions accurately. Existing methods typically involve using a retriever to fetch external knowledge from a knowledge base or relying on implicit knowledge from large models. However, solely depending on image and textual information often proves insufficient for acquiring the necessary knowledge. To address this issue, an enhanced retrieval strategy was proposed for both the query and external knowledge stages. On the query side, implicit knowledge from large models was utilized to enrich the existing image and question information, aiding the retriever in locating more accurate external knowledge from the knowledge base. On the external knowledge side, a pre-simulation interaction module was introduced to enhance the external knowledge. This module generated a new lightweight vector for the knowledge vector, allowing the retriever to pre-simulate the interaction between the query and the knowledge passage, thus better capturing their semantic relationship. Experimental results demonstrated that the improved model can achieve an accuracy of 61.3% on the OK-VQA dataset by retrieving only a small amount of knowledge.
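
One plausible reading of the pre-simulation interaction module is late-interaction pooling: each passage is compressed offline into a few vectors by learned pseudo-queries, so the query-passage interaction is approximated at retrieval time by a cheap max over dot products. The sketch below is speculative and illustrative, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
D, M = 64, 8                                # embedding dim, light-vector count

W = rng.normal(size=(M, D)) / np.sqrt(D)    # learned pseudo-queries (stand-in)


def light_vectors(passage_tokens: np.ndarray) -> np.ndarray:
    """Compress per-token passage embeddings (T x D) into M pooled vectors:
    each row of W acts as a pseudo-query that pre-simulates how a future
    real query would attend to the passage."""
    att = passage_tokens @ W.T                                  # (T, M)
    att = np.exp(att - att.max(0)) / np.exp(att - att.max(0)).sum(0)
    return att.T @ passage_tokens                               # (M, D)


def score(query_vec: np.ndarray, light: np.ndarray) -> float:
    """Late-interaction score: best match between query and pooled vectors."""
    return float((light @ query_vec).max())
```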

Research on prompt engineering for large model art image generation
WANG Changsheng
2024, 45(6): 1243-1255.  DOI: 10.11996/JG.j.2095-302X.2024061243

With the rapid advancement of artificial intelligence technology in the field of art, prompt-driven art image generation has become highly popular. However, the rules and methods for generating artistic images using prompts remain underexplored. This study quantitatively evaluated images generated by the Midjourney model through CLIP model calculations and expert assessments, combined with participatory observation through netnography, to comprehensively reveal the rules and methods of prompt-generated art images. The results showed that with the advancement of versions (from Midjourney V2 to V5), the aesthetic quality of images generated by the Midjourney model has significantly improved, highlighting the necessity for artists and creators to continuously learn to adapt to the evolving AI models. Therefore, an optimized prompt formula was proposed, which can swiftly and efficiently generate various high-aesthetic quality images. The AI model demonstrated different capabilities across various themes, excelling in generating oil paintings, watercolor ink paintings, and anime characters, and performing well in both figurative and abstract themes, though relatively weaker in sketch and colored pencil styles. Creators should leverage its strengths in these styles for image creation. Additionally, it was found that using the best prompt combinations tailored to specific versions can greatly enhance the quality of generated images. Carefully designing prompts is crucial, and newer versions are not necessarily superior to older ones. Creators need to explore and accumulate the best prompts that match the versions. This study not only revealed the rules and methods of prompt-generated art images but also provided theoretical and practical guidance for art creators in the field of AI art creation.
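
The CLIP-based scoring used in the quantitative evaluation can be reproduced in outline with OpenAI's CLIP package; the study's exact scoring protocol and prompts are not reproduced here, and the file names below are illustrative:

```python
import clip  # https://github.com/openai/CLIP
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)


def prompt_image_score(image_path: str, prompt: str) -> float:
    """Cosine similarity between a generated image and its prompt."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([prompt]).to(device)
    with torch.no_grad():
        img_f = model.encode_image(image)
        txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    return float((img_f @ txt_f.T).item())


# Compare model versions on the same prompt, e.g.:
# for v in ("v2.png", "v5.png"):
#     print(v, prompt_image_score(v, "oil painting of a harbor at dusk"))
```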

Molecular amplification time series prediction research combining Transformer with Kolmogorov-Arnold network
LIU Canfeng, SUN Hao, DONG Hui
2024, 45(6): 1256-1265.  DOI: 10.11996/JG.j.2095-302X.2024061256

With the development of medical diagnosis and treatment intervention techniques, time-series medical data has grown exponentially. Artificial intelligence (AI), particularly deep learning (DL), has demonstrated significant potential in mining such time-series medical data. This study proposed, for the first time, a method that integrates the Transformer architecture with the Kolmogorov-Arnold network (KAN) to enable predictive analysis of nucleic acid amplification experimental data. Through experimental data analysis, the effectiveness of the model in accurately predicting amplification trends and endpoint values was validated, achieving an endpoint value error of merely 1.87 and an R-squared coefficient as high as 0.98. Moreover, the model was capable of effectively identifying experimental data from different sample types. Furthermore, this research delved into the impact of the model’s components and parameters on predictive performance through ablation experiments and hyperparameter tuning. Finally, a generalization capability test was conducted on 911 clinical data records provided by the Fujian Provincial Hospital across 10 deep learning models. The results demonstrated that the proposed Transformer-KAN network outperformed other models in terms of predictive accuracy and generalization capability. This study not only provided a new perspective for improving routine diagnostic techniques during pandemics but also offered empirical evidence for further research on the KAN model and its corresponding foundational theories.
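
The paper's network is not reproduced here; the sketch below shows one simplified way to pair a Transformer encoder with a KAN-style head, using Gaussian RBF bases in place of splines for the learnable univariate functions. All dimensions are illustrative assumptions:

```python
import torch
from torch import nn


class RBFKANLayer(nn.Module):
    """Simplified Kolmogorov-Arnold layer: each output is a sum of learnable
    univariate functions of the inputs, parameterized here by Gaussian RBF
    bases (a common simplification of KAN's spline bases)."""

    def __init__(self, d_in: int, d_out: int, n_basis: int = 8):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(-2, 2, n_basis))
        self.coef = nn.Parameter(torch.randn(d_in, n_basis, d_out) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:          # x: (B, d_in)
        phi = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)  # (B, d_in, K)
        return torch.einsum("bik,iko->bo", phi, self.coef)


class TransformerKAN(nn.Module):
    """Transformer encoder over an amplification curve, KAN-style head
    regressing the endpoint value."""

    def __init__(self, d_model: int = 32, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = RBFKANLayer(d_model, 1)

    def forward(self, curve: torch.Tensor) -> torch.Tensor:      # curve: (B, T)
        h = self.encoder(self.embed(curve.unsqueeze(-1)))        # (B, T, d)
        return self.head(h.mean(dim=1))                          # pooled -> (B, 1)


model = TransformerKAN()
pred = model(torch.randn(4, 45))  # 4 curves, 45 cycles -> 4 endpoint predictions
```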

Traffic anomaly event analysis method for highway scenes based on multimodal large language models
WU Jingyi, JING Jun, HE Yifan, ZHANG Shiyu, KANG Yunfeng, TANG Wei, KONG Delan, LIU Xiangdong
2024, 45(6): 1266-1276.  DOI: 10.11996/JG.j.2095-302X.2024061266

To address the limitations of current traffic anomaly detection systems, which lack deep incident perception capabilities, and to reduce the high cost of manually reviewing alarmed incidents, a highway traffic anomaly analysis method based on multimodal large language models (MLLM) was researched. Three MLLM-based tasks were designed and validated: first, automatically generating detailed work order descriptions for anomalous events, enhancing the depth of event perception; second, reviewing alarm events using MLLM, reducing false alarms and improving detection accuracy; and third, generating descriptive narratives for anomaly event videos based on MLLM, enhancing the interpretability of events. Experimental results demonstrated that the MLLM-based work order description method improved work order information completeness and accuracy through the construction of visual instruction-tuned datasets and model fine-tuning. In the review of alarm events, MLLM effectively filtered out false alarms caused by poor image quality, false positives, and misclassifications, thus reducing manual review costs. Furthermore, the MLLM-based video description method enabled efficient anomaly analysis by sampling and describing event video frames, thus improving event explainability. Although open-source models were slightly inferior to closed-source models in specific scenarios, both types demonstrated the ability to review various false alarm issues, confirming the potential application of MLLM in anomaly event reviews. This study provides a novel solution for intelligent traffic monitoring systems, enhancing the automation and practicality of handling anomaly events.
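
The frame-sampling-plus-review pipeline can be outlined as below. The prompt text and the `alarm_0001.mp4` path are illustrative, and the call that ships the encoded frames to a multimodal model is deliberately omitted:

```python
import base64

import cv2  # OpenCV for video decoding


def sample_frames(video_path: str, n: int = 8) -> list[bytes]:
    """Uniformly sample n frames from an alarm-event clip as JPEG bytes."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(n):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * max(total - 1, 1) // max(n - 1, 1))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.imencode(".jpg", frame)[1].tobytes())
    cap.release()
    return frames


REVIEW_PROMPT = (
    "These frames come from a highway camera alarm. Describe the event, "
    "state whether it is a real anomaly (stopped vehicle, pedestrian, "
    "debris...) or a false alarm (glare, rain, camera shake), and justify."
)

frames = sample_frames("alarm_0001.mp4")
payload = [base64.b64encode(f).decode() for f in frames]  # sent to the MLLM
```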

Product design and evaluation methods based on AI-generated content
LU Peng, WU Fan, TANG Jian
2024, 45(6): 1277-1288.  DOI: 10.11996/JG.j.2095-302X.2024061277

Generative artificial intelligence (GAI) has become a transformative force in product design, significantly enhancing design efficiency. However, systematic application methods and examples of collaborative, multi-type GAI use remain scarce. To highlight the innovative role of GAI in product design, a method based on AI-generated content (AIGC) for product form design and evaluation was proposed. First, ChatGPT was utilized to capture the emotional needs of target product users and summarize them into design target imageries. Additionally, ChatGPT served as a prompt generator for Midjourney, generating the prompt phrases needed for the target product. Midjourney constructed a reference library for product forms using these target imageries and prompt phrases. Perceptual questionnaires were then utilized to select distinctive designs as alternatives. Next, the grey relational analysis (GRA) and analytic hierarchy process (AHP) methods were employed to evaluate these alternatives and select the optimal design, with Rhino used to optimize human-machine interaction. Finally, Stable Diffusion was utilized to quickly generate rendering effects for the optimal design. A case study on electric motorcycles and household vacuum cleaners validated the proposed method. It was found that the collaborative model of multi-type generative AI excelled in analyzing user needs, transforming design concepts, and optimizing design details. This approach revolutionized traditional design processes and improved design efficiency. The proposed method provided product designers with an AIGC-based design approach and established a quantitative evaluation method for AIGC.
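
The GRA step of the evaluation reduces to a short computation. The sketch below assumes larger-is-better criteria and uses illustrative AHP-style weights and scores, not the paper's data:

```python
import numpy as np


def grey_relational_grades(scores: np.ndarray, weights: np.ndarray,
                           rho: float = 0.5) -> np.ndarray:
    """Grey relational analysis: rank design alternatives (rows) against an
    ideal reference built from the best score per criterion (columns).
    `weights` would come from an AHP pairwise-comparison step."""
    # Normalize each criterion to [0, 1] (larger-is-better assumed).
    x = (scores - scores.min(0)) / (scores.max(0) - scores.min(0) + 1e-12)
    ref = x.max(0)                       # ideal alternative
    delta = np.abs(ref - x)              # deviation sequences
    xi = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
    return xi @ weights                  # weighted relational grade per row


scores = np.array([[7.2, 6.8, 8.1],      # 3 alternatives x 3 target imageries
                   [8.0, 7.1, 6.9],
                   [6.5, 8.3, 7.4]])
weights = np.array([0.5, 0.3, 0.2])      # AHP-derived (illustrative)
print(grey_relational_grades(scores, weights))  # highest grade = best design
```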

Image Processing and Computer Vision
Spatiotemporal data visualization based on density map multi-target tracking
SONG Sicheng, CHEN Chen, LI Chenhui, WANG Changbo
2024, 45(6): 1289-1300.  DOI: 10.11996/JG.j.2095-302X.2024061289

Spatiotemporal data tracking visualization has received widespread attention. The focus of this research is on depicting the dynamic details of the data and ensuring trajectory consistency with the observation results. In this paper, a model that combined deep learning with traditional tracking techniques was proposed to perform tracking tasks, thereby improving the speed and accuracy of visualization. First, a high-quality Perlin noise dataset was generated, on which a multi-target tracking model was trained. Second, a two-stage, multi-model deep learning framework was proposed to enhance the analysis depth of dynamic scenes. Finally, in order to continuously display detailed tracking information, a visualization solution that combined trajectories and vector fields was introduced to enhance the visual effect of tracking information. Different cases in this study demonstrated the usefulness and robustness of the proposed method, which was quantitatively evaluated and compared from multiple aspects. The results showed that the proposed method can help users understand multi-target tracking information in different scenarios.

Research on gangue target detection algorithm based on MBI-YOLOv8
LI Zhenfeng, FU Shichen, XU Le, MENG Bo, ZHANG Xin, QING Jianjun
2024, 45(6): 1301-1312.  DOI: 10.11996/JG.j.2095-302X.2024061301

To achieve a balance between detection performance and resource consumption in the gangue sorting domain, an efficient, real-time, lightweight object detection algorithm based on an improved YOLOv8 was proposed, suitable for low-performance detection platforms. This algorithm built on the YOLOv8n architecture and incorporated MobileNetv3 to replace the original backbone network, leveraging its lightweight structure to reduce model parameters and computational load, thereby enhancing detection speed. Additionally, the algorithm integrated the BiFPN module for feature enhancement, which employed multi-scale feature fusion to compensate for the loss of detection accuracy associated with the lightweight network, thus achieving model lightweighting while maintaining detection accuracy. Furthermore, the Inner-CIoU bounding box regression loss function was introduced to balance the training results of images with varying qualities, improving the model’s localization capability and further enhancing detection accuracy and speed. To validate the effectiveness of the proposed algorithm, experiments were conducted to compare it with YOLOv3-tiny, YOLOv5n, YOLOv7, and YOLOv8n on a custom dataset. Experimental results demonstrated that the proposed algorithm exhibited the best overall detection performance. While maintaining detection accuracy, the model’s parameter count was reduced to 1,188,725, a 60.46% decrease compared to YOLOv8n; the computational load was reduced from 8.1 GFLOPs to 2.8 GFLOPs; and the frame rate increased from 86.02 FPS to 216.58 FPS. This research indicated that the proposed algorithm is a highly efficient, real-time, lightweight gangue detection method with significant potential in balancing detection performance and computational resource consumption.

Automatic reading of pointer meters based on R-YOLOv7 and MIMO-CTFNet
LI Shengtao, HOU Liqun, DONG Yasong
2024, 45(6): 1313-1327.  DOI: 10.11996/JG.j.2095-302X.2024061313

To solve the problems in current pointer meter reading methods, such as the complicated reading process, significant reading errors, and motion blur caused by camera shake, an automatic reading method based on R-YOLOv7 and MIMO-CTFNet (multi-input multi-output CNN-transformer fusion network) was proposed. First, the R-YOLOv7 algorithm was constructed to balance accuracy and lightweight design when detecting the dial and its key information. Then, the MIMO-CTFNet algorithm was designed to restore motion-blurred meter images. Finally, the angle method based on the extracted small scale marks was utilized to perform meter reading. The experimental results showed that on the dial key-information detection dataset, the parameters, FLOPs, ADT, and mAP50:95 were 12 M, 60.30 G, 17.04 ms, and 86.5%, respectively. The PSNR and SSIM of the improved MIMO-CTFNet algorithm reached 33.05 dB and 0.935 3, respectively. The maximum fiducial error of the proposed reading method was 0.35%, and the reading times for images requiring and not requiring motion-blur restoration were 0.561 s and 0.128 s, respectively, validating the effectiveness of the proposed method.
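
The angle method in the final step is linear interpolation between the detected zero-scale and full-scale marks. The gauge range and angles below are illustrative, not from the paper:

```python
def meter_reading(zero_deg: float, full_deg: float, pointer_deg: float,
                  range_max: float, range_min: float = 0.0) -> float:
    """Angle method: linearly interpolate the pointer angle between the
    zero-scale and full-scale marks detected on the dial."""
    frac = (pointer_deg - zero_deg) / (full_deg - zero_deg)
    return range_min + frac * (range_max - range_min)


# A 0-1.6 MPa gauge whose scale spans 225 degrees, pointer at 157.5 degrees:
print(meter_reading(0.0, 225.0, 157.5, 1.6))  # -> 1.12 MPa
```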

Lightweight UAV image target detection algorithm based on YOLOv8
YAN Jianhong, RAN Tongxiao
2024, 45(6): 1328-1337.  DOI: 10.11996/JG.j.2095-302X.2024061328

To address the problems of low target pixel counts, complex backgrounds, and difficult model deployment in unmanned aerial vehicle (UAV) images, a lightweight multi-scale feature fusion small-target detection algorithm based on YOLOv8 was proposed. To reduce the number of network parameters and improve detection speed, the FasterNet block was used to replace the bottleneck of C2f, yielding a lightweight feature extraction module, FasterC2f. To enhance the multi-scale feature fusion ability of the model, a new focus-diffusion feature fusion structure was designed that enables each feature map layer of the neck network to focus on three layers of feature information. A shared-convolution detection head was designed, allowing each detection head to incorporate feature information from different scales while reducing model parameters. The small-target detection network was reconstructed to utilize a larger-scale three-layer detection head, improving the model’s feature learning capability for small targets. Experimental results on the VisDrone dataset indicated that compared with YOLOv8s, the precision rate, recall rate, and mAP of this model increased by 5.1%, 5.4%, and 6.6%, respectively. The number of parameters was reduced by 68%, and the model file size decreased by 15.3 MB, while FPS increased by 16%. These results demonstrated that the model offers high detection accuracy, fast detection speed, and ease of deployment.
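
The FasterNet block that replaces the C2f bottleneck is built around partial convolution (PConv), which convolves only a fraction of the channels. A minimal PyTorch rendering, with channel counts chosen for illustration rather than taken from the paper:

```python
import torch
from torch import nn


class PConv(nn.Module):
    """FasterNet-style partial convolution: a 3x3 conv over only the first
    1/n_div of the channels; the rest pass through unchanged, cutting FLOPs
    and memory access versus a full convolution."""

    def __init__(self, channels: int, n_div: int = 4):
        super().__init__()
        self.dim = channels // n_div
        self.conv = nn.Conv2d(self.dim, self.dim, 3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = x[:, :self.dim], x[:, self.dim:]
        return torch.cat([self.conv(a), b], dim=1)


class FasterBlock(nn.Module):
    """PConv followed by two pointwise convs with a residual connection,
    the kind of unit used to replace the C2f bottleneck."""

    def __init__(self, channels: int, expand: int = 2):
        super().__init__()
        hidden = channels * expand
        self.body = nn.Sequential(
            PConv(channels),
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)


y = FasterBlock(64)(torch.randn(1, 64, 80, 80))  # shape preserved: (1, 64, 80, 80)
```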

Computer Graphics and Virtual Reality
Observation quality field based collaborative object manipulation in VR
LUAN Shuai, WU Jian, FAN Runze, WANG Lili
2024, 45(6): 1338-1348.  DOI: 10.11996/JG.j.2095-302X.2024061338

In virtual reality (VR), interacting with objects serves as a crucial form of interaction, especially in collaborative VR applications where efficient and accurate operations are extremely important. However, traditional collaborative operation techniques have failed to adequately consider the interactions among objects, targets, and environmental dynamics, nor have they provided effective guidance to assist users in selecting the best viewpoint during operations. To address this issue, a new collaborative operation technique based on the Observation Quality Field (OQF) was introduced, aimed at enhancing operational accuracy and efficiency. This technique guided users to choose the most appropriate viewpoint based on their observation quality score, facilitating more efficient and coordinated object operation. Initially, the concept and construction method of the Observation Quality Field were introduced, followed by two strategies to accelerate the OQF update process. Subsequently, a collaborative operation method using OQF guidance for object operation was presented. Through a user study involving 36 participants conducted in three different virtual environments—living room, warehouse, and pipeline—the efficiency and accuracy of this technique were evaluated. The results showed that compared to traditional methods, the OQF technique significantly reduced task completion time, positional errors, rotational errors, and overall task load.

The impact of scenery and time on spatial orientation cognition in virtual reality
REN Yangfu, YU Ge, FU Yueyao, XU Senzhe, HE Yu, WANG Juhong, ZHANG Songhai
2024, 45(6): 1349-1363.  DOI: 10.11996/JG.j.2095-302X.2024061349

Sense of direction refers to the ability of users to construct mental maps based on their personal perceptions by observing or navigating scenes, allowing them to understand and interpret map information and make judgments on direction, angle, and distance. In fields such as psychology and medicine, numerous studies have shown that the sense of direction is influenced by multiple factors, including spatial memory, spatial perception, and spatial imagination. Within virtual environments, users also rely on this ability to judge direction, using virtual devices to gather scene information. This study primarily examined how users determine their orientation in virtual scenes through abilities such as spatial memory, perception, and imagination. The metric for users’ sense of direction in this study included two aspects: accuracy and efficiency. Accuracy refers to the angular and distance errors between the user’s and the target’s orientation and position, while efficiency refers to the decision time for the user to judge the direction and the time to move to the target. Six experiments were conducted to explore the impact of visual scene differences on users’ sense of direction. The experimental results showed that: ① visual information is a crucial factor for users’ direction judgments in virtual reality; ② within similarly structured scenes, smaller spaces with more objects enhanced users’ sense of direction; ③ changes in scene style had little impact on users’ sense of direction under constant visual range. Additionally, the accuracy of users’ orientation judgments was influenced by both decision and movement time, with movement time exerting a more significant effect, while decision time had a relatively smaller impact. The findings of this study provided valuable insights for virtual reality scene design, measuring user sense of direction, optimizing scene layouts, and enhancing user navigation capabilities.

Web3D global illumination cloud rendering based on advanced DDGI
LIU Chang, ZHANG Yuming, ZHANG Qian, OU Qiaofeng, ZHAO Tongshuo, CHEN Hao, SHI Lei
2024, 45(6): 1364-1374.  DOI: 10.11996/JG.j.2095-302X.2024061364

A Web-Cloud rendering strategy was proposed to address the insufficient rendering performance and the inability to perform real-time global illumination rendering that arise from keeping Web3D applications compatible across diverse devices. This strategy improved dynamic diffuse global illumination (DDGI) technology through layout optimization algorithms, significantly enhancing the efficiency and quality of global illumination rendering in the Web3D environment. Firstly, the DDGI probe layout was automatically optimized through segmentation detection and layout optimization strategies on the cloud server to meet the requirements of the scenario. Secondly, the cloud rendering strategy allocated global illumination computing tasks based on each device’s computing resources. Finally, the low-volume global lighting information was transmitted to the web client, allowing users to interact through the web interface and make real-time adjustments to scene resources such as viewpoints, models, and lighting. This approach enabled real-time rendering of high-quality dynamic global lighting effects on the web client. The research results demonstrated that the improved DDGI-based Web3D scene cloud rendering significantly enhanced rendering quality, providing an effective rendering optimization solution for the advancement of Web3D technology.

Cloud Sphere: a 3D shape representation method via progressive deformation
WANG Zongji, LIU Yunfei, LU Feng
2024, 45(6): 1375-1388.  DOI: 10.11996/JG.j.2095-302X.2024061375

As 3D data proliferates, 3D models are exhibiting increasingly diverse and complex shapes. Dedicated to discovering distinctive information from the shape formation process, a method has been developed to uniformly represent the shapes of 3D models through progressive deformation. For any input 3D model, a spherical point cloud template was gradually deformed to fit the input shape through a coarse-to-fine progressive deformation-based auto-encoder. The 3D shape deformation process was modeled using deep neural networks, extracting unique shape features from the multi-stage deformation process and avoiding the reliance on manual annotations common in general task-driven learning methods. The deformation residuals during the shape generation process were explicitly encoded. It not only captured the final shape but also recorded the progressive deformation process from the initial state to the final shape. In terms of deep neural network training, a multi-stage information supervision approach was developed for feature learning, improving the accuracy of deformation reconstruction. Experimental results showed that the proposed method has the ability to reconstruct 3D shapes with high fidelity, and consistent topology was preserved in the multi-stage deformation process. This deformation representation is applicable to various computer graphics applications such as model classification, shape transfer, and co-editing, demonstrating versatility and providing underlying data representation method support for automatic parsing and efficient editing of 3D model geometric properties.
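
The starting template and the per-stage fitting loss can be sketched directly. The Fibonacci-sphere construction and Chamfer distance below are standard components chosen for illustration, not the paper's exact code:

```python
import torch


def sphere_template(n: int = 1024) -> torch.Tensor:
    """Near-uniform spherical point-cloud template (Fibonacci sphere)."""
    i = torch.arange(n, dtype=torch.float32)
    phi = torch.acos(1 - 2 * (i + 0.5) / n)        # polar angle
    theta = torch.pi * (1 + 5 ** 0.5) * i          # golden-angle azimuth
    return torch.stack([phi.sin() * theta.cos(),
                        phi.sin() * theta.sin(),
                        phi.cos()], dim=1)         # (n, 3)


def chamfer(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance, the usual fitting loss for supervising
    each deformation stage against the target shape."""
    d = torch.cdist(a, b)                          # (n, m) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()


template = sphere_template()
# Each decoder stage k predicts an offset field: points_k = points_{k-1} + offsets_k,
# and chamfer(points_k, target) supervises that stage (multi-stage supervision).
```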

Total Contents
Total Contents of 2024
2024, 45(6): 1389. 
Full Issue
Complete Issue 6, 2024
2024, 45(6): 1390. 