Journal of Graphics ›› 2024, Vol. 45 ›› Issue (6): 1165-1177.DOI: 10.11996/JG.j.2095-302X.2024061165

• Special Topic on “Large Models and Graphics Technology and Applications” •

An efficient reinforcement learning method based on large language model

XU Pei1, HUANG Kaiqi1,2,3

  1. Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
    2. Chinese Academy of Sciences Center for Excellence in Brain Science and Intelligence Technology, Shanghai 200031, China
    3. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2024-08-09 Accepted:2024-10-29 Online:2024-12-31 Published:2024-12-24
  • Contact: HUANG Kaiqi
  • About author:

    XU Pei (1993-), assistant researcher, Ph.D. His main research interests cover reinforcement learning and multi-agent learning. E-mail: pei.xu@ia.ac.cn

  • Supported by:
    National Science and Technology Major Project (2022ZD0116403); Postdoctoral Fellowship Program of CPSF (GZC20232995); Strategic Priority Research Program of Chinese Academy of Sciences (XDA27010201)

Abstract:

Deep reinforcement learning, the key technology behind breakthroughs such as AlphaGo and ChatGPT, has become a research hotspot in frontier science. As an important intelligent decision-making technology, it is widely applied to planning and decision-making tasks such as obstacle avoidance in visual scenes, optimal generation of virtual scenes, robotic arm control, digital design and manufacturing, and industrial design decision-making. However, deep reinforcement learning suffers from low sample efficiency in practice, which greatly limits its effectiveness. To improve sample efficiency, this paper proposes an efficient exploration method based on large-model guidance, combining a large language model with mainstream exploration techniques. Specifically, the semantic extraction capability of a large language model is used to obtain semantic information about states, which then guides the exploration behavior of agents. This semantic information is introduced into classical methods for both single-policy exploration and population-based exploration. By using the large model to guide the exploration of deep reinforcement learning agents, the method achieves significant performance improvements in popular benchmark environments. This research not only demonstrates the potential of large-model techniques for the exploration problem in deep reinforcement learning, but also offers a new approach to alleviating low sample efficiency in practical applications.
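The general idea of semantics-guided exploration can be sketched as an intrinsic novelty bonus computed in an embedding space. The code below is a minimal illustration only, not the paper's method: `semantic_embedding` is a deterministic bag-of-words stand-in for a real language-model encoder, and `intrinsic_reward` is a generic novelty bonus that rewards states whose semantic embedding is far from everything visited so far; all names and the bonus formula are this sketch's assumptions.

```python
import math
from collections import Counter

def semantic_embedding(state_text):
    """Stand-in for an LLM text encoder: a 16-dim hashed bag-of-words
    vector, L2-normalized. A real implementation would query a language
    model for the semantic description of the state."""
    vec = [0.0] * 16
    for tok, cnt in Counter(state_text.lower().split()).items():
        # Deterministic bucket so runs are reproducible.
        vec[sum(ord(c) for c in tok) % 16] += cnt
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(u, v):
    """Cosine similarity of two unit vectors (plain dot product)."""
    return sum(a * b for a, b in zip(u, v))

def intrinsic_reward(state_text, memory, beta=1.0):
    """Novelty bonus added to the task reward: large when the state's
    semantic embedding is dissimilar to all previously visited states,
    zero when the state has been seen before."""
    emb = semantic_embedding(state_text)
    if not memory:
        memory.append(emb)
        return beta
    nearest = max(cosine(emb, m) for m in memory)
    memory.append(emb)
    return beta * (1.0 - nearest)

memory = []
r1 = intrinsic_reward("agent near locked door", memory)  # first state: full bonus
r2 = intrinsic_reward("agent near locked door", memory)  # revisit: no bonus
r3 = intrinsic_reward("agent picks up key", memory)      # semantically new: large bonus
```

In a single-policy setting this bonus is simply added to the environment reward; in a population setting, each member could keep its own `memory` so that different policies are pushed toward different semantic regions.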

Key words: deep reinforcement learning, large language model, efficient exploration
