
Journal of Graphics ›› 2024, Vol. 45 ›› Issue (6): 1165-1177. DOI: 10.11996/JG.j.2095-302X.2024061165

• Special Issue: Large Models and Graphics Technology and Applications •


An efficient reinforcement learning method based on large language model

XU Pei1, HUANG Kaiqi1,2,3

  1. Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
    2. Chinese Academy of Sciences Center for Excellence in Brain Science and Intelligence Technology, Shanghai 200031, China
    3. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2024-08-09 Accepted: 2024-10-29 Published: 2024-12-31 Online: 2024-12-24
  • Corresponding author: HUANG Kaiqi (1977-), researcher, Ph.D. His main research interests include computer vision, pattern recognition, and game-theoretic decision-making. E-mail: kqhuang@nlpr.ia.ac.cn
  • First author: XU Pei (1993-), assistant researcher, Ph.D. His main research interests include reinforcement learning and multi-agent learning. E-mail: pei.xu@ia.ac.cn
  • Supported by:
    National Science and Technology Major Project (2022ZD0116403); Postdoctoral Fellowship Program of CPSF (GZC20232995); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA27010201)


Abstract:

Deep reinforcement learning, the key technology behind breakthroughs such as AlphaGo and ChatGPT, has become a research hotspot in frontier science. As an important intelligent decision-making technology, it is widely applied in practice to planning and decision-making tasks such as obstacle avoidance in visual scenes, optimized generation of virtual scenes, robotic arm control, digital design and manufacturing, and industrial design decision-making. However, deep reinforcement learning suffers from low sample efficiency in practical applications, which severely limits its effectiveness. To improve sample efficiency, this paper proposes an efficient exploration method guided by large models, which combines large model techniques with mainstream exploration techniques. Specifically, the semantic extraction capability of a large language model is used to obtain semantic information about states, which then guides the exploration behavior of agents; this semantic information is incorporated into classical methods for both single-policy exploration and population-based exploration. By using the large model to guide the exploration behavior of deep reinforcement learning agents, the method achieves significant performance improvements in widely used benchmark environments. This work not only demonstrates the potential of large model techniques for the exploration problem in deep reinforcement learning, but also offers a new way to alleviate the low-sample-efficiency problem in practical applications.
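The abstract does not specify how the semantic information enters the exploration machinery, so the sketch below is only one plausible instantiation, not the paper's method: a count-based exploration bonus computed over LLM-derived state descriptions instead of raw states, so that states the language model judges semantically similar share one visitation count. The `describe_state` stub and its grid-to-room mapping are invented for illustration; a real system would query a language model here.

```python
from collections import defaultdict


def describe_state(state):
    """Hypothetical stand-in for an LLM call that maps a raw state to a
    short semantic description; the coarse room bucketing is invented."""
    x, y = state
    return f"room_{x // 4}_{y // 4}"


class SemanticCountBonus:
    """Count-based intrinsic reward over semantic descriptions: the bonus
    for a state decays as 1/sqrt(n), where n counts visits to the state's
    semantic bucket rather than to the exact raw state."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.counts = defaultdict(int)

    def bonus(self, state):
        key = describe_state(state)          # semantic key from the "LLM"
        self.counts[key] += 1
        return self.scale / self.counts[key] ** 0.5


explorer = SemanticCountBonus()
r1 = explorer.bonus((0, 0))   # first visit to "room_0_0": full bonus 1.0
r2 = explorer.bonus((1, 2))   # same semantic room, so the bonus shrinks
r3 = explorer.bonus((8, 9))   # unseen room "room_2_2": full bonus again
```

In a training loop, such a bonus would typically be added to the environment reward before the policy update, steering the agent toward regions the language model describes as novel rather than toward merely pixel-level novel states.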

Key words: deep reinforcement learning, large language model, efficient exploration

CLC number: