Training Framework of Distributed Robot Reinforcement Learning Based on Spark

doi:10.11996/JG.j.2095-302X.2019050852

Journal of Graphics

Special topic: 16th Conference on Media Intelligence and Big Data Computing (CIDE & DEA 2019, Dalian)

(1. Institute of Cyber Systems and Control, Zhejiang University, Hangzhou, Zhejiang 310027, China; 2. Department of Computer Science and Technology, Huaibei Vocational and Technical College, Huaibei, Anhui 235000, China; 3. School of Computer Science, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China; 4. Materials Branch, State Grid Zhejiang Electric Power Co., Ltd., Hangzhou, Zhejiang 310000, China; 5. Institute of Intelligent Computing and Visualization Based on Big Data, Chongqing University of Arts and Sciences, Chongqing 402160, China)

Online: 2019-10-31 Published: 2019-11-06

Funding: Open Research Project of the State Key Laboratory of Industrial Control Technology, Zhejiang University (ICT1800413); Major Industrial Technology R&D Project of the Chongqing Development and Reform Commission (2018148208); Science and Technology Project of the Chongqing Municipal Education Commission (KJ1601129); Key Project of Natural Science Research of Universities in Anhui Province (KJ2018A0713); Domestic Visiting and Training Program for Outstanding Young Backbone Talents of Anhui Universities (gxgnfx2018108); Key-Area Research and Development Program of Guangdong Province (2019B010120001)

Abstract

Abstract: Through autonomous learning, reinforcement learning can train robots to complete tasks that are difficult to implement with conventional control methods, which spares system designers from modeling the system or hand-writing rules. However, training is expensive in robot development and application: it consumes large amounts of time and hardware. Simulation can reduce hardware cost to some extent, but on complex robot training platforms such as Gazebo the simulation runs inefficiently and data sampling takes a long time. To address these problems, this paper proposes a distributed reinforcement learning framework based on Spark, optimized for the usability and compatibility of the robot simulation platform. The framework provides distributed support for both reinforcement learning training and robot simulation sampling, and offers high compatibility and robustness. Comparative analysis of experimental data indicates that the framework not only speeds up the training of robot reinforcement learning models and shortens training time, but also helps save hardware cost.

Key words: robot, reinforcement learning, Spark, distributed, data pipeline
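
The abstract does not describe the framework's internals, so the following is only a minimal sketch of the general pattern it names: simulation sampling run in parallel, with the results gathered into a single training batch. It assumes a toy one-dimensional environment in place of Gazebo and a local thread pool in place of a Spark cluster; `collect_rollout` and `sample_in_parallel` are hypothetical names, not the paper's API.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def collect_rollout(seed, steps=5):
    """Hypothetical sampler: run one toy episode and return its transitions."""
    rng = random.Random(seed)  # per-rollout RNG keeps every run reproducible
    state, transitions = 0, []
    for _ in range(steps):
        action = rng.choice([-1, 1])  # random policy, stand-in for a learned one
        next_state = state + action
        reward = -abs(next_state)  # toy reward: stay close to the origin
        transitions.append((state, action, reward, next_state))
        state = next_state
    return transitions


def sample_in_parallel(seeds, workers=4):
    """Fan rollouts out to workers, then gather them into one training batch."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rollouts = list(pool.map(collect_rollout, seeds))  # order follows seeds
    # Flatten the per-rollout lists into a single list of transitions.
    return [t for rollout in rollouts for t in rollout]


batch = sample_in_parallel(seeds=range(8))
print(len(batch))  # 8 rollouts x 5 steps = 40 transitions
```

On a Spark cluster the same fan-out and gather shape would typically be written as a `map` over a distributed collection of seeds followed by a `collect` on the driver; whether the paper's framework is organized this way is not stated in the abstract.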

FANG Wei (1,2), HUANG Zeng-qiang (3), XU Jian-bin (4), HUANG Yi (1,5), MA Xin-qiang (1,5). Training Framework of Distributed Robot Reinforcement Learning Based on Spark[J]. Journal of Graphics, DOI: 10.11996/JG.j.2095-302X.2019050852.

