Improvement of task scheduling based on Q-learning

Journal of Graphics

Online: 2012-06-29 Published: 2015-07-28

Abstract

Abstract: This paper builds a Markov decision process (MDP) model of the task scheduling problem in cooperative work and, on that basis, proposes an improved Q-learning algorithm based on simulated annealing. The algorithm introduces the Metropolis rule of simulated annealing, combines it with a greedy strategy, and screens candidates in the state space, which significantly accelerates convergence and shortens the running time. Finally, the algorithm is compared with related algorithms from other papers and its performance is analyzed, which indicates the efficiency of the improved Q-learning algorithm.

Key words: task scheduling, Q-learning, reinforcement learning, simulated annealing
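
The abstract does not give the update rules, so the following is only a minimal Python sketch of one common way to combine tabular Q-learning with the Metropolis rule and a greedy choice. It is not the authors' implementation. The toy problem (assigning a fixed list of task durations to workers, with the negative makespan as the final reward), the function names, and all parameter values are assumptions made for illustration. The state-space screening step mentioned in the abstract is omitted because the abstract does not describe it.

```python
import math
import random
from collections import defaultdict


def metropolis_action(q_row, temperature, rng):
    """Choose an action: greedy by default, random if the Metropolis rule accepts it."""
    greedy = max(range(len(q_row)), key=q_row.__getitem__)
    candidate = rng.randrange(len(q_row))
    # Accept the random candidate with probability exp((Q_c - Q_g) / T).
    # Q_c <= Q_g, so the probability lies in [0, 1] and shrinks as T cools.
    accept_prob = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy


def assign(loads, worker, duration):
    """Return the new tuple of worker loads after giving one task to `worker`."""
    return loads[:worker] + (loads[worker] + duration,) + loads[worker + 1:]


def train(durations, n_workers=2, episodes=2000, alpha=0.5, gamma=1.0,
          t0=10.0, cooling=0.995, seed=0):
    """Tabular Q-learning on a hypothetical task-assignment MDP."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * n_workers)  # state -> list of action values
    temperature = t0
    for _ in range(episodes):
        loads = (0,) * n_workers
        for i, duration in enumerate(durations):
            state = (i, loads)  # (index of next task, current worker loads)
            action = metropolis_action(q[state], temperature, rng)
            loads = assign(loads, action, duration)
            done = i == len(durations) - 1
            reward = -max(loads) if done else 0.0  # negative makespan at the end
            next_best = 0.0 if done else max(q[(i + 1, loads)])
            # Standard Q-learning update.
            q[state][action] += alpha * (reward + gamma * next_best - q[state][action])
        # Geometric cooling with a small floor to keep the division well defined.
        temperature = max(temperature * cooling, 1e-3)
    return q


def greedy_schedule(q, durations, n_workers=2):
    """Follow the learned Q-table greedily; return (assignment, makespan)."""
    loads, plan = (0,) * n_workers, []
    for i, duration in enumerate(durations):
        row = q[(i, loads)]
        action = max(range(n_workers), key=row.__getitem__)
        plan.append(action)
        loads = assign(loads, action, duration)
    return plan, max(loads)


if __name__ == "__main__":
    tasks = [3, 2, 2, 1]  # hypothetical task durations
    plan, makespan = greedy_schedule(train(tasks), tasks)
    print("assignment:", plan, "makespan:", makespan)
```

At high temperature the rule accepts almost any random action, so early episodes explore widely; as the temperature cools, the choice approaches the purely greedy one. Whether this matches the acceptance rule and cooling schedule used in the paper cannot be determined from the abstract alone.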

Liu Xiaoping, Du Lin, Shi Hui. Improvement of task scheduling based on Q-learning[J]. Journal of Graphics.
