
Journal of Graphics ›› 2025, Vol. 46 ›› Issue (3): 697-708. DOI: 10.11996/JG.j.2095-302X.2025030697

• Digital Design and Manufacturing •

Transport-and-packing with buffer via deep reinforcement learning

LEI Yulin, LIU Ligang

  1. School of Mathematical Sciences, University of Science and Technology of China, Hefei, Anhui 230026, China
  • Received: 2024-10-19 Accepted: 2025-02-19 Published: 2025-06-30 Online: 2025-06-13
  • Corresponding author: LIU Ligang (1975-), professor, Ph.D. His main research interests include computer graphics and CAD/CAE. E-mail: lgliu@ustc.edu.cn
  • First author: LEI Yulin (2000-), master's student. His main research interests include computer graphics and deep learning. E-mail: yllei@mail.ustc.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62025207)


Abstract:

To address the limited container space utilization imposed by initial object-stacking constraints in physical scenarios, a neural optimization model based on a deep reinforcement learning framework was proposed for transport-and-packing with a buffer, incorporating a buffer transfer mechanism to raise container packing utilization. First, the state encoder dynamically encoded the priority information extracted from a priority graph together with buffer information, effectively handling the stacking relationships among objects and exploiting the transfer capacity of the buffer zone. Next, the sequence decoder perceived the current container state and employed an attention mechanism over the encoded feature vectors to compute selection probabilities for candidate rotation-state sequences, adaptively selecting a sequence for either transfer or packing. The target decoder then took the geometric and buffer information of the selected state as input, fused the information accumulated by the sequence decoder to construct a conditional embedding vector, and performed attention pooling over the encoded feature vectors to decide efficiently whether to buffer or pack each object. Finally, the REINFORCE algorithm with a baseline was used to train the network, yielding an optimized policy for bufferable object packing. Experimental results on 2D and 3D RAND datasets showed an improvement of roughly 4% in container packing utilization over the state-of-the-art TAP-Net model, and the proposed model clearly outperformed heuristic methods designed for this newly defined problem. Furthermore, a model trained on a fixed number of objects generalized effectively to packing instances with larger numbers of objects.
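The training scheme described above, REINFORCE with a baseline over attention-derived selection probabilities, can be sketched in miniature. This is an illustrative simplification, not the paper's implementation: in the actual model the logits come from the attention decoders over encoded object features, whereas here `masked_softmax` and `reinforce_step` are hypothetical names operating on a single raw logit vector for one decision step.

```python
import numpy as np

def masked_softmax(scores, mask):
    """Selection probabilities over candidate rotation states;
    states that violate stacking constraints are masked out."""
    s = np.where(mask, scores, -np.inf)
    e = np.exp(s - s.max())
    return e / e.sum()

def reinforce_step(scores, mask, reward, baseline, lr=0.1):
    """One REINFORCE update with a baseline on the logits of a single
    selection step. For a softmax policy, the gradient of log pi(a)
    with respect to the logits is one_hot(a) - probs."""
    probs = masked_softmax(scores, mask)
    action = np.random.choice(len(probs), p=probs)
    advantage = reward - baseline       # the baseline reduces gradient variance
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    return action, scores + lr * advantage * grad_log_pi
```

With a positive advantage the update raises the probability of the sampled action, which is the core mechanism by which the packing policy is reinforced toward higher container utilization.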

Key words: bin packing problem, deep reinforcement learning, neural optimization, combinatorial optimization, attention mechanism
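The priority information the state encoder consumes reflects which objects rest on which in the initial pile: an object can only be transported once nothing sits on top of it. A minimal sketch of that bookkeeping, assuming for illustration a pair-list representation of the stacking relation and hypothetical function names:

```python
from collections import defaultdict

def movable_objects(remaining, stacking_pairs):
    """Objects in `remaining` with nothing resting on them can be
    picked up next. stacking_pairs holds (above, below) pairs,
    meaning `above` rests directly on `below`."""
    load = defaultdict(int)   # number of objects sitting directly on each object
    for above, below in stacking_pairs:
        load[below] += 1
    return [o for o in sorted(remaining) if load[o] == 0]

def pick_up(obj, remaining, stacking_pairs):
    """Remove an object from the scene, unblocking whatever it rested on."""
    return remaining - {obj}, [(a, b) for a, b in stacking_pairs if a != obj]
```

Picking up the top object of a stack unblocks the one beneath it, which is exactly the precedence structure the priority graph encodes and the buffer mechanism works around.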

CLC number: