Journal of Graphics, 2025, Vol. 46, Issue (3): 697-708. DOI: 10.11996/JG.j.2095-302X.2025030697
• Digital Design and Manufacturing •
Received: 2024-10-19
Accepted: 2025-02-19
Published: 2025-06-30
Online: 2025-06-13
Corresponding author: LIU Ligang (1975-), professor, Ph.D. His main research interests cover computer graphics and CAD/CAE. E-mail: lgliu@ustc.edu.cn
First author: LEI Yulin (2000-), master's student. His main research interests cover computer graphics and deep learning. E-mail: yllei@mail.ustc.edu.cn
Abstract:
To address the problem that the initial stacking constraints of objects in physical scenes limit packing space utilization, a neural optimization model for buffered object transport and packing based on a deep reinforcement learning framework is proposed, which introduces a buffer transit mechanism to improve container packing utilization. First, the model's state encoder dynamically encodes the priority information extracted from the precedence graph together with the buffer information, effectively handling the stacking relationships among objects and exploiting the transit capability of the buffer. Then, the sequence decoder perceives the current container state and applies an attention mechanism to the encoded feature vectors to compute selection probabilities over candidate rotation-state sequences, adaptively choosing the state sequence to be transferred to the buffer or packed. Next, the target decoder takes the geometric and buffer information of the selected state as input, fuses the accumulated information of the sequence decoder to construct a conditional embedding vector, and performs attention pooling over the encoded feature vectors to efficiently decide whether the object is buffered or packed. Finally, the network is trained with the REINFORCE algorithm with a baseline to obtain an optimized policy for buffered object packing. Experimental results on the 2D and 3D RAND datasets show that, compared with the state-of-the-art TAP-Net model, container packing utilization is improved by about 4%, and the proposed method clearly outperforms heuristic methods designed for this newly defined problem. In addition, the model trained on a fixed number of objects generalizes effectively to packing instances with a larger number of objects.
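The abstract specifies training with REINFORCE plus a baseline, with container utilization as the reward. Below is a minimal, hedged sketch of one such policy-gradient update in PyTorch; the rollout interface (a stochastic `policy` returning summed log-probabilities and utilizations, and a greedy `baseline_policy`) is a hypothetical stand-in for illustration, not the paper's actual encoder-decoder implementation.

```python
import torch

def reinforce_with_baseline_step(policy, baseline_policy, optimizer, instances):
    """One illustrative REINFORCE update with a baseline.

    Assumed (hypothetical) interfaces: `policy(instances)` samples one
    packing episode per instance and returns (sum of log-probabilities of
    the chosen actions, achieved container utilization) as 1-D tensors;
    `baseline_policy(instances)` returns the utilization of a greedy
    rollout, used as the variance-reducing baseline.
    """
    log_probs, utilization = policy(instances)        # stochastic rollouts
    with torch.no_grad():
        baseline_util = baseline_policy(instances)    # greedy rollouts, no gradient

    advantage = utilization - baseline_util           # reward minus baseline
    loss = -(advantage.detach() * log_probs).mean()   # policy-gradient objective

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```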
LEI Yulin, LIU Ligang. Transport-and-packing with buffer via deep reinforcement learning[J]. Journal of Graphics, 2025, 46(3): 697-708.
Fig. 1 A bufferable 3D object packing scenario ((a) Initially stacked objects; (b) A buffer zone; (c) A stacking robotic arm; (d) A target container)
Fig. 3 Movement access blockage and side access blockage between stacked objects ((a) Initially stacked objects; (b) Movement access blockage; (c) Negative side access blockage; (d) Positive side access blockage)
Fig. 5 Priority information encoding vector of the object (the original state and rotation state of the rectangle are marked with solid and striped patterns, respectively)
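Figures 3 and 5 together describe the precedence information consumed by the state encoder: an object cannot be picked while another object blocks it from above (movement access blockage) or from a side (negative/positive side access blockage), and these relations are flattened into a per-state priority vector. The sketch below is a minimal 2D illustration of one way such blocking tests and a 0/1 priority vector could be computed for axis-aligned rectangles; the `Rect` type, the interval tests, and the concatenated encoding are assumptions for illustration, not the paper's exact definitions.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: float  # left edge
    y: float  # bottom edge
    w: float  # width
    h: float  # height

def _overlap(a0, a1, b0, b1):
    """Strict overlap of the intervals (a0, a1) and (b0, b1)."""
    return min(a1, b1) - max(a0, b0) > 1e-9

def blocks_move(other: Rect, obj: Rect) -> bool:
    """`other` blocks lifting `obj` if it sits above `obj` and their
    horizontal extents overlap (movement access blockage)."""
    return other.y >= obj.y + obj.h - 1e-9 and \
        _overlap(other.x, other.x + other.w, obj.x, obj.x + obj.w)

def blocks_side(other: Rect, obj: Rect, direction: int) -> bool:
    """`other` blocks access to `obj` from the negative (direction=-1) or
    positive (direction=+1) x side if it overlaps `obj` vertically and
    lies entirely on that side (side access blockage)."""
    if not _overlap(other.y, other.y + other.h, obj.y, obj.y + obj.h):
        return False
    if direction < 0:
        return other.x + other.w <= obj.x + 1e-9
    return other.x >= obj.x + obj.w - 1e-9

def priority_vector(index: int, rects: list) -> list:
    """Concatenate three 0/1 blocks over all objects: which ones block
    rects[index] from above, from the negative side, and from the
    positive side. One such vector would be built per rotation state."""
    obj = rects[index]
    above, neg, pos = [], [], []
    for j, other in enumerate(rects):
        if j == index:
            above.append(0); neg.append(0); pos.append(0)
            continue
        above.append(int(blocks_move(other, obj)))
        neg.append(int(blocks_side(other, obj, -1)))
        pos.append(int(blocks_side(other, obj, +1)))
    return above + neg + pos
```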
Table 1 Performance comparison of various methods on the RAND dataset

| Dataset | Method | Utilization | Avg. time/ms |
| --- | --- | --- | --- |
| RAND-2D | Random | 0.8050 | 5.2743 |
| RAND-2D | Greedy | 0.8875 | 18.5453 |
| RAND-2D | TAP-Net | 0.9268 | 3.0876 |
| RAND-2D | Ours (B=2) | 0.9666 | 3.8170 |
| RAND-2D | Ours (B=4) | 0.9749 | 3.8804 |
| RAND-3D | Random | 0.6315 | 9.0242 |
| RAND-3D | Greedy | 0.7587 | 109.0141 |
| RAND-3D | TAP-Net | 0.8063 | 6.4598 |
| RAND-3D | Ours (B=2) | 0.8420 | 7.8129 |
| RAND-3D | Ours (B=4) | 0.8459 | 7.7499 |
Table 2 Performance comparison of various methods on the PPSG dataset

| Dataset | Method | Utilization | Avg. time/ms |
| --- | --- | --- | --- |
| PPSG-2D | Random | 0.8130 | 5.3769 |
| PPSG-2D | Greedy | 0.9066 | 16.7524 |
| PPSG-2D | TAP-Net | 0.9452 | 3.0666 |
| PPSG-2D | Ours (B=2) | 0.9854 | 3.8672 |
| PPSG-2D | Ours (B=4) | 0.9947 | 3.8659 |
| PPSG-3D | Random | 0.6273 | 10.1540 |
| PPSG-3D | Greedy | 0.7858 | 109.2990 |
| PPSG-3D | TAP-Net | 0.8278 | 6.1994 |
| PPSG-3D | Ours (B=2) | 0.8490 | 7.8066 |
| PPSG-3D | Ours (B=4) | 0.8474 | 7.8711 |
Fig. 9 Comparison of the reward functions during the training processes of the proposed network and TAP-Net ((a) Using the 2D RAND dataset; (b) Using the 3D RAND dataset; (c) Using the 2D PPSG dataset; (d) Using the 3D PPSG dataset)
Fig. 12 Visualized packing results of the compared methods on the 3D datasets ((a) Using the RAND dataset; (b) Using the PPSG dataset)
Table 3 Performance comparison of the trained networks on datasets containing more objects

| Number of objects | Method | Utilization |
| --- | --- | --- |
| 30 | Greedy | 0.8962 |
| 30 | Ours | 0.9810 |
| 40 | Greedy | 0.8999 |
| 40 | Ours | 0.9841 |
| 50 | Greedy | 0.9024 |
| 50 | Ours | 0.9858 |
| 60 | Greedy | 0.9037 |
| 60 | Ours | 0.9868 |
Table 4 Packing results of the network using static and dynamic information as inputs

| Method | Utilization |
| --- | --- |
| Ours (B=2) + static information | 0.9569 |
| Ours (B=2) + dynamic information | 0.9666 |
| Ours (B=4) + static information | 0.9631 |
| Ours (B=4) + dynamic information | 0.9749 |
Table 5 Comparison of network performance before and after using height-map encoded information

| Method | Utilization |
| --- | --- |
| Ours (B=2) + geometry information only | 0.8994 |
| Ours (B=2) + height map added | 0.9666 |
| Ours (B=4) + geometry information only | 0.9125 |
| Ours (B=4) + height map added | 0.9749 |
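Table 5 shows that adding a height map of the container state to the purely geometric input clearly improves utilization. The snippet below is a minimal sketch of how such a height map could be maintained when a box is placed; the grid resolution, coordinate convention, and `place_box` helper are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def place_box(height_map: np.ndarray, x: int, y: int, w: int, d: int, h: int) -> int:
    """Place a w x d x h box with its corner at grid cell (x, y): the box
    settles on the highest point under its footprint, and the footprint
    of the height map is raised to the box's new top."""
    footprint = height_map[x:x + w, y:y + d]
    base = int(footprint.max())               # tallest support under the box
    height_map[x:x + w, y:y + d] = base + h   # update the container state
    return base + h                           # resulting top height

# Example: a 10x10 container grid; a 3x4-footprint box of height 2 at (0, 0).
hm = np.zeros((10, 10), dtype=np.int64)
top = place_box(hm, 0, 0, 3, 4, 2)            # top == 2
```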
[1] FENG B, LI Y Z, SHEN Z J M. Air cargo operations: literature review and comparison with practices[J]. Transportation Research Part C: Emerging Technologies, 2015, 56: 263-280.
[2] YOU S J, JI S H. Design of a multi-robot bin packing system in an automatic warehouse[C]// The 11th International Conference on Informatics in Control, Automation and Robotics. New York: IEEE Press, 2014: 533-538.
[3] SILVA E F, TOFFOLO T A M, WAUTERS T. Exact methods for three-dimensional cutting and packing: a comparative study concerning single container problems[J]. Computers & Operations Research, 2019, 109: 12-27.
[4] LODI A, MARTELLO S, VIGO D. Heuristic algorithms for the three-dimensional bin packing problem[J]. European Journal of Operational Research, 2002, 141(2): 410-420.
[5] FAROE O, PISINGER D, ZACHARIASEN M. Guided local search for the three-dimensional bin-packing problem[J]. INFORMS Journal on Computing, 2003, 15(3): 267-283.
[6] CRAINIC T G, PERBOLI G, TADEI R. Extreme point-based heuristics for three-dimensional bin packing[J]. INFORMS Journal on Computing, 2008, 20(3): 368-384.
[7] FANSLAU T, BORTFELDT A. A tree search algorithm for solving the container loading problem[J]. INFORMS Journal on Computing, 2010, 22(2): 222-235.
[8] YI X Y, PAN W P, ZHANG J H. Algorithm for generating five block mode cutting patterns of single rectangular items[J]. Journal of Graphics, 2015, 36(4): 521-525 (in Chinese).
[9] KANG K, MOON I, WANG H F. A hybrid genetic algorithm with a new packing strategy for the three-dimensional bin packing problem[J]. Applied Mathematics and Computation, 2012, 219(3): 1287-1299.
[10] WANG J M, WANG B C, ZHU Y H. An adaptive algorithm for rectangular packing problems[J]. Journal of Graphics, 2012, 33(3): 29-33 (in Chinese).
[11] VINYALS O, FORTUNATO M, JAITLY N. Pointer networks[C]// The 29th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 2692-2700.
[12] BELLO I, PHAM H, LE Q V, et al. Neural combinatorial optimization with reinforcement learning[EB/OL]. [2024-10-18]. https://arxiv.org/abs/1611.09940.
[13] NAZARI M, OROOJLOOY A, TAKÁC M, et al. Reinforcement learning for solving the vehicle routing problem[C]// The 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018: 9861-9871.
[14] HU H Y, ZHANG X D, YAN X W, et al. Solving a new 3D bin packing problem with deep reinforcement learning method[EB/OL]. [2024-10-18]. https://arxiv.org/abs/1708.05930.
[15] DUAN L, HU H Y, QIAN Y, et al. A multi-task selected learning approach for solving 3D flexible bin packing problem[EB/OL]. [2024-10-18]. https://arxiv.org/abs/1804.06896.
[16] LATERRE A, FU Y G, JABRI M K, et al. Ranked reward: enabling self-play reinforcement learning for combinatorial optimization[EB/OL]. [2024-10-18]. https://arxiv.org/abs/1807.01672.
[17] LI D D, GU Z Q, WANG Y X, et al. One model packs thousands of items with recurrent conditional query learning[J]. Knowledge-Based Systems, 2022, 235: 107683.
[18] ZHAO H, SHE Q J, ZHU C Y, et al. Online 3D bin packing with constrained deep reinforcement learning[C]// The 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 741-749.
[19] ZHU P H, YUAN H T, NIE Y W, et al. AC-HAPE3D: an algorithm for irregular packing based on reinforcement learning[J]. Journal of Graphics, 2022, 43(6): 1096-1103 (in Chinese).
[20] HU R Z, XU J Z, CHEN B, et al. TAP-Net: transport-and-pack using reinforcement learning[J]. ACM Transactions on Graphics, 2020, 39(6): 232.
[21] XU J Z, GONG M L, ZHANG H, et al. Neural packing: from visual sensing to reinforcement learning[J]. ACM Transactions on Graphics, 2023, 42(6): 267.
[22] RAMOS A G, OLIVEIRA J F, GONÇALVES J F, et al. A container loading algorithm with static mechanical equilibrium stability constraints[J]. Transportation Research Part B: Methodological, 2016, 91: 565-581.
[23] BUSONIU L, BABUSKA R, DE SCHUTTER B. A comprehensive survey of multiagent reinforcement learning[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2008, 38(2): 156-172.
[24] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3): 229-256.