Transport-and-packing with buffer via deep reinforcement learning

doi:10.11996/JG.j.2095-302X.2025030697

Abstract

To address the limited container space utilization caused by initial object stacking constraints in physical scenarios, a neural optimization model based on a deep reinforcement learning framework was proposed for transporting and packing objects with a buffer; a buffer transfer mechanism was incorporated to improve container packing efficiency. The state encoder dynamically encoded the priority information extracted from a priority graph together with the buffer information, which allowed the model to manage object stacking relationships and exploit the transfer capacity of the buffer zone. The sequence decoder perceived the current container state and used an attention mechanism to compute selection probabilities over the candidate rotation-state sequences, adaptively selecting a sequence for either transfer or packing. The target decoder then took the geometric and buffer information of the selected states as input, combined it with the information accumulated by the sequence decoder to construct a conditional query vector, and applied attention aggregation over the encoded feature vectors to decide whether to buffer or pack each object. The network was trained with the REINFORCE algorithm with a baseline, yielding optimized strategies for bufferable object packing. Experimental results on the 2D and 3D RAND datasets showed an improvement of approximately 4% in container utilization over the state-of-the-art TAP-Net model, and the model significantly outperformed heuristic methods designed for this newly defined problem. Furthermore, models trained on a fixed number of objects generalized effectively to packing instances with more objects.
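
For readers unfamiliar with the training rule named in the abstract, the standard form of the REINFORCE gradient with a baseline (Williams, 1992) is given below. The notation is an assumption made for illustration and is not taken from the paper: $s$ is a problem instance, $\pi$ a sampled sequence of transfer and packing decisions, $p_\theta$ the policy network, $R$ the reward (for example, container utilization), and $b(s)$ the baseline.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\pi \sim p_\theta(\cdot \mid s)}
    \Bigl[ \bigl( R(\pi \mid s) - b(s) \bigr)\,
           \nabla_\theta \log p_\theta(\pi \mid s) \Bigr]
```

Because $b(s)$ does not depend on the sampled decisions, subtracting it leaves the gradient estimate unbiased while reducing its variance. A learned value network is a common choice of baseline; whether the value network in Fig. 8 plays exactly this role is an assumption here.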

Key words: bin packing problem, deep reinforcement learning, neural optimization, combinatorial optimization, attention mechanism

LEI Yulin, LIU Ligang. Transport-and-packing with buffer via deep reinforcement learning[J]. Journal of Graphics, 2025, 46(3): 697-708.

Figures/Tables 18

Fig. 1 A bufferable 3D object packing scenario ((a) Initially stacked objects; (b) A buffer zone; (c) A stacking robotic arm; (d) A target container)

Fig. 2 Six candidate rotation states of a 3D object
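
The count of six in Fig. 2 can be checked with a minimal sketch, assuming the rotation states are the axis-aligned orientations of a box, that is, the distinct orderings of its three dimensions. The function name `rotation_states` is hypothetical and is not taken from the paper.

```python
from itertools import permutations


def rotation_states(length, width, height):
    """Return the distinct axis-aligned orientations of a box as (x, y, z) extents.

    Assumption: a rotation state is one ordering of the three dimensions.
    """
    return sorted(set(permutations((length, width, height))))


states = rotation_states(1, 2, 3)
for state in states:
    print(state)
print(len(states))  # six orientations when all three dimensions differ
```

When two dimensions coincide, some orientations become identical, so fewer distinct states remain.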

Fig. 3 Movement access blocks and side access blocks between stacked objects ((a) Initially stacked objects; (b) Movement access block; (c) Negative side access block; (d) Positive side access block)

Fig. 4 Object stacking state and its priority relationship graph
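
As a rough illustration of how a priority graph constrains which objects can be moved, the sketch below assumes a single kind of blocking relation: an object becomes accessible once every object that blocks it has been removed. The paper distinguishes movement and side access blocks (Fig. 3); collapsing them into one relation, as well as the names `accessible` and `blocked_by`, are simplifying assumptions made here.

```python
def accessible(objects, blocked_by, removed):
    """List objects that are still stacked and have no remaining blockers.

    blocked_by maps an object to the set of objects that must be moved first
    (hypothetical encoding of the priority graph's edges).
    """
    return [
        obj for obj in objects
        if obj not in removed and blocked_by.get(obj, set()) <= removed
    ]


objects = ["A", "B", "C"]
blocked_by = {"A": {"B"}, "B": {"C"}}  # C lies on B, and B lies on A

print(accessible(objects, blocked_by, removed=set()))       # only C is free
print(accessible(objects, blocked_by, removed={"C"}))       # B becomes free
print(accessible(objects, blocked_by, removed={"B", "C"}))  # A becomes free
```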

Fig. 5 Priority information encoding vector of the object (the original state and rotation state of the rectangle are marked with solid and striped patterns, respectively)

Fig. 6 The target container configuration ((a) Original height map; (b) Gradient height map)
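
The caption of Fig. 6 does not define the gradient height map. One plausible reading, assumed here for a 2D container, is the finite difference between the heights of adjacent columns; the function name `gradient_height_map` and the sample heights are hypothetical.

```python
import numpy as np


def gradient_height_map(height_map):
    """Assumed definition: each column's height minus its left neighbor's height."""
    heights = np.asarray(height_map, dtype=float)
    return np.diff(heights)


heights = [3, 3, 1, 0, 2]  # column heights of a hypothetical 2D container
print(gradient_height_map(heights))
```

Under this reading, flat regions map to zero and steps map to signed jumps, so the representation does not change when the whole packing is shifted up by a constant height.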

Fig. 7 Structure of the policy network

Fig. 8 Structure of the value network

Table 1 Performance comparison of various methods on the RAND dataset

Dataset	Method	Utilization	Average time (ms)
RAND-2D	Random	0.8050	5.2743
	Greedy	0.8875	18.5453
	TAP-Net	0.9268	3.0876
	Ours (B=2)	0.9666	3.8170
	Ours (B=4)	0.9749	3.8804
RAND-3D	Random	0.6315	9.0242
	Greedy	0.7587	109.0141
	TAP-Net	0.8063	6.4598
	Ours (B=2)	0.8420	7.8129
	Ours (B=4)	0.8459	7.7499
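
The improvement of roughly 4% stated in the abstract can be checked against Table 1 with a few lines of arithmetic. The sketch below copies the utilization values from the table and reports the absolute gain over TAP-Net in percentage points; the helper name `gains_over_tap_net` is hypothetical.

```python
# Utilization values copied from Table 1 (RAND dataset).
utilization = {
    "RAND-2D": {"TAP-Net": 0.9268, "Ours (B=2)": 0.9666, "Ours (B=4)": 0.9749},
    "RAND-3D": {"TAP-Net": 0.8063, "Ours (B=2)": 0.8420, "Ours (B=4)": 0.8459},
}


def gains_over_tap_net(rows):
    """Absolute utilization gain over TAP-Net, in percentage points."""
    base = rows["TAP-Net"]
    return {
        name: round((value - base) * 100, 2)
        for name, value in rows.items()
        if name != "TAP-Net"
    }


gains = {dataset: gains_over_tap_net(rows) for dataset, rows in utilization.items()}
for dataset, gain in gains.items():
    print(dataset, gain)
```

The gains fall between about 3.6 and 4.8 percentage points, which is consistent with the approximate figure quoted in the abstract.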

Table 2 Performance comparison of various methods on the PPSG dataset

Dataset	Method	Utilization	Average time (ms)
PPSG-2D	Random	0.8130	5.3769
	Greedy	0.9066	16.7524
	TAP-Net	0.9452	3.0666
	Ours (B=2)	0.9854	3.8672
	Ours (B=4)	0.9947	3.8659
PPSG-3D	Random	0.6273	10.1540
	Greedy	0.7858	109.2990
	TAP-Net	0.8278	6.1994
	Ours (B=2)	0.8490	7.8066
	Ours (B=4)	0.8474	7.8711

Fig. 9 Comparison of the reward functions during the training processes of the proposed network and TAP-Net ((a) Using the 2D RAND dataset; (b) Using the 3D RAND dataset; (c) Using the 2D PPSG dataset; (d) Using the 3D PPSG dataset)

Fig. 10 Visualized packing results on the 2D RAND dataset

Fig. 11 Visualized packing results on the 2D PPSG dataset

Fig. 12 Visualized packing results on the 3D dataset ((a) Using the RAND dataset; (b) Using the PPSG dataset)

Fig. 13 Evaluation results of the network trained with different resolutions

Table 3 Performance comparison of the trained networks on datasets containing more objects

Number of objects	Method	Utilization
30	Greedy	0.8962
30	Ours	0.9810
40	Greedy	0.8999
40	Ours	0.9841
50	Greedy	0.9024
50	Ours	0.9858
60	Greedy	0.9037
60	Ours	0.9868

Table 4 Packing results of the network using static and dynamic information as inputs

Method	Utilization
Ours (B=2) + static information	0.9569
Ours (B=2) + dynamic information	0.9666
Ours (B=4) + static information	0.9631
Ours (B=4) + dynamic information	0.9749

Table 5 Comparison of network performance before and after using height map encoding information

Method	Utilization
Ours (B=2) + geometric information only	0.8994
Ours (B=2) + height map	0.9666
Ours (B=4) + geometric information only	0.9125
Ours (B=4) + height map	0.9749
