Research on UAV three-dimensional scene navigation based on deep reinforcement learning

doi:10.11996/JG.j.2095-302X.2025051010

Abstract

In recent years, as the UAV industry and its application demands have expanded, achieving UAV autonomy and intelligence has become a critical challenge. As a foundational technology for autonomous UAV control, navigation and exploration have become a top priority in UAV application research. Currently, most UAV navigation and exploration methods rely on reconstructing environmental information, which consumes excessive computation and memory and therefore cannot meet the demands of increasingly complex scenarios and real-time operation. To address this, an autonomous UAV navigation method was proposed that combines the representation learning ability of deep learning with the self-learning decision-making ability of reinforcement learning. By continuously optimizing its decision-making policy through self-learning, the UAV could better complete the navigation task. The method first constructed a continuous action space and a non-sparse reward function to guide the UAV's learning process, and then designed feature-extraction and decision-making modules to enhance the UAV's perception and decision-making capabilities. Experimental results demonstrated that the algorithm achieved the best navigation and obstacle avoidance performance among the compared methods in the simulated 3D scene. The navigation success rate in the designed 3D scene reached 87%, and the converged average cumulative reward was 33% higher than that of a contemporaneous method; training time was also reduced and training stability improved.
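
As an illustration of what a continuous action space and a non-sparse reward can look like in practice, the following Python sketch may help. It is not the action or reward definition used in the paper, which the abstract does not specify: the function names `scale_action` and `dense_reward`, the progress-based shaping term, and all constants are assumptions chosen only for illustration.

```python
import math


def scale_action(raw, max_speed=2.0):
    """Map unbounded policy outputs to a bounded continuous velocity command.

    Hypothetical helper: tanh squashing is common in SAC-style policies,
    but max_speed and the per-axis velocity layout are assumptions.
    """
    return tuple(max_speed * math.tanh(x) for x in raw)


def dense_reward(prev_pos, pos, goal, collided,
                 goal_radius=1.0, progress_scale=1.0, step_penalty=0.01,
                 collision_penalty=10.0, success_bonus=10.0):
    """Hypothetical non-sparse reward: the agent gets a signal at every step.

    A sparse reward would only pay out on success or failure; here the
    change in distance to the goal is also rewarded on each step.
    """
    if collided:
        return -collision_penalty
    dist = math.dist(pos, goal)
    if dist <= goal_radius:
        return success_bonus
    prev_dist = math.dist(prev_pos, goal)
    # Positive when the UAV moved closer to the goal, negative otherwise.
    return progress_scale * (prev_dist - dist) - step_penalty


if __name__ == "__main__":
    action = scale_action((0.3, -1.2, 5.0))
    reward = dense_reward((0, 0, 0), (1, 0, 0), (10, 0, 0), collided=False)
    print(action, reward)
```

Under these assumptions, a step that brings the UAV closer to the goal earns a small positive reward even when the episode has not ended, which is the property that distinguishes a non-sparse reward from a sparse one.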

Key words: deep reinforcement learning, attention mechanism, unmanned aerial vehicle, navigation and obstacle avoidance, 3D scene

LIU Bokai, YIN Xuefeng, SUN Chuanyu, GE Huilin, WEI Ziqi, JIANG Yutong, PIAO Haiyin, ZHOU Dongsheng, YANG Xin. Research on UAV three-dimensional scene navigation based on deep reinforcement learning[J]. Journal of Graphics, 2025, 46(5): 1010-1017.

Figures/Tables 10

Fig. 1 Pipeline of the UAV navigation method

Fig. 2 Overall network of feature extraction module

Fig. 3 Self-supervised image feature extraction network based on attention mechanism

Fig. 4 LSTM-SAC network architecture

Fig. 5 3D simulation maps ((a) Simple 3D map; (b) Primary 3D map; (c) Middle 3D map; (d) Advanced 3D map; (e) Verification map 1; (f) Verification map 2)

Fig. 6 Comparison of different methods ((a) Average cumulative rewards; (b) Success rates)

Table 1 Comparison of verification experiments in scene_1

Method	Success rate	Reward	Average steps
Ours	0.90	326.72	523.15
SDDPG	0.65	201.61	526.70
TD3	0.45	125.98	623.00
PPO	0.45	108.31	712.65

Table 2 Comparison of verification experiments in scene_2

Method	Success rate	Reward	Average steps
Ours	0.85	317.04	553.25
SDDPG	0.60	195.46	608.15
TD3	0.35	138.95	781.00
PPO	0.45	106.30	675.50
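
To compare the methods across both verification scenes at a glance, the success rates from Tables 1 and 2 can be averaged per method. The short Python sketch below does only that arithmetic; the dictionary layout and the helper name `mean_success` are assumptions for illustration, and the source does not state how its headline figures were aggregated.

```python
# Success rates copied from Table 1 (scene_1) and Table 2 (scene_2).
scene_1 = {"Ours": 0.90, "SDDPG": 0.65, "TD3": 0.45, "PPO": 0.45}
scene_2 = {"Ours": 0.85, "SDDPG": 0.60, "TD3": 0.35, "PPO": 0.45}


def mean_success(a, b):
    """Return the per-method mean of two success-rate tables (hypothetical helper)."""
    return {method: (a[method] + b[method]) / 2 for method in a}


avg = mean_success(scene_1, scene_2)

if __name__ == "__main__":
    for method, rate in avg.items():
        print(f"{method}: {rate:.3f}")
```

The two-scene means are 0.875 for the proposed method, 0.625 for SDDPG, 0.40 for TD3, and 0.45 for PPO. The first value is close to the 87% success rate quoted in the abstract, although whether that figure was obtained this way is not stated.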

Fig. 7 Comparison of navigation paths ((a) Ours; (b) SDDPG; (c) TD3; (d) PPO)

Fig. 8 Comparison of ablation studies ((a) Average cumulative rewards; (b) Success rates)
