Research on UAV three-dimensional scene navigation based on deep reinforcement learning

doi:10.11996/JG.j.2095-302X.2025051010

Journal of Graphics, 2025, Vol. 46, Issue (5): 1010-1017. DOI: 10.11996/JG.j.2095-302X.2025051010

• Image Processing and Computer Vision •

LIU Bokai¹, YIN Xuefeng¹, SUN Chuanyu¹, GE Huilin², WEI Ziqi³, JIANG Yutong⁴, PIAO Haiyin⁵, ZHOU Dongsheng⁶, YANG Xin¹

¹ Key Laboratory of Social Computing and Cognitive Intelligence, School of Computer Science, Dalian University of Technology, Dalian Liaoning 116024, China
² School of Automation, Jiangsu University of Science and Technology, Zhenjiang Jiangsu 212100, China
³ Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
⁴ National Key Laboratory of Advanced Off-road System Technology, China North Vehicle Research Institute, Beijing 100072, China
⁵ Shenyang Aircraft Design and Research Institute, Aviation Industry Corporation of China, Shenyang Liaoning 110035, China
⁶ School of Software Engineering, Dalian University, Dalian Liaoning 116024, China

Received: 2024-12-17 Accepted: 2025-04-21 Published: 2025-10-30 Online: 2025-09-10
Corresponding author: GE Huilin (1989-), associate researcher, Ph.D. His main research interests include underwater target detection and computer vision. E-mail: ghl1989@just.edu.cn
First author: LIU Bokai (1999-), master's student. His main research interests include graphics and image processing. E-mail: lbk2593469678@163.com
Supported by:
National Natural Science Foundation of China (62441216); Major Project of the Ministry of Science and Technology on “Brain Science and Brain-like Research” (2022ZD0210500)

Abstract

In recent years, the scale of the UAV industry and the demand for UAV applications have grown steadily, and achieving UAV autonomy and intelligence has become a core problem that the industry urgently needs to solve. As a foundational technology for autonomous UAV control, UAV navigation has become a top priority in UAV application research. At present, most UAV navigation methods rely on reconstructing environmental information, which consumes excessive computation and memory and cannot meet the demands of increasingly complex scenes and real-time operation. Therefore, building on the strong representation learning ability of deep learning and the autonomous decision-making ability of reinforcement learning, an autonomous UAV navigation method is proposed that continuously optimizes its decision policy through self-learning to better complete navigation tasks. First, a continuous action space and a non-sparse reward function are constructed to guide the UAV's learning process; then, a feature extraction module and a decision-making module are designed to improve the UAV's perception and decision-making capabilities. Experimental results show that, in simulated 3D scenes, the algorithm achieves the best navigation and obstacle avoidance performance among the compared methods: the navigation success rate in the designed 3D scenes reaches 87%, and the converged average cumulative reward is 33% higher than that of contemporaneous methods, while training time is shortened and training stability is improved.

Keywords: deep reinforcement learning, attention mechanism, unmanned aerial vehicle, navigation and obstacle avoidance, 3D scene
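
As a rough illustration of the "continuous action space" and "non-sparse reward function" mentioned in the abstract, the Python sketch below shows one common way such a design can look. Everything in it is an assumption made for illustration: the `StepInfo` fields, the reward terms, the action bounds, and all coefficients are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step quantities an environment might expose."""
    prev_dist: float          # distance to the goal before the step
    curr_dist: float          # distance to the goal after the step
    min_obstacle_dist: float  # closest obstacle reading after the step
    collided: bool            # True if the UAV hit an obstacle
    reached: bool             # True if the UAV reached the goal


def dense_reward(info, progress_gain=1.0, safe_dist=2.0, proximity_gain=0.5,
                 step_cost=0.01, goal_bonus=100.0, crash_penalty=100.0):
    """Return a reward at every step instead of only at episode end.

    All coefficients are illustrative defaults, not values from the paper.
    """
    if info.reached:
        return goal_bonus
    if info.collided:
        return -crash_penalty
    # Positive when the UAV moved closer to the goal during this step.
    progress = progress_gain * (info.prev_dist - info.curr_dist)
    # Negative only when an obstacle is closer than the safety distance.
    proximity = -proximity_gain * max(0.0, safe_dist - info.min_obstacle_dist)
    # A small constant cost discourages needlessly long paths.
    return progress + proximity - step_cost


def clip_action(action, low=-1.0, high=1.0):
    """Clamp each component of a continuous action to [low, high]."""
    return tuple(min(high, max(low, a)) for a in action)
```

Because every step yields a signal (progress toward the goal, a penalty near obstacles, a small time cost), the agent receives feedback long before it first reaches the goal, which is the usual motivation for preferring a dense reward over a sparse one.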

LIU Bokai, YIN Xuefeng, SUN Chuanyu, GE Huilin, WEI Ziqi, JIANG Yutong, PIAO Haiyin, ZHOU Dongsheng, YANG Xin. Research on UAV three-dimensional scene navigation based on deep reinforcement learning[J]. Journal of Graphics, 2025, 46(5): 1010-1017.

Figures and Tables (10)

Fig. 1 Overall design of the UAV navigation method

Fig. 2 Overall network design of the feature extraction module

Fig. 3 Self-supervised image feature extraction network based on attention mechanism

Fig. 4 LSTM-SAC network architecture

Fig. 5 3D simulation maps ((a) Simple 3D map; (b) Primary 3D map; (c) Intermediate 3D map; (d) Advanced 3D map; (e) Verification map 1; (f) Verification map 2)

Fig. 6 Comparison of different methods ((a) Average cumulative reward; (b) Success rate)

Table 1 Comparison of verification experiments in verification scene 1

Method	Success rate	Reward	Average steps
Ours	0.90	326.72	523.15
SDDPG	0.65	201.61	526.70
TD3	0.45	125.98	623.00
PPO	0.45	108.31	712.65

Table 2 Comparison of verification experiments in verification scene 2

Method	Success rate	Reward	Average steps
Ours	0.85	317.04	553.25
SDDPG	0.6	195.46	608.15
TD3	0.35	138.95	781
PPO	0.45	106.30	675.50
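
To make the two tables easier to compare, the short Python sketch below averages each method's success rate over the two verification scenes and computes the relative reward gain of the proposed method over SDDPG. The numbers are copied from Tables 1 and 2; the helper names `mean_success` and `relative_gain` are illustrative, and averaging the two scenes is an assumption about how one might summarize the results, not a procedure described on this page.

```python
# Rows copied from Tables 1 and 2: method -> (success rate, reward, average steps).
SCENE_1 = {
    "Ours": (0.90, 326.72, 523.15),
    "SDDPG": (0.65, 201.61, 526.70),
    "TD3": (0.45, 125.98, 623.00),
    "PPO": (0.45, 108.31, 712.65),
}
SCENE_2 = {
    "Ours": (0.85, 317.04, 553.25),
    "SDDPG": (0.60, 195.46, 608.15),
    "TD3": (0.35, 138.95, 781.0),
    "PPO": (0.45, 106.30, 675.50),
}


def mean_success(method):
    """Average success rate of one method over the two verification scenes."""
    return (SCENE_1[method][0] + SCENE_2[method][0]) / 2


def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, as a fraction."""
    return (ours - baseline) / baseline


if __name__ == "__main__":
    for name in SCENE_1:
        print(f"{name}: mean success rate = {mean_success(name):.3f}")
    for label, scene in (("scene 1", SCENE_1), ("scene 2", SCENE_2)):
        gain = relative_gain(scene["Ours"][1], scene["SDDPG"][1])
        print(f"{label}: reward gain over SDDPG = {gain:.1%}")
```

With these numbers, the proposed method's mean success rate over the two scenes is 0.875, which is close to the 87% quoted in the abstract, and its reward is roughly 62% above SDDPG in each scene. The page does not state how the abstract's 87% and 33% figures were computed, and the 33% figure refers to the converged average cumulative reward during training rather than to these verification runs.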

Fig. 7 Comparison of navigation paths of different methods ((a) Ours; (b) SDDPG; (c) TD3; (d) PPO)

Fig. 8 Comparison of ablation results ((a) Average cumulative reward; (b) Success rate)

