Journal of Graphics (图学学报), 2023, Vol. 44, Issue (4): 728-738. DOI: 10.11996/JG.j.2095-302X.2023040728

• Image Processing and Computer Vision •

Monocular depth estimation based on Laplacian pyramid with attention fusion

YU Wei-qun, LIU Jia-tao, ZHANG Ya-ping

  1. School of Information Science and Technology, Yunnan Normal University, Kunming Yunnan 650500, China
  • Received: 2022-11-22 Accepted: 2023-03-27 Published: 2023-08-31 Online: 2023-08-16
  • Corresponding author: ZHANG Ya-ping (1979-), professor, Ph.D. Her main research interests cover computer vision and computer graphics. E-mail: zhangyp@ynnu.edu.cn
  • About the author:

    YU Wei-qun (1998-), master's student. His main research interests cover computer vision and image processing. E-mail: yudalao888@163.com

  • Supported by:
    National Natural Science Foundation of China (61863037); Young Top-Notch Talents Program of the Yunnan Province Ten Thousand Talents Plan

Abstract:

With the rapid development of deep neural networks, research on deep learning-based monocular depth estimation has centered on regressing depth with encoder-decoder structures and has achieved significant results. In most conventional methods, however, the decoder simply repeats upsampling operations and therefore fails to make full use of the features extracted by the encoder. To address this problem, this study proposes a dense feature decoding structure combined with an attention mechanism. Taking a single RGB image as input, the model fuses the feature maps from every encoder level into the branches of a Laplacian pyramid, strengthening both the depth and the breadth of feature fusion. Attention mechanisms introduced into the decoder further improve depth estimation accuracy. Finally, a data loss and a structural similarity (SSIM) loss are combined to improve the stability and convergence speed of training and to reduce the training cost. Experimental results on the KITTI dataset show that, compared with existing models and in particular the advanced LapDepth algorithm, the proposed model reduces the root mean square error (RMSE) by 4.8% and the training cost by 36%, with clear improvements in both depth estimation accuracy and convergence speed.
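The abstract combines a data loss with a structural similarity loss and reports its headline result as a relative RMSE reduction. The exact loss terms and weights are not given in this section, so the NumPy sketch below is only an illustration under stated assumptions: the data term is taken to be a masked L1 error, the SSIM term uses one global window instead of the usual local sliding windows, and `ALPHA` and `DATA_RANGE` are hypothetical values. The helpers `rmse` and `relative_reduction` (hypothetical names) show how a percentage such as the 4.8% reduction relative to LapDepth is derived from two RMSE values.

```python
import numpy as np

# Hypothetical hyperparameters (assumptions, not taken from the paper).
ALPHA = 0.85        # weight of the SSIM term relative to the data term
DATA_RANGE = 80.0   # assumed depth range in metres, used for the SSIM constants


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = DATA_RANGE) -> float:
    """SSIM over a single global window (a simplification of windowed SSIM)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()              # population variances
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()    # population covariance
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return float(num / den)


def combined_loss(pred: np.ndarray, gt: np.ndarray, alpha: float = ALPHA) -> float:
    """Weighted sum of an SSIM term and an assumed L1 data term over valid pixels."""
    valid = gt > 0                               # sparse ground truth: 0 marks "no depth"
    p, g = pred[valid], gt[valid]
    data_term = np.abs(p - g).mean()             # assumed data loss: mean absolute error
    ssim_term = (1.0 - global_ssim(p, g)) / 2.0  # maps SSIM in [-1, 1] to [0, 1]
    return float(alpha * ssim_term + (1.0 - alpha) * data_term)


def rmse(pred: np.ndarray, gt: np.ndarray) -> float:
    """Root mean square error over pixels that have ground truth."""
    valid = gt > 0
    return float(np.sqrt(np.mean((pred[valid] - gt[valid]) ** 2)))


def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage by which `ours` lowers the error of `baseline`."""
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    gt = np.array([[0.0, 10.0], [20.0, 30.0]])    # toy depth map; 0.0 = missing
    pred = np.array([[5.0, 11.0], [21.0, 31.0]])  # off by 1 m wherever gt is valid
    print(rmse(pred, gt))                          # 1.0
    print(combined_loss(gt.copy(), gt))            # 0.0 for a perfect prediction
```

A perfect prediction yields an SSIM of 1 and a loss of 0. Because the L1 term is measured in metres while the SSIM term lies in [0, 1], the relative weight would need tuning in practice.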

Key words: deep learning, monocular depth estimation, attention mechanism, Laplacian pyramid, Laplacian residuals
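The keywords name the Laplacian pyramid and the Laplacian residuals on which the decoder is built. The NumPy sketch below is a hedged illustration of that decomposition only, not the authors' network: each level keeps the residual between a map and the upsampled copy of its half-resolution version, and decoding runs coarse to fine by upsampling the current estimate and adding the next finer residual. In the proposed model those residuals are predicted by decoder branches that fuse encoder features with attention; here they are computed from a known map, and the 2×2 average pooling, the nearest-neighbour upsampling and all function names are simplifying assumptions.

```python
import numpy as np


def downsample(x: np.ndarray) -> np.ndarray:
    """Halve the resolution with 2x2 average pooling (height and width must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(x: np.ndarray) -> np.ndarray:
    """Double the resolution by nearest-neighbour repetition."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def laplacian_pyramid(depth: np.ndarray, levels: int) -> list:
    """Return `levels` Laplacian residuals (fine to coarse) followed by the coarsest map."""
    pyramid = []
    current = depth
    for _ in range(levels):
        coarse = downsample(current)
        pyramid.append(current - upsample(coarse))  # detail lost when going one level coarser
        current = coarse
    pyramid.append(current)                         # low-frequency base of the pyramid
    return pyramid


def reconstruct(pyramid: list) -> np.ndarray:
    """Coarse-to-fine decoding: upsample the running estimate, then add that level's residual."""
    estimate = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        estimate = upsample(estimate) + residual
    return estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    depth = rng.uniform(1.0, 80.0, size=(64, 208))  # synthetic stand-in for a depth map
    pyr = laplacian_pyramid(depth, levels=4)
    print([p.shape for p in pyr])
    # [(64, 208), (32, 104), (16, 52), (8, 26), (4, 13)]
    print(np.allclose(reconstruct(pyr), depth))     # True
```

Because each residual stores exactly what the upsampled coarser level misses, reconstruction recovers the input up to floating-point rounding, which is the usual motivation for letting each decoder scale predict only detail rather than absolute depth.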
