Journal of Graphics ›› 2023, Vol. 44 ›› Issue (4): 728-738. DOI: 10.11996/JG.j.2095-302X.2023040728

Monocular depth estimation based on Laplacian pyramid with attention fusion

YU Wei-qun, LIU Jia-tao, ZHANG Ya-ping

  1. School of Information Science and Technology, Yunnan Normal University, Kunming Yunnan 650500, China
  • Received:2022-11-22 Accepted:2023-03-27 Online:2023-08-31 Published:2023-08-16
  • Contact: ZHANG Ya-ping (1979-), professor, Ph.D. Her main research interests cover computer vision and computer graphics. E-mail: zhangyp@ynnu.edu.cn
  • About author:

    YU Wei-qun (1998-), master's student. His main research interests cover computer vision and image processing. E-mail: yudalao888@163.com

  • Supported by:
    National Natural Science Foundation of China (61863037); Ten Thousand Talent Plans for Young Top-Notch Talents of Yunnan Province

Abstract:

With the rapid development of deep neural networks, research on deep learning-based monocular depth estimation has focused on regressing depth with encoder-decoder architectures and has achieved significant results. However, most existing methods simply repeat basic upsampling operations during decoding and therefore fail to take full advantage of the encoder's features. To address this problem, this study proposed a dense feature decoding structure combined with an attention mechanism. Taking a single RGB image as input, the feature map from each level of the encoder was fused into the branches of the Laplacian pyramid to improve the utilization of the features at every level. Attention mechanisms were introduced into the decoder to further improve depth estimation. Finally, a data loss and a structural similarity loss were combined to improve the stability and convergence speed of training and to reduce the training cost of the model. Experimental results on the KITTI dataset showed that, compared with the advanced algorithm LapDepth, the root mean square error decreased by 4.8% and the training cost was reduced by 36%, with clear improvements in depth estimation accuracy and convergence speed.
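
The Laplacian pyramid idea behind the decoder can be illustrated with a minimal NumPy sketch. This is not the authors' network. It only shows how a depth map can be split into per-level residuals plus a coarse base and then recovered by upsampling and adding. The helper names (`downsample`, `upsample`, `build_laplacian_pyramid`, `reconstruct`), the 2x2 average pooling, and the nearest-neighbour upsampling are all assumptions chosen for brevity. The random array stands in for a real depth map.

```python
import numpy as np

def downsample(x):
    """Halve resolution by averaging each 2x2 block (assumed stand-in for blur + subsample)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Double resolution by nearest-neighbour repetition (assumed for simplicity)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian_pyramid(depth, levels):
    """Return [residual_0, ..., residual_{levels-1}, coarsest], ordered fine to coarse.

    Height and width must be divisible by 2**levels.
    """
    pyramid = []
    current = depth
    for _ in range(levels):
        coarse = downsample(current)
        # Laplacian residual: detail lost when going one level coarser.
        pyramid.append(current - upsample(coarse))
        current = coarse
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Rebuild the full-resolution map from coarse to fine by adding residuals."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = upsample(current) + residual
    return current

# Synthetic placeholder values, not real depth data.
rng = np.random.default_rng(0)
depth = rng.uniform(1.0, 80.0, size=(16, 32))
pyramid = build_laplacian_pyramid(depth, levels=3)
restored = reconstruct(pyramid)
```

In a learned decoder of this kind, each residual would be predicted by a network branch rather than computed from ground truth. The coarse-to-fine summation in `reconstruct` is the part this sketch is meant to convey.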

Key words: deep learning, monocular depth estimation, attention mechanism, Laplacian pyramid, Laplacian residuals
