
Journal of Graphics ›› 2024, Vol. 45 ›› Issue (3): 454-463.DOI: 10.11996/JG.j.2095-302X.2024030454


Monocular depth estimation combining pyramid structure and attention mechanism

LI Tao, HU Ting, WU Dandan

  1. School of Electrical Engineering and Electronic Information, Xihua University, Chengdu Sichuan 610039, China
  • Received: 2023-12-25 Accepted: 2024-02-06 Online: 2024-06-30 Published: 2024-06-06
  • About author:

    LI Tao (1983-), associate professor, Ph.D. Her main research interests cover image/video compression and restoration, image super-resolution reconstruction, and deep image completion. E-mail: litao@mail.xhu.edu.cn

  • Supported by:
    The Department of Science and Technology of Sichuan Province (2021YJ0109); National Natural Science Foundation of China (61901392); National Natural Science Foundation of China (62041109)

Abstract:

Monocular depth estimation predicts a dense depth image from a single color image. To address the boundary ambiguity and insufficient capture of contextual information in current monocular depth estimation algorithms, an algorithm combining a pyramid structure and an attention mechanism was proposed. The algorithm adopted an encoder-decoder framework. The encoder used the PVTv2 network, exploiting the strength of the Transformer in modeling global information to obtain more adequate global semantic information. The decoder consisted of a depth estimation main branch and two pyramid sub-branches. Through spatial and channel attention mechanisms, the main branch adaptively focused on important feature regions and feature channels between the encoder and decoder features. The Laplacian pyramid sub-branch and the depth residual pyramid sub-branch learned rich local information from the color images and from the depth features of the main branch, and transferred it to the main branch to address the missing details and chaotic structures in monocular depth estimation. Experimental results demonstrated that on the indoor public dataset NYU Depth V2, compared with the advanced algorithm P3Depth, the threshold accuracy δ < 1.25 increased by 1.22%, while the absolute error and root mean square error decreased by 5.8% and 2.8%, respectively. On the outdoor public dataset KITTI, the absolute error, root mean square logarithmic error, and root mean square error decreased by 8.5%, 3.9%, and 0.4%, respectively. The algorithm improved the accuracy of depth estimation and achieved good visual results.
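
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `depth_metrics` and its arguments are hypothetical, and "absolute error" is assumed here to mean the mean absolute relative error commonly reported on NYU Depth V2 and KITTI. Depth values are taken as flat sequences of positive numbers, and the sketch uses only the Python standard library.

```python
import math


def depth_metrics(pred, gt, threshold=1.25):
    """Compute common depth-estimation metrics over paired depth values.

    pred, gt: equal-length sequences of positive depths (same unit).
    Hypothetical helper for illustration only.
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    n = len(gt)
    pairs = list(zip(pred, gt))

    # Threshold accuracy: share of values with max(pred/gt, gt/pred) < threshold.
    delta = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n

    # Mean absolute relative error (ASSUMED meaning of "absolute error").
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n

    # Root mean square error in depth units.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    # Root mean square error of the log-depth difference.
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )

    return {"delta": delta, "abs_rel": abs_rel, "rmse": rmse, "rmse_log": rmse_log}


if __name__ == "__main__":
    # Toy values: the third prediction is off by a factor of two.
    print(depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 2.0]))
```

A higher threshold accuracy is better, whereas lower values are better for the three error measures, which is why the abstract reports an increase for the former and decreases for the latter.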

Key words: deep learning, monocular depth estimation, pyramid structure, attention mechanism, Transformer
