
Journal of Graphics ›› 2022, Vol. 43 ›› Issue (2): 214-222. DOI: 10.11996/JG.j.2095-302X.2022020214

• Image Processing and Computer Vision •

Monocular depth estimation of ASPP networks based on hierarchical compress excitation

  

    1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei Anhui 230009, China;
    2. Anhui Key Laboratory of Industrial Safety and Emergency Technology, Hefei Anhui 230009, China;
    3. Hefei Institute of Physical Science, Chinese Academy of Sciences, Hefei Anhui 230031, China;
    4. School of Software, Hefei University of Technology, Hefei Anhui 230009, China
  • Online: 2022-04-30 Published: 2022-05-07
  • Supported by:
    The University Synergy Innovation Program of Anhui Province (GXXT-2019-003); The Fundamental Research Funds for the Central Universities of China (PA2021GDSK0069); Provincial Natural Science Fund of Anhui (2108085QF286)

Abstract: Scene depth estimation is a basic task in scene understanding, and its accuracy reflects how well a computer understands a scene. Traditional depth estimation employs the atrous spatial pyramid pooling (ASPP) module to process different pixel features without changing the image resolution. However, this module does not consider the relationships between different pixel features, which leads to inaccurate scene feature extraction. To address this shortcoming of the ASPP module in depth estimation, an improved ASPP module was proposed to solve the distortion problem of the ASPP module in image processing. First, the proposed module was added after the convolution kernel. By exploiting the relationships between the features of each pixel, it enables the network to adaptively learn the regions of interest, so that features can be extracted accurately from the given image. Then, the problem of network hierarchy optimization was solved by constructing a difference matrix. Finally, the depth estimation network model was built on the public indoor dataset NYU-Depthv2. Compared with current mainstream algorithms, the proposed algorithm achieves good performance on both qualitative and quantitative measures. Under the same evaluation metrics, compared with the most advanced algorithm, the accuracy at the δ1 threshold is improved by nearly 3%, the root mean square error and the absolute error are decreased by 1.7%, and the log-domain error (lg) is decreased by about 0.3%. The improved ASPP network model proposed in this paper addresses the failure of traditional ASPP modules to take into account the relationships between different pixel features. It effectively improves the model's convergence, significantly improves feature extraction, and produces more accurate scene depth estimates.
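
The abstract reports a threshold accuracy, a root mean square error, an absolute error, and a log-domain error without defining them. The sketch below shows how such metrics are commonly computed for depth estimation on NYU-Depth-style data. The function name `depth_metrics`, the 1.25 threshold, the choice of a mean absolute relative error for the "absolute error", and the base-10 logarithm are all assumptions made for illustration. They are not confirmed by the abstract and may differ from the authors' exact definitions.

```python
import math

def depth_metrics(pred, gt, threshold=1.25):
    """Compute commonly used monocular-depth metrics (hypothetical helper).

    pred, gt: equal-length sequences of positive predicted and
    ground-truth depths, one entry per valid pixel.
    The metric definitions are assumed, not taken from the paper.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    n = len(pred)
    pairs = list(zip(pred, gt))
    # Threshold accuracy: share of pixels with max(p/g, g/p) < threshold.
    acc = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n
    # Root mean square error of the depth values.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Mean absolute relative error (assumed meaning of "absolute error").
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Mean absolute error in the base-10 log domain.
    log10_err = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n
    return {"delta1": acc, "rmse": rmse, "abs_rel": abs_rel, "log10": log10_err}
```

Under these assumed definitions, a higher threshold accuracy is better, while lower values are better for the three error measures, which matches the direction of the improvements reported in the abstract.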

Key words: deep learning, convolutional neural networks, depth estimation, atrous spatial pyramid pooling, hierarchical design
