
Journal of Graphics ›› 2022, Vol. 43 ›› Issue (2): 214-222. DOI: 10.11996/JG.j.2095-302X.2022020214

• Image Processing and Computer Vision •

Monocular depth estimation with an ASPP network based on hierarchical squeeze-and-excitation

  

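  1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui 230009, China;
  2. Anhui Key Laboratory of Industrial Safety and Emergency Technology, Hefei, Anhui 230009, China;
  3. Hefei Institute of Physical Science, Chinese Academy of Sciences, Hefei, Anhui 230031, China;
  4. School of Software, Hefei University of Technology, Hefei, Anhui 230009, China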
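  • Online: 2022-04-30 Published: 2022-05-07
  • Supported by:
    The University Synergy Innovation Program of Anhui Province (GXXT-2019-003); The Fundamental Research Funds for the Central Universities of China (PA2021GDSK0069); Provincial Natural Science Fund of Anhui (2108085QF286)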

Monocular depth estimation of ASPP networks based on hierarchical compress excitation

  1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui 230009, China;
  2. Anhui Key Laboratory of Industrial Safety and Emergency Technology, Hefei, Anhui 230009, China;
  3. Hefei Institute of Physical Science, Chinese Academy of Sciences, Hefei, Anhui 230031, China;
  4. School of Software, Hefei University of Technology, Hefei, Anhui 230009, China
  • Online: 2022-04-30 Published: 2022-05-07
  • Supported by:
    The University Synergy Innovation Program of Anhui Province (GXXT-2019-003); The Fundamental Research Funds for the Central Universities of China (PA2021GDSK0069); Provincial Natural Science Fund of Anhui (2108085QF286)

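Abstract: Scene depth estimation is a fundamental task in scene understanding, and its accuracy reflects how well a computer understands a scene. Conventional depth estimation uses the atrous spatial pyramid pooling (ASPP) module to process the features of different pixels without changing the image resolution, but the module does not consider the relationships among those features, which makes scene feature extraction inaccurate. To address this shortcoming of the ASPP module in depth estimation, an improved ASPP module is proposed that resolves the distortion the module introduces in image processing. First, an ASPP block based on hierarchical squeeze-and-excitation is added after the convolution kernel; by exploiting the relationships among pixel features, it lets the network adaptively learn the regions of interest. Next, the network hierarchy optimization problem is addressed by constructing a difference matrix. Finally, the depth estimation network model is built on the public indoor dataset NYU-Depthv2. Compared with current mainstream algorithms, the proposed algorithm performs well on both qualitative and quantitative measures. Under the same evaluation metrics, the δ1 threshold accuracy improves by nearly 3%, the root mean square error (RMSE) and absolute relative error (Abs Rel) decrease by 1.7%, and the log-domain error (lg) decreases by about 0.3%. The network model trained with this method addresses the traditional ASPP module's failure to consider the relationships among pixel features, strengthens feature extraction, and yields more accurate scene depth estimates.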

Key words: deep learning, convolutional neural network, depth estimation, atrous spatial pyramid pooling, hierarchical design
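The abstract states that the improved ASPP block uses squeeze-and-excitation so that the network can learn which features matter, but it does not give the layer details. The following is a minimal sketch of a generic squeeze-and-excitation channel reweighting step only, not the authors' hierarchical design. The function name `squeeze_excite`, the bias-free two-layer gating, and the weight matrices `w_reduce` and `w_expand` are all assumptions made for illustration.

```python
import math


def squeeze_excite(feature_maps, w_reduce, w_expand):
    """Reweight the channels of a C x H x W feature map given as nested lists.

    Hypothetical sketch: w_reduce has shape (hidden, C), w_expand has shape
    (C, hidden), and biases are omitted for brevity.
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation, step 1: fully connected reduction followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce
    ]
    # Excitation, step 2: fully connected expansion followed by a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates
```

In an ASPP setting one could imagine applying such a gate to the output of each dilated-convolution branch before the branches are fused, but where exactly the paper places it is not stated in the abstract.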

Abstract: Scene depth estimation is a basic task in scene understanding, and its accuracy reflects how well a computer understands a scene. Traditional depth estimation employs the atrous spatial pyramid pooling (ASPP) module to process different pixel features without changing the image resolution. However, this module does not consider the relationships between different pixel features, which leads to inaccurate scene feature extraction. To address this shortcoming of the ASPP module in depth estimation, an improved ASPP module is proposed to resolve the distortion that the ASPP module introduces in image processing. First, the proposed module is added after the convolution kernel; by exploiting the relationships between pixel features, it enables the network to adaptively learn the regions of interest and thus extract features from a given image more accurately. Then, the problem of network hierarchy optimization is addressed by constructing a difference matrix. Finally, the depth estimation network model is built on the public indoor dataset NYU-Depthv2. Compared with current mainstream algorithms, the proposed algorithm performs well on both qualitative and quantitative measures. Under the same evaluation metrics, compared with the most advanced algorithm, the δ1 threshold accuracy is improved by nearly 3%, the root mean square error (RMSE) and absolute relative error (Abs Rel) are decreased by 1.7%, and the log-domain error (lg) is decreased by about 0.3%. The improved ASPP network model proposed in this paper addresses the failure of traditional ASPP modules to take into account the relationships between different pixel features. It helps the model converge, significantly improves feature extraction, and produces more accurate scene depth estimates.

Key words: deep learning, convolutional neural networks, depth estimation, atrous spatial pyramid pooling, hierarchical design
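The abstract reports results as threshold accuracy, root mean square error, absolute relative error, and log-domain error, without defining them. The sketch below computes these four quantities using the definitions commonly used for monocular depth benchmarks, on the assumption that the paper follows them. The function name `depth_metrics` and the default threshold of 1.25 for δ1 are assumptions, not taken from the source.

```python
import math


def depth_metrics(pred, gt, threshold=1.25):
    """Return common depth-estimation metrics for paired positive depths.

    Assumed definitions (not confirmed by the abstract):
      delta1  - fraction of pixels with max(pred/gt, gt/pred) < threshold
      rmse    - root mean square error
      abs_rel - mean of |pred - gt| / gt
      log10   - mean of |log10(pred) - log10(gt)|
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    if any(p <= 0 or g <= 0 for p, g in zip(pred, gt)):
        raise ValueError("depth values must be strictly positive")

    n = len(pred)
    pairs = list(zip(pred, gt))
    # Threshold accuracy: higher is better.
    delta1 = sum(max(p / g, g / p) < threshold for p, g in pairs) / n
    # Error measures: lower is better.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n
    return {"delta1": delta1, "rmse": rmse, "abs_rel": abs_rel, "log10": log10}
```

Calling `depth_metrics(predicted, ground_truth)` on flattened lists of valid pixels returns all four values in one dictionary, which makes it straightforward to compare two models under the same evaluation setting.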
