
Journal of Graphics (图学学报) ›› 2021, Vol. 42 ›› Issue (3): 446-453. DOI: 10.11996/JG.j.2095-302X.2021030446

• Image Processing and Computer Vision •

Research on depth prediction algorithm based on multi-task model

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Online: 2021-06-30 Published: 2021-06-29
  • Supported by:
    National Natural Science Foundation of China (91748104, 61972067, 61632006, U1811463, U1908214, 61751203); National Key Research and Development Program of China (2018AAA0102003)

Abstract: Image depth prediction is an active research topic in computer vision and robotics. Constructing a depth map is an important prerequisite for 3D reconstruction. Traditional methods either rely on manual annotation of the depth at fixed points or predict depth by binocular positioning based on changes in camera position. Such methods are time-consuming and labor-intensive, and they are constrained by factors such as camera position, positioning method, and distribution probability. Their accuracy is therefore hard to guarantee, which makes the predicted depth maps difficult to use in subsequent tasks such as 3D reconstruction. Introducing a deep learning method based on multi-task modules can effectively address this problem. For scene images, a monocular depth prediction network based on a multi-task model is proposed that jointly learns three tasks: depth prediction, semantic segmentation, and surface normal estimation. The network comprises a common feature extraction module and a multi-task feature fusion module. It extracts common features while preserving the independence of each task's features, and it improves the structure of each task while ensuring the accuracy of depth prediction.

Key words: computer vision, monocular depth prediction, multi-task model, semantic segmentation, surface normal estimation
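
The abstract names a common feature extraction module, a multi-task feature fusion module, and three jointly learned tasks, but gives no architectural detail. The following is only a toy, per-pixel sketch of that general pattern in plain Python, not the authors' network. The class `MultiTaskSketch`, the function `multi_task_loss`, the layer sizes, the use of concatenation as the fusion step, and the particular loss terms and weights are all assumptions made for illustration.

```python
import math
import random


def linear(weights, x):
    """Dense layer without bias: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def init(rows, cols, rng):
    """Random weight matrix with entries in [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MultiTaskSketch:
    """Hypothetical toy model: shared trunk, three task branches, fused depth head."""

    def __init__(self, in_dim=3, feat_dim=8, num_classes=4, seed=0):
        rng = random.Random(seed)
        # Common feature extraction: one layer shared by all three tasks.
        self.shared = init(feat_dim, in_dim, rng)
        # Task-specific branches keep each task's features separate.
        self.depth_branch = init(feat_dim, feat_dim, rng)
        self.seg_branch = init(feat_dim, feat_dim, rng)
        self.normal_branch = init(feat_dim, feat_dim, rng)
        self.seg_head = init(num_classes, feat_dim, rng)
        self.normal_head = init(3, feat_dim, rng)
        # Assumed fusion: the depth head reads all three feature sets.
        self.depth_head = init(1, 3 * feat_dim, rng)

    def forward(self, pixel):
        shared = relu(linear(self.shared, pixel))
        f_depth = relu(linear(self.depth_branch, shared))
        f_seg = relu(linear(self.seg_branch, shared))
        f_normal = relu(linear(self.normal_branch, shared))
        fused = f_depth + f_seg + f_normal  # list concatenation
        depth = linear(self.depth_head, fused)[0]
        seg_logits = linear(self.seg_head, f_seg)
        raw = linear(self.normal_head, f_normal)
        length = math.sqrt(sum(v * v for v in raw)) or 1.0
        normal = [v / length for v in raw]  # unit vector unless raw is all zeros
        return depth, seg_logits, normal


def multi_task_loss(pred, target, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three per-pixel losses (an assumed, common choice)."""
    depth, seg_logits, normal = pred
    gt_depth, gt_class, gt_normal = target
    depth_loss = abs(depth - gt_depth)  # L1 depth error
    top = max(seg_logits)
    log_sum = top + math.log(sum(math.exp(v - top) for v in seg_logits))
    seg_loss = log_sum - seg_logits[gt_class]  # softmax cross-entropy
    normal_loss = 1.0 - sum(a * b for a, b in zip(normal, gt_normal))  # 1 - cosine
    w_d, w_s, w_n = weights
    return w_d * depth_loss + w_s * seg_loss + w_n * normal_loss


if __name__ == "__main__":
    model = MultiTaskSketch()
    prediction = model.forward([0.2, 0.5, 0.7])  # one RGB pixel
    target = (1.5, 2, [0.0, 0.0, 1.0])  # made-up depth, class index, normal
    print(multi_task_loss(prediction, target))
```

In this sketch the three tasks share one trunk and are trained against a single combined loss, which is one simple way to read "simultaneously train and learn three tasks". A real implementation would operate on whole images with convolutional layers and would follow whatever fusion design the full paper specifies.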

CLC number: