
图学学报 (Journal of Graphics) ›› 2020, Vol. 41 ›› Issue (6): 922-929. DOI: 10.11996/JG.j.2095-302X.2020060922

• Image Processing and Computer Vision •

FANET: light field depth estimation with multi-channel information fusion

  (School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009, China)
  • Online: 2020-12-31  Published: 2021-01-08
  • Supported by: General Program of the National Natural Science Foundation of China (61876057, 61971177)

Abstract: A light field camera records both the spatial and angular information of a scene in a single shot. Compared with traditional two-dimensional images, the resulting images contain more information and offer clear advantages for depth estimation. To obtain high-quality scene depth from light field images, a feature fusion network whose structure efficiently fuses multi-channel information was proposed, based on the multi-view representation of the light field. On the basis of manually selected specific views, convolution kernels of different sizes were employed to cope with different baseline changes. Meanwhile, a feature fusion module was built for the multi-stream input of light field data, and a dual-channel network structure was used to integrate information from earlier and later layers, improving the learning efficiency of the network and reducing information loss. Experimental results on the new HCI dataset show that the network converges quickly on the training set, achieves accurate depth estimation in non-Lambertian scenes, and outperforms the compared state-of-the-art methods in terms of average MSE.

Key words: light field, depth estimation, convolutional neural network, feature fusion, attention, multi-view

CLC number:
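
The abstract names three structural ideas: per-view branches whose convolution kernel sizes vary with the baseline of the selected views, a fusion module over the resulting multi-channel features, and a dual-channel structure that carries early-layer information forward to later layers. The following is a minimal PyTorch sketch of that combination, not the authors' FANET implementation: the module names, channel widths, kernel sizes (3/5/7/9), and the choice of four hand-picked views are all assumptions made purely for illustration.

# Hedged sketch (not the published FANET code): per-view branches with different
# kernel sizes, a fusion module over concatenated branch features, and a dual-channel
# (early + deep) path feeding the depth head. All sizes and names are assumptions.
import torch
import torch.nn as nn


class MultiKernelBranch(nn.Module):
    """One branch per selected view; the kernel size is matched to that view's baseline."""
    def __init__(self, kernel_size: int, out_ch: int = 16):
        super().__init__()
        pad = kernel_size // 2
        self.net = nn.Sequential(
            nn.Conv2d(1, out_ch, kernel_size, padding=pad),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size, padding=pad),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class FusionDepthNet(nn.Module):
    """Fuses multi-branch features and keeps a second (early-feature) channel alongside the deep path."""
    def __init__(self, kernel_sizes=(3, 5, 7, 9), out_ch: int = 16):
        super().__init__()
        self.branches = nn.ModuleList([MultiKernelBranch(k, out_ch) for k in kernel_sizes])
        fused_ch = out_ch * len(kernel_sizes)
        self.fuse = nn.Sequential(               # feature fusion over the concatenated branches
            nn.Conv2d(fused_ch, fused_ch, 1),
            nn.ReLU(inplace=True),
        )
        self.deep = nn.Sequential(               # deeper processing path
            nn.Conv2d(fused_ch, fused_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(2 * fused_ch, 1, 3, padding=1)  # single-channel depth/disparity map

    def forward(self, views):                    # views: list of (B, 1, H, W) tensors, one per branch
        feats = [branch(v) for branch, v in zip(self.branches, views)]
        early = self.fuse(torch.cat(feats, dim=1))
        late = self.deep(early)
        # dual-channel idea: concatenate early- and late-layer information before the head
        return self.head(torch.cat([early, late], dim=1))


if __name__ == "__main__":
    views = [torch.randn(1, 1, 64, 64) for _ in range(4)]   # four hand-picked views (assumed)
    print(FusionDepthNet()(views).shape)                     # -> torch.Size([1, 1, 64, 64])

In this reading, "dual-channel" is interpreted as concatenating the fused early features with the output of a deeper path before regression, which is one common way to reduce information loss between front and back layers; the paper itself should be consulted for the exact fusion and attention design.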