
Journal of Graphics, 2023, Vol. 44, Issue (5): 899-906. DOI: 10.11996/JG.j.2095-302X.2023050899

• Image Processing and Computer Vision •

Deep learning stereo matching algorithm fusing structural information

DANG Hong-she¹, XU Huai-biao¹, ZHANG Xuan-de²

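  1. School of Electrical and Control Engineering, Shaanxi University of Science & Technology, Xi’an, Shaanxi 710021, China
  2. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science & Technology, Xi’an, Shaanxi 710021, China
  • Received: 2023-04-28  Accepted: 2023-08-01  Published: 2023-10-31  Online: 2023-10-31
  • About the author: DANG Hong-she (1962-), male, professor, Ph.D. His main research interests are industrial intelligent control (industrial robots), wireless sensor networks, and digital image processing. E-mail: danghs@sust.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61871206); Natural Science Foundation Project of Shaanxi Provincial Department of Science and Technology (2020JM-509)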

Abstract:

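To address the insufficient matching accuracy of existing stereo matching algorithms in edge regions and in regions of discontinuous disparity, a deep learning stereo matching algorithm fusing structural information was proposed. The feature extraction network was simplified (by limiting the convolution kernel size), and the BatchNorm and activation-function layers were replaced with Inplace-ABN layers, improving the efficiency of convolutional image feature extraction. A local similarity pattern module combined with an attention mechanism was used to extract image structural features, which were fused with the convolutional features to enrich the image feature information. For the resulting feature pair, a correlation volume and a concatenation volume were computed; attention weights generated by convolving the correlation volume filtered redundant information out of the concatenation volume, improving the accuracy of the matching cost computation. A simplified hourglass network was used to speed up cost aggregation. Experiments were conducted on the Scene Flow, CREStereo, and KITTI datasets. The results show an end-point error of 0.45 px over all regions, an error rate of 1.55% over all regions of the first frame (D1-all), and only 6.87% of pixels with a prediction error greater than 1 px. These results indicate that the proposed algorithm compares favourably with other algorithms in matching accuracy, and they confirm its effectiveness and advantages in problematic regions such as edges and disparity discontinuities.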

Key words: deep learning, stereo matching, structural information, local similarity pattern module, matching cost
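Inplace-ABN (In-Place Activated BatchNorm) fuses BatchNorm with an invertible activation so that only the layer's output needs to be stored; the normalized input required for back-propagation is recovered by inverting the activation and then the affine BatchNorm step. The NumPy sketch below illustrates that forward pass and its inversion on an NCHW array. It assumes a leaky ReLU with slope 0.01 and a non-zero scale γ (both needed for invertibility); the function names are hypothetical, and this is neither the authors' code nor the reference GPU implementation.

```python
import numpy as np


def bn_leaky_relu(x, gamma, beta, eps=1e-5, slope=0.01):
    """Fused BatchNorm + leaky ReLU on an NCHW array (training-mode batch statistics).

    Returns the activation y and the per-channel mean/std. An in-place layer
    keeps y (overwriting x) plus these statistics, but not x itself.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    std = np.sqrt(x.var(axis=(0, 2, 3), keepdims=True) + eps)
    z = gamma * (x - mean) / std + beta      # BatchNorm output
    y = np.where(z >= 0, z, slope * z)       # leaky ReLU, invertible for slope > 0
    return y, mean, std


def recover_normalized(y, gamma, beta, slope=0.01):
    """Recover x_hat = (x - mean) / std from the stored output y alone.

    Inverts the leaky ReLU, then the affine BatchNorm step; requires gamma != 0.
    """
    z = np.where(y >= 0, y, y / slope)       # inverse leaky ReLU
    return (z - beta) / gamma                # inverse affine transform
```

Multiplying the recovered `x_hat` by `std` and adding `mean` reproduces the original input up to floating-point error, which is the property that lets such a layer discard its input buffer.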
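A local similarity pattern encodes image structure as the similarity between each pixel and its neighbours rather than as raw appearance. The sketch below is one plausible reading, under stated assumptions: cosine similarity between a pixel's feature vector and each neighbour in an odd-sized k×k (optionally dilated) window, giving k² structure channels that are then fused with the convolutional features by channel concatenation. The attention mechanism the authors combine with the module is not modelled, and the function names, window size, and concatenation-based fusion are illustrative choices rather than details taken from the paper.

```python
import numpy as np


def local_similarity_pattern(feat, k=3, dilation=1):
    """Structure descriptor for a (C, H, W) feature map.

    Channel i of the (k*k, H, W) output is the cosine similarity between each
    pixel's feature vector and its i-th neighbour in a k x k window with the
    given dilation; neighbours outside the image are zero-padded (similarity 0).
    """
    assert k % 2 == 1, "window size must be odd"
    _, h, w = feat.shape
    unit = feat / (np.linalg.norm(feat, axis=0, keepdims=True) + 1e-8)
    r = (k // 2) * dilation
    padded = np.pad(unit, ((0, 0), (r, r), (r, r)))  # zero padding
    out = np.empty((k * k, h, w), dtype=unit.dtype)
    i = 0
    for dy in range(-r, r + 1, dilation):
        for dx in range(-r, r + 1, dilation):
            neighbour = padded[:, r + dy : r + dy + h, r + dx : r + dx + w]
            out[i] = (unit * neighbour).sum(axis=0)  # cosine similarity per pixel
            i += 1
    return out


def fuse(conv_feat, structure_feat):
    """Fuse convolutional and structural features by channel concatenation."""
    return np.concatenate([conv_feat, structure_feat], axis=0)
```

The centre channel (index k*k // 2) is always close to 1, while channels for neighbours across an edge drop, so the descriptor responds to local structure independently of absolute intensity.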
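The matching-cost step combines two common volumes: a group-wise correlation volume, which measures left-right feature similarity per channel group and disparity, and a concatenation volume, which stacks the left feature on top of the disparity-shifted right feature. In the NumPy sketch below, weights derived from the correlation volume scale the concatenation volume. As an assumption, the learned 3D convolutions that would generate the attention weights are replaced by a fixed average over groups followed by a sigmoid; the (channels, disparity, height, width) layout, group count, and function names are hypothetical.

```python
import numpy as np


def groupwise_correlation(fl, fr, max_disp, groups):
    """(G, D, H, W) volume: mean over each channel group of fl(x) * fr(x - d)."""
    c, h, w = fl.shape
    assert c % groups == 0 and max_disp <= w
    vol = np.zeros((groups, max_disp, h, w), dtype=fl.dtype)
    for d in range(max_disp):
        prod = fl[:, :, d:] * fr[:, :, : w - d]  # left pixel x vs right pixel x - d
        vol[:, d, :, d:] = prod.reshape(groups, c // groups, h, w - d).mean(axis=1)
    return vol


def concat_volume(fl, fr, max_disp):
    """(2C, D, H, W) volume stacking fl(x) on top of fr(x - d)."""
    c, h, w = fl.shape
    vol = np.zeros((2 * c, max_disp, h, w), dtype=fl.dtype)
    for d in range(max_disp):
        vol[:c, d, :, d:] = fl[:, :, d:]
        vol[c:, d, :, d:] = fr[:, :, : w - d]
    return vol


def attention_filtered_volume(fl, fr, max_disp, groups):
    """Scale the concatenation volume by weights derived from the correlation volume.

    Assumption: a fixed group average + sigmoid stands in for the learned 3D
    convolutions that would normally produce the weights.
    """
    corr = groupwise_correlation(fl, fr, max_disp, groups)
    weights = 1.0 / (1.0 + np.exp(-corr.mean(axis=0, keepdims=True)))  # (1, D, H, W)
    return weights * concat_volume(fl, fr, max_disp)
```

Positions with x < d have no valid match in the right image, so both volumes leave them at zero; disparities with high correlation receive weights near 1 and the rest are suppressed toward 0.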
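The three reported figures are standard stereo error measures: end-point error (the mean absolute disparity error), the first-frame all-region error rate (KITTI 2015's D1-all, where a pixel counts as wrong if its error exceeds both 3 px and 5% of the true disparity), and the share of pixels whose error exceeds 1 px. The sketch below shows how they are commonly computed from a predicted and a ground-truth disparity map. Treating pixels with zero ground truth as invalid follows the usual convention for sparse ground truth and is an assumption here; the function name is hypothetical, and this is not the authors' evaluation script.

```python
import numpy as np


def stereo_metrics(pred, gt, valid=None):
    """EPE (px), D1 (%) and >1px error rate (%) for disparity maps in pixels.

    By default only pixels with gt > 0 are evaluated (assumed sparse ground
    truth). A pixel is a D1 outlier when its error exceeds both 3 px and 5% of
    the ground-truth disparity.
    """
    if valid is None:
        valid = gt > 0
    err = np.abs(pred - gt)[valid]
    ref = gt[valid]
    return {
        "EPE": float(err.mean()),
        "D1": float(np.mean((err > 3.0) & (err > 0.05 * ref)) * 100.0),
        ">1px": float(np.mean(err > 1.0) * 100.0),
    }
```

For example, with a ground truth of 10 px everywhere and errors of 0, 0.5, 2 and 4 px on four pixels, the EPE is 1.625 px, only the 4 px error counts toward D1 (25%), and two pixels exceed 1 px (50%).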
