
Journal of Graphics, 2023, Vol. 44, Issue (5): 899-906. DOI: 10.11996/JG.j.2095-302X.2023050899

• Image Processing and Computer Vision •

Deep learning stereo matching algorithm fusing structural information

DANG Hong-she1, XU Huai-biao1, ZHANG Xuan-de2

  1. School of Electrical and Control Engineering, Shaanxi University of Science & Technology, Xi’an, Shaanxi 710021, China
  2. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science & Technology, Xi’an, Shaanxi 710021, China
  • Received: 2023-04-28 Accepted: 2023-08-01 Online: 2023-10-31 Published: 2023-10-31
  • About author: DANG Hong-she (1962-), professor, Ph.D. His main research interests cover industrial intelligent control (industrial robots), wireless sensor networks, and digital image processing. E-mail: danghs@sust.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61871206); Natural Science Foundation Project of Shaanxi Provincial Department of Science and Technology (2020JM-509)


To address the limitations of existing stereo matching algorithms in edge regions and regions of discontinuous disparity, a deep learning stereo matching algorithm fusing structural information was proposed. By limiting the convolution kernel size and replacing the BatchNorm and activation function layers with the Inplace-ABN layer, the efficiency of convolutional feature extraction was improved. A local similarity pattern module combined with an attention mechanism was employed to extract structural features of the image, which were fused with the features extracted by convolution to enrich the feature information. The correlation volume and the concatenation volume of the output features were then calculated. Attention weights generated from the correlation volume were used to filter redundant information out of the concatenation volume, improving the accuracy of the stereo matching cost volume. To speed up cost aggregation, a simplified hourglass network was employed. The algorithm was evaluated on the Scene Flow, CREStereo, and KITTI datasets. The experimental results showed that the algorithm achieved an endpoint error of 0.45 pixels over all regions. In the first-frame image, only 1.55% of regions were incorrectly predicted, and only 6.87% of pixels had prediction errors greater than 1 pixel. These results demonstrated that the proposed algorithm outperformed other algorithms in matching accuracy, and they validated its effectiveness and advantages in regions that are difficult to match.
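
The cost-volume step described in the abstract can be illustrated with a minimal sketch on a single image row. This is not the authors' implementation: the function names, the zero padding for out-of-range positions, the channel-mean normalisation, and the use of a softmax over the disparity axis to turn correlation into attention weights are all assumptions made for illustration. The "connection volume" of the abstract is treated here as the usual concatenation volume, in which left and right feature vectors are joined for each candidate disparity.

```python
import math


def correlation_volume(left, right, max_disp):
    """corr[d][x]: channel-mean dot product of left[x] and right[x - d].

    left and right are lists of per-pixel feature vectors for one image row.
    Positions where x - d falls outside the row get 0.0 (an assumption).
    """
    width, channels = len(left), len(left[0])
    volume = []
    for d in range(max_disp):
        row = []
        for x in range(width):
            if x - d < 0:
                row.append(0.0)
            else:
                dot = sum(l * r for l, r in zip(left[x], right[x - d]))
                row.append(dot / channels)
        volume.append(row)
    return volume


def concatenation_volume(left, right, max_disp):
    """concat[d][x]: left[x] joined with right[x - d] (zeros if out of range)."""
    width, channels = len(left), len(left[0])
    zeros = [0.0] * channels
    return [
        [left[x] + (right[x - d] if x - d >= 0 else zeros) for x in range(width)]
        for d in range(max_disp)
    ]


def attention_weights(corr):
    """Softmax over the disparity axis at each pixel (hypothetical choice)."""
    max_disp, width = len(corr), len(corr[0])
    weights = [[0.0] * width for _ in range(max_disp)]
    for x in range(width):
        column = [corr[d][x] for d in range(max_disp)]
        peak = max(column)  # subtract the maximum for numerical stability
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        for d in range(max_disp):
            weights[d][x] = exps[d] / total
    return weights


def filter_volume(concat, weights):
    """Scale every concatenated feature vector by its attention weight."""
    return [
        [[w * v for v in vec] for vec, w in zip(c_row, w_row)]
        for c_row, w_row in zip(concat, weights)
    ]


# Toy row: one-hot features, with the right view shifted by one pixel,
# so that left[x] matches right[x - 1] for x >= 1.
left_row = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
right_row = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]]

corr = correlation_volume(left_row, right_row, max_disp=3)
weights = attention_weights(corr)
filtered = filter_volume(concatenation_volume(left_row, right_row, 3), weights)
```

In this toy row, the weights at pixels 1 to 3 peak at disparity 1, so the concatenated features at the other disparities are attenuated before any aggregation network would see them. In a real network the weights would typically come from learned layers applied to the correlation volume rather than from a bare softmax.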

Key words: deep learning, stereo matching, structural information, local similarity pattern module, cost volume
