Journal of Graphics ›› 2023, Vol. 44 ›› Issue (5): 899-906.DOI: 10.11996/JG.j.2095-302X.2023050899
• Image Processing and Computer Vision •
DANG Hong-she1(), XU Huai-biao1, ZHANG Xuan-de2
Received: 2023-04-28
Accepted: 2023-08-01
Online: 2023-10-31
Published: 2023-10-31
About author: DANG Hong-she (1962-), professor, Ph.D. His main research interests cover industrial intelligent control (industrial robots), wireless sensor networks and digital image processing. E-mail: danghs@sust.edu.cn
DANG Hong-she, XU Huai-biao, ZHANG Xuan-de. Deep learning stereo matching algorithm fusing structural information[J]. Journal of Graphics, 2023, 44(5): 899-906.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023050899
| Layer | Kernel size, Channels | Output |
| --- | --- | --- |
| Conv0_1 | 3×3, 32 | $\frac{1}{2}H\times \frac{1}{2}W\times 32$ |
| Conv0_2 | 1×1, 32 | $\frac{1}{2}H\times \frac{1}{2}W\times 32$ |
| Conv1_x | 3×3, 32; 1×1, 32 | $\frac{1}{2}H\times \frac{1}{2}W\times 32$ |
| Conv2_x+SE | 3×3, 64; 1×1, 64 | $\frac{1}{4}H\times \frac{1}{4}W\times 64$ |
| Conv3_x+SE | 3×3, 128; 1×1, 128 | $\frac{1}{4}H\times \frac{1}{4}W\times 128$ |
| Conv4_x+SE | 3×3, 128; 1×1, 128 | $\frac{1}{4}H\times \frac{1}{4}W\times 128$ |

Table 1 Feature extraction network structure parameters
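The output column of Table 1 implies where the two downsampling steps occur: Conv0_1 reduces the input to half resolution, and Conv2_x reduces it again to quarter resolution. A minimal sketch of this shape arithmetic, assuming stride-2 convolutions with "same" padding at those two layers (the stride placement is inferred from the table, not stated in it):

```python
# Illustrative shape tracer for the feature-extraction network in Table 1.
# Strides are inferred from the table's output sizes; the layer tuples are
# a hypothetical re-encoding of the table, not the authors' actual code.

def conv_out(size, stride):
    """Spatial size after a stride-s convolution with 'same' padding."""
    return (size + stride - 1) // stride

def trace(H, W):
    # (name, stride, out_channels)
    layers = [
        ("Conv0_1",    2, 32),   # 3x3 conv, downsamples to 1/2 resolution
        ("Conv0_2",    1, 32),   # 1x1 conv
        ("Conv1_x",    1, 32),   # residual blocks (3x3 then 1x1)
        ("Conv2_x+SE", 2, 64),   # residual blocks + SE attention, to 1/4
        ("Conv3_x+SE", 1, 128),
        ("Conv4_x+SE", 1, 128),
    ]
    shapes = {}
    h, w = H, W
    for name, stride, channels in layers:
        h, w = conv_out(h, stride), conv_out(w, stride)
        shapes[name] = (h, w, channels)
    return shapes

shapes = trace(256, 512)
for name, shape in shapes.items():
    print(f"{name}: {shape}")
```

For a 256×512 input this reproduces the table's progression: 128×256×32 after Conv0_1 through Conv1_x, then 64×128 spatial resolution with 64 and 128 channels for the SE-augmented stages.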
| Method | EPE (px) | D1-all (%) | >1 px (%) | Time (s) |
| --- | --- | --- | --- | --- |
| 1. ResCNN | 1.07 | 3.89 | 7.68 | 0.73 |
| 2. Improved ResCNN | 1.02 | 3.54 | 7.65 | 0.67 |
| 3. Improved ResCNN + LSP | 0.78 | 1.67 | 7.31 | 0.70 |
| 4. Improved ResCNN + ACV | 0.51 | 1.95 | 7.03 | 0.60 |
| 5. Improved ResCNN + LSP + ACV | 0.45 | 1.55 | 6.87 | 0.62 |

Table 2 Ablation experimental results of network modules
| Method | D1-bg | D1-fg | D1-all |
| --- | --- | --- | --- |
| PSMNet[5] | 1.86 | 4.62 | 2.32 |
| GwcNet[6] | 1.71 | 3.93 | 2.11 |
| AANet[7] | 1.65 | 3.96 | 2.03 |
| LEAStereo[8] | 1.40 | 2.91 | 1.65 |
| CREStereo[9] | 1.45 | 2.86 | 1.69 |
| ACVNet[19] | 1.37 | 3.07 | 1.65 |
| RAFT-Stereo[10] | 1.58 | 3.05 | 1.82 |
| ILANet | 1.38 | 2.98 | 1.61 |

Table 3 KITTI2015 comparison results (%)
| Method | Out-Noc (%) | Out-All (%) | Avg-Noc (px) | Avg-All (px) |
| --- | --- | --- | --- | --- |
| PSMNet[5] | 1.49 | 1.89 | 0.5 | 0.6 |
| GwcNet[6] | 1.32 | 1.70 | 0.5 | 0.5 |
| AANet[7] | 1.91 | 2.42 | 0.5 | 0.6 |
| LEAStereo[8] | 1.13 | 1.45 | 0.5 | 0.5 |
| CREStereo[9] | 1.14 | 1.46 | 0.4 | 0.5 |
| ACVNet[19] | 1.13 | 1.47 | 0.4 | 0.5 |
| RAFT-Stereo[10] | 1.30 | 1.66 | 0.4 | 0.5 |
| ILANet | 1.10 | 1.46 | 0.4 | 0.5 |

Table 4 KITTI2012 comparison results
| Method | EPE (px) | D1-all (%) | >1 px (%) |
| --- | --- | --- | --- |
| GwcNet | 0.76 | 2.71 | 8.01 |
| PSMNet | 1.09 | 3.89 | 7.88 |
| ACVNet | 0.48 | 1.59 | 7.06 |
| ILANet | 0.44 | 1.53 | 6.89 |

Table 5 Comparison of actual experimental results of different networks
[1] | LI Y L, QING L B, HAN L M, et al. Survey on visual affordance research[J]. Computer Engineering and Applications, 2022, 58(18): 1-15. (in Chinese) |
[2] | CHEN Y, YANG L L, WANG Z P. Literature survey on stereo vision matching algorithms[J]. Journal of Graphics, 2020, 41(5): 702-708. (in Chinese) |
[3] | YIN C Y, ZHI H H, LI H B. Survey of binocular stereo-matching methods based on deep learning[J]. Computer Engineering, 2022, 48(10): 1-12. (in Chinese) |
[4] | MAYER N, ILG E, HÄUSSER P, et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4040-4048. |
[5] | CHANG J R, CHEN Y S. Pyramid stereo matching network[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 5410-5418. |
[6] | GUO X Y, YANG K, YANG W K, et al. Group-wise correlation stereo network[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 3268-3277. |
[7] | XU H F, ZHANG J Y. AANet: adaptive aggregation network for efficient stereo matching[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 1956-1965. |
[8] | CHENG X L, ZHONG Y R, HARANDI M, et al. Hierarchical neural architecture search for deep stereo matching[C]// The 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 22158-22169. |
[9] | LI J K, WANG P S, XIONG P F, et al. Practical stereo matching via cascaded recurrent network with adaptive correlation[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 16242-16251. |
[10] | LIPSON L, TEED Z, DENG J. RAFT-stereo: multilevel recurrent field transforms for stereo matching[C]// 2021 International Conference on 3D Vision. New York: IEEE Press, 2022: 218-227. |
[11] | LIU B Y, YU H M, LONG Y Q. Local similarity pattern and cost self-reassembling for deep stereo matching networks[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(2): 1647-1655. |
[12] | NEWELL A, YANG K Y, DENG J. Stacked hourglass networks for human pose estimation[M]// Computer Vision - ECCV 2016. Cham: Springer International Publishing, 2016: 483-499. |
[13] | LI T, MA W, XU S B, et al. Task-adaptive end-to-end networks for stereo matching[J]. Journal of Computer Research and Development, 2020, 57(7): 1531-1538. (in Chinese) |
[14] | KOUTINI K, EGHBAL-ZADEH H, DORFER M, et al. The receptive field as a regularizer in deep convolutional neural networks for acoustic scene classification[C]// The 27th European Signal Processing Conference. New York: IEEE Press, 2019: 1-5. |
[15] | TAN M, LE Q V. MixConv: mixed depthwise convolutional kernels[EB/OL]. [2023-01-18]. https://arxiv.org/abs/1907.09595. |
[16] | BULÒ S R, PORZI L, KONTSCHIEDER P. In-place activated BatchNorm for memory-optimized training of DNNs[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 5639-5647. |
[17] | RIDNIK T, LAWEN H, NOY A, et al. TResNet: high performance GPU-dedicated architecture[C]// 2021 IEEE Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2021: 1399-1408. |
[18] | WANG Y N, GU M J, ZHU Y F, et al. Improvement of AD-census algorithm based on stereo vision[J]. Sensors, 2022, 22(18): 6933. |
[19] | XU G W, CHENG J D, GUO P, et al. Attention concatenation volume for accurate and efficient stereo matching[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12971-12980. |
[20] | KENDALL A, MARTIROSYAN H, DASGUPTA S, et al. End-to-end learning of geometry and context for deep stereo regression[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 66-75. |
[21] | GEIGER A, LENZ P, STILLER C, et al. Vision meets robotics: the KITTI dataset[J]. International Journal of Robotics Research, 2013, 32(11): 1231-1237. |