图学学报 ›› 2023, Vol. 44 ›› Issue (4): 728-738. DOI: 10.11996/JG.j.2095-302X.2023040728
YU Wei-qun, LIU Jia-tao, ZHANG Ya-ping
Received: 2022-11-22
Accepted: 2023-03-27
Online: 2023-08-31
Published: 2023-08-16
Contact: ZHANG Ya-ping (1979-), professor, Ph.D. Her main research interests cover computer vision and computer graphics. E-mail:
About author: YU Wei-qun (1998-), master's student. His main research interests cover computer vision and image processing. E-mail: yudalao888@163.com
Supported by:
Abstract: With the rapid development of deep neural networks, deep-learning-based monocular depth estimation has largely focused on regressing depth through an encoder-decoder structure and has achieved significant results. However, in most conventional methods the decoding stage simply repeats plain upsampling operations and therefore fails to fully exploit the encoder features. To address this, a dense feature decoding structure combined with an attention mechanism was proposed. Taking a single RGB image as input, it fuses the feature maps of every encoder level into the Laplacian pyramid branches, strengthening both the depth and the breadth of feature fusion; an attention mechanism introduced into the decoder further improves depth estimation accuracy; and a combination of a data loss and a structural similarity loss improves training stability and convergence speed while reducing the training cost. Experimental results on the KITTI dataset show that, compared with the state-of-the-art LapDepth model, the root mean square error is reduced by 4.8% and the training cost by 36%, with clear gains in both depth estimation accuracy and convergence speed.
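The training objective described in the abstract combines a data (depth regression) loss with a structural similarity loss, but the exact terms are not given in this excerpt. The following is only a minimal PyTorch-style sketch, assuming a scale-invariant log data term, an SSIM term from the third-party `pytorch_msssim` package, and a hypothetical balancing weight `alpha`; it is illustrative rather than the authors' implementation.

```python
import torch
from pytorch_msssim import ssim  # third-party SSIM; any SSIM implementation would do


def silog_loss(pred, gt, valid, lam=0.85):
    # Scale-invariant log ("data") loss over valid pixels -- a common data term
    # in monocular depth estimation; the paper's exact data loss may differ.
    d = torch.log(pred[valid]) - torch.log(gt[valid])
    return torch.sqrt((d ** 2).mean() - lam * d.mean() ** 2)


def combined_loss(pred, gt, valid, alpha=0.5, max_depth=80.0):
    # Data term plus structural-similarity term; `alpha` and `max_depth`
    # are illustrative values, not taken from the paper.
    data = silog_loss(pred, gt, valid)
    structural = 1.0 - ssim(pred * valid, gt * valid, data_range=max_depth)
    return data + alpha * structural
```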
CLC number:
余伟群, 刘佳涛, 张亚萍. 融合注意力的拉普拉斯金字塔单目深度估计[J]. 图学学报, 2023, 44(4): 728-738.
YU Wei-qun, LIU Jia-tao, ZHANG Ya-ping. Monocular depth estimation based on Laplacian pyramid with attention fusion[J]. Journal of Graphics, 2023, 44(4): 728-738.
表1 网络详细结构
Table 1 Detailed structure of the network
Encoder

Block | Filter | Stride | Channel (in/out) | In | Out | Input
---|---|---|---|---|---|---
layer1 | 7×7 | 2 | 3/64 | S | S/2 | Input RGB
Maxpool | 3×3 | 2 | 64/64 | S/2 | S/4 | F(layer1)
layer2 | 3×3 | 2 | 64/256 | S/4 | S/4 | F(Maxpool)
layer3 | 3×3 | 2 | 256/512 | S/8 | S/8 | F(layer2)
layer4 | 3×3 | 2 | 512/1024 | S/16 | S/16 | F(layer3)

Decoder

Block | Filter size | Up | Channel (in/out) | In | Out | Input | Lev
---|---|---|---|---|---|---|---
reduction | 1×1 | 1 | 1024/512 | S/16 | S/16 | F(layer4) | -
ASPP | 3×3 | 1 | 512/512 | S/16 | S/16 | F(reduction) | -
sa | 1×1 | 1 | 512/512 | S/16 | S/16 | F(ASPP) | -
dec5 | 3×3 | 1 | 512/1 | S/16 | S/16 | F(sa) | 5th
dec4up | 3×3 | 2 | 512/256 | S/16 | S/8 | F(sa) | 4th
dec4ca | 1×1 | 2 | 1024/512 | S/16 | S/8 | F(UP(CA(layer4)) © layer3) | 4th
dec4reduc | 1×1 | 1 | 768/252 | S/8 | S/8 | F(dec4ca © dec4up) | 4th
dec4upr | 3×3 | 2 | 2/1 | S/16 | S/8 | F(UP(R5) © UP(CA(R5))) | 4th
dec4bneck | 3×3 | 1 | 256/256 | S/8 | S/8 | F(dec4reduc © dec4upr © L4) | 4th
dec4 | 3×3 | 1 | 256/1 | S/8 | S/8 | F(dec4bneck) | 4th
dec3up | 3×3 | 2 | 256/128 | S/8 | S/4 | F(dec4bneck) | 3rd
dec3ca | 1×1 | 2 | 512/128 | S/8 | S/4 | F(UP(CA(layer3)) © layer2) | 3rd
dec3reduc | 1×1 | 1 | 384/124 | S/4 | S/4 | F(dec3ca © dec3up) | 3rd
dec3upr | 3×3 | 2 | 2/1 | S/8 | S/4 | F(UP(R4) © UP(CA(R4))) | 3rd
dec3bneck | 3×3 | 1 | 128/128 | S/4 | S/4 | F(dec3reduc © dec3upr © L3) | 3rd
dec3 | 3×3 | 1 | 128/1 | S/4 | S/4 | F(dec3bneck) | 3rd
dec2up | 3×3 | 2 | 128/64 | S/4 | S/2 | F(dec3bneck) | 2nd
dec2ca | 1×1 | 2 | 128/64 | S/4 | S/2 | F(UP(CA(layer2)) © Maxpool) | 2nd
dec2reduc | 1×1 | 1 | 128/60 | S/2 | S/2 | F(dec2ca © dec2up) | 2nd
dec2upr | 3×3 | 2 | 2/1 | S/4 | S/2 | F(UP(R3) © UP(CA(R3))) | 2nd
dec2bneck | 3×3 | 1 | 64/64 | S/2 | S/2 | F(dec2reduc © dec2upr © L2) | 2nd
dec2 | 3×3 | 1 | 64/1 | S/2 | S/2 | F(dec2bneck) | 2nd
dec1up | 3×3 | 2 | 64/60 | S/2 | S | F(dec2bneck) | 1st
dec1upr | 3×3 | 2 | 2/1 | S/2 | S | F(UP(R2) © UP(CA(R2))) | 1st
dec1bneck | 3×3 | 1 | 64/64 | S | S | F(dec1reduc © dec1upr © L1) | 1st
dec1 | 3×3 | 1 | 64/1 | S | S | F(dec1bneck) | 1st
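As a reading aid for the decoder rows above: at each pyramid level the previous decoder feature is upsampled (decNup), coordinate attention (CA) is applied to the coarser encoder feature before it is upsampled and concatenated (©) with the encoder feature at the current resolution (decNca), channels are reduced (decNreduc), the coarser depth residual is upsampled with and without CA (decNupr), the Laplacian residual L of the input image is appended, and a bottleneck convolution then predicts this level's residual (decNbneck, decN). The PyTorch sketch below expresses one such level; channel counts, activations, and the CA implementation are simplified assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def up2(x):
    # Bilinear 2x upsampling between pyramid levels (UP in Table 1).
    return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)


class DecoderLevel(nn.Module):
    # One decoder level following the repeating pattern of Table 1 (illustrative).
    def __init__(self, c_prev, c_hi, c_lo, c_out,
                 make_attention=lambda c: nn.Identity()):  # stand-in for CA [10]
        super().__init__()
        self.ca_enc = make_attention(c_hi)                     # CA on coarser encoder feature
        self.ca_res = make_attention(1)                        # CA on coarser depth residual
        self.up_conv = nn.Conv2d(c_prev, c_out, 3, padding=1)  # decNup
        self.fuse_enc = nn.Conv2d(c_hi + c_lo, c_out, 1)       # decNca
        self.reduce = nn.Conv2d(2 * c_out, c_out - 4, 1)       # decNreduc
        self.res_conv = nn.Conv2d(2, 1, 3, padding=1)          # decNupr
        self.bneck = nn.Conv2d(c_out, c_out, 3, padding=1)     # decNbneck
        self.to_res = nn.Conv2d(c_out, 1, 3, padding=1)        # decN -> depth residual R_i

    def forward(self, feat_prev, enc_hi, enc_lo, res_prev, lap):
        # feat_prev: previous decoder feature; enc_hi / enc_lo: encoder features at the
        # coarser / current resolution; res_prev: residual R_{i+1}; lap: Laplacian L_i (3 ch).
        dec_up = F.relu(self.up_conv(up2(feat_prev)))
        dec_ca = F.relu(self.fuse_enc(torch.cat([up2(self.ca_enc(enc_hi)), enc_lo], dim=1)))
        reduced = F.relu(self.reduce(torch.cat([dec_ca, dec_up], dim=1)))
        res_up = self.res_conv(torch.cat([up2(res_prev), up2(self.ca_res(res_prev))], dim=1))
        bneck = F.relu(self.bneck(torch.cat([reduced, res_up, lap], dim=1)))
        return bneck, self.to_res(bneck)  # feature for the next level, residual R_i
```

For the 4th pyramid level in Table 1, for example, this would be instantiated roughly as `DecoderLevel(c_prev=512, c_hi=1024, c_lo=512, c_out=256)`, mirroring the dec4* rows; the real network also differs in its exact channel reductions.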
图6 多种方法的预测深度图结果对比((a)输入的RGB图像;(b)真实深度图;(c)文献[5];(d)文献[18];(e)文献[4];(f)本文方法)
Fig. 6 Comparison of predicted depth maps from multiple methods ((a) Input RGB image; (b) Ground truth; (c) Ref. [5]; (d) Ref. [18]; (e) Ref. [4]; (f) Ours)
图7 多cap深度图对比((a)输入RGB图像;(b)文献[4] (50 m);(c)文献[4] (80 m);(d)本文Cap为50 m;(e)本文Cap为80 m)
Fig. 7 Comparison of depth maps under different caps ((a) Input RGB image; (b) Ref. [4] (50 m); (c) Ref. [4] (80 m); (d) Ours with a cap of 50 m; (e) Ours with a cap of 80 m)
表2 与其他模型预测结果的定量比较
Table 2 Quantitative comparison of prediction results with other models
Cap | Method | δ<1.25 ↑ | δ<1.25² ↑ | δ<1.25³ ↑ | Abs Rel ↓ | Sq Rel ↓ | RMSE ↓ | RMSE log ↓ | Total_iter (M) ↓
---|---|---|---|---|---|---|---|---|---
Cap=80 m | Ref. [ ] | 0.916 | 0.980 | 0.994 | 0.085 | 0.584 | 3.938 | 0.135 | -
Cap=80 m | Ref. [ ] | 0.932 | 0.984 | 0.994 | 0.072 | 0.307 | 2.727 | 0.120 | -
Cap=80 m | Ref. [ ] | 0.950 | 0.993 | 0.999 | 0.064 | 0.254 | 2.815 | 0.100 | -
Cap=80 m | Ref. [ ] | 0.962 | 0.994 | 0.999 | 0.059 | 0.212 | 2.446 | 0.091 | 0.734
Cap=80 m | Ours | 0.963 | 0.995 | 0.999 | 0.058 | 0.199 | 2.328 | 0.088 | 0.470
Cap=50 m | Ref. [ ] | 0.861 | 0.949 | 0.976 | 0.114 | 0.898 | 4.935 | 0.206 | -
Cap=50 m | Ref. [ ] | 0.936 | 0.985 | 0.995 | 0.071 | 0.268 | 2.271 | 0.116 | -
Cap=50 m | Ref. [ ] | 0.959 | 0.994 | 0.999 | 0.060 | 0.182 | 2.005 | 0.092 | -
Cap=50 m | Ref. [ ] | 0.967 | 0.995 | 0.999 | 0.056 | 0.161 | 1.830 | 0.086 | 0.734
Cap=50 m | Ours | 0.967 | 0.995 | 0.999 | 0.056 | 0.156 | 1.768 | 0.084 | 0.470

(↑: higher is better; ↓: lower is better)
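For reference, the metric columns in Tables 2-5 follow the standard monocular depth evaluation protocol (threshold accuracies δ<1.25^k, Abs Rel, Sq Rel, RMSE, RMSE log), computed only on valid pixels with the ground-truth depth capped at 50 m or 80 m. A generic NumPy sketch of these standard definitions (in the style of Eigen et al. [17]) is given below; it is not the authors' evaluation script.

```python
import numpy as np


def depth_metrics(pred, gt, cap=80.0, min_depth=1e-3):
    # Standard monocular depth metrics over valid pixels, with ground-truth
    # depth capped at `cap` metres (50 m or 80 m in Table 2).
    valid = (gt > min_depth) & (gt < cap)
    pred, gt = np.clip(pred[valid], min_depth, cap), gt[valid]

    ratio = np.maximum(pred / gt, gt / pred)
    err = pred - gt
    log_err = np.log(pred) - np.log(gt)
    return {
        "d1": (ratio < 1.25).mean(),          # delta < 1.25   (higher is better)
        "d2": (ratio < 1.25 ** 2).mean(),     # delta < 1.25^2
        "d3": (ratio < 1.25 ** 3).mean(),     # delta < 1.25^3
        "abs_rel": (np.abs(err) / gt).mean(),
        "sq_rel": (err ** 2 / gt).mean(),
        "rmse": np.sqrt((err ** 2).mean()),
        "rmse_log": np.sqrt((log_err ** 2).mean()),
    }
```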
表3 各种编码器对比实验结果(Cap=80 m)
Table 3 Comparison of experimental results of various encoders (Cap=80 m)
Method | Param (M) | Flops (B) | δ<1.25 ↑ | δ<1.25² ↑ | δ<1.25³ ↑ | Abs Rel ↓ | Sq Rel ↓ | RMSE ↓ | RMSE log ↓
---|---|---|---|---|---|---|---|---|---
InceptionV3[ ] | 18.13 | 30.25 | 0.936 | 0.990 | 0.997 | 0.074 | 0.302 | 2.922 | 0.114
Resnet101[ ] | 44.11 | 98.60 | 0.960 | 0.993 | 0.999 | 0.063 | 0.203 | 2.424 | 0.095
Vgg19[ ] | 14.75 | 104.30 | 0.959 | 0.994 | 0.999 | 0.060 | 0.202 | 2.361 | 0.092
DenseNet161[ ] | 34.19 | 104.59 | 0.960 | 0.995 | 0.999 | 0.059 | 0.202 | 2.374 | 0.090
ResNext101[ ] | 74.14 | 134.76 | 0.963 | 0.995 | 0.999 | 0.058 | 0.199 | 2.328 | 0.088
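The Param (M) and Flops (B) columns report each candidate encoder's parameter count and floating-point operations. One way to obtain comparable numbers is sketched below; the `thop` profiler, the torchvision backbone, and the 352×1216 KITTI-style crop are illustrative assumptions, not details taken from the paper.

```python
import torch
import torchvision
from thop import profile  # third-party profiler; returns MACs and parameter count

# Illustrative: profile one candidate encoder at an assumed input resolution.
model = torchvision.models.resnext101_32x8d(weights=None)
dummy = torch.randn(1, 3, 352, 1216)
macs, params = profile(model, inputs=(dummy,))
print(f"Params: {params / 1e6:.2f} M, FLOPs: {2 * macs / 1e9:.2f} B (approx., FLOPs ~= 2 x MACs)")
```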
图8 CA注意力机制位置的消融对比实验((a) Li+1输入CA后上采样与Li融合;(b)上采样的Li+1输入CA后与Li融合;(c) Li输入CA后与上采样的Li+1融合;(d)上采样的Li+1与Li融合后输入CA;(e)上采样的Li+1与Li分别输入CA后融合)
Fig. 8 Ablation study on the placement of the CA attention mechanism ((a) CA applied to Li+1, then upsampled and fused with Li; (b) Upsampled Li+1 fed to CA, then fused with Li; (c) CA applied to Li, then fused with the upsampled Li+1; (d) Upsampled Li+1 fused with Li, then fed to CA; (e) Upsampled Li+1 and Li each fed to CA, then fused)
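The five placements in Fig. 8 can be read directly as code. In the sketch below, `up`, `ca`, and `fuse` are placeholders for bilinear upsampling, a coordinate-attention module [10], and concatenation-based fusion (the real modules carry learned parameters); only the ordering of operations differs between variants.

```python
import torch
import torch.nn.functional as F

# Placeholders: `ca` stands in for a coordinate-attention module,
# `fuse` for concatenation followed by a convolution.
up = lambda x: F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
ca = lambda x: x
fuse = lambda a, b: torch.cat([a, b], dim=1)

l_hi = torch.randn(1, 8, 4, 4)   # coarser-level feature L_{i+1}
l_lo = torch.randn(1, 8, 8, 8)   # current-level feature L_i

out_a = fuse(up(ca(l_hi)), l_lo)      # (a) CA -> upsample -> fuse
out_b = fuse(ca(up(l_hi)), l_lo)      # (b) upsample -> CA -> fuse
out_c = fuse(up(l_hi), ca(l_lo))      # (c) CA on L_i, fuse with upsampled L_{i+1}
out_d = ca(fuse(up(l_hi), l_lo))      # (d) fuse first, then CA
out_e = fuse(ca(up(l_hi)), ca(l_lo))  # (e) CA on both branches, then fuse
```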
图9 在深度图和深度残差中应用CA注意力的消融对比实验((a) CA注意力应用于深度图Di;(b) CA注意力应用于深度残差Ri)
Fig. 9 Ablation study on CA attention in depth maps and depth residuals ((a) CA attention applied to the depth map Di; (b) CA attention applied to depth residual Ri)
表4 拉普拉斯金字塔注意力机制位置实验对比(Cap=80 m)
Table 4 Comparison of experimental results on the placement of the attention mechanism in the Laplacian pyramid (Cap=80 m)
Method | Param (M) | Flops (B) | δ<1.25 ↑ | δ<1.25² ↑ | δ<1.25³ ↑ | Abs Rel ↓ | Sq Rel ↓ | RMSE ↓ | RMSE log ↓
---|---|---|---|---|---|---|---|---|---
– | 73.14 | 127.40 | 0.960 | 0.994 | 0.999 | 0.060 | 0.207 | 2.421 | 0.091
– | 73.14 | 127.40 | 0.961 | 0.994 | 0.999 | 0.059 | 0.198 | 2.371 | 0.091
– | 74.14 | 134.76 | 0.963 | 0.995 | 0.999 | 0.058 | 0.199 | 2.328 | 0.088
– | 74.40 | 134.79 | 0.963 | 0.994 | 0.999 | 0.058 | 0.202 | 2.334 | 0.089
– | 74.21 | 134.76 | 0.960 | 0.995 | 0.999 | 0.059 | 0.200 | 2.341 | 0.090
– | 74.21 | 134.76 | 0.963 | 0.995 | 0.999 | 0.059 | 0.207 | 2.347 | 0.089
– | 74.40 | 134.80 | 0.962 | 0.995 | 0.999 | 0.059 | 0.205 | 2.343 | 0.090
Ours | 74.14 | 134.76 | 0.963 | 0.995 | 0.999 | 0.058 | 0.199 | 2.328 | 0.088
表5 注意机制种类消融实验结果对比(Cap=80 m)
Table 5 Comparison of experimental results of ablation on attention mechanism types (Cap=80 m)
Method | Param (M) | Flops (B) | δ<1.25 ↑ | δ<1.25² ↑ | δ<1.25³ ↑ | Abs Rel ↓ | Sq Rel ↓ | RMSE ↓ | RMSE log ↓
---|---|---|---|---|---|---|---|---|---
SE[ ] | 76.02 | 130.44 | 0.964 | 0.995 | 0.999 | 0.058 | 0.198 | 2.372 | 0.089
SE+CA | 77.09 | 137.81 | 0.962 | 0.995 | 0.999 | 0.059 | 0.202 | 2.376 | 0.090
CA[ ] | 74.45 | 134.78 | 0.961 | 0.994 | 0.999 | 0.059 | 0.209 | 2.353 | 0.090
SA[ ] | 73.14 | 127.39 | 0.961 | 0.993 | 0.999 | 0.060 | 0.216 | 2.368 | 0.090
Triplet[ ] | 73.13 | 127.40 | 0.961 | 0.994 | 0.999 | 0.061 | 0.210 | 2.393 | 0.092
Triplet+CA | 74.40 | 134.78 | 0.962 | 0.994 | 0.999 | 0.060 | 0.215 | 2.360 | 0.090
Ours | 74.14 | 134.76 | 0.963 | 0.995 | 0.999 | 0.058 | 0.199 | 2.328 | 0.088
[1] | GODARD C, MAC AODHA O, BROSTOW G J. Unsupervised monocular depth estimation with left-right consistency[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 270-279. |
[2] | 蒲正东, 陈姝, 邹北骥, 等. 基于高分辨率网络的自监督单目深度估计方法[J]. 计算机辅助设计与图形学学报, 2023, 35(1): 118-127. |
PU Z D, CHEN S, ZOU B J, et al. A self-supervised monocular depth estimation method based on high resolution convolutional neural network[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(1): 118-127 (in Chinese). | |
[3] | 赵霖, 赵滟, 靳捷. 基于局部注意力和位姿迭代优化的自监督单目深度估计算法[J]. 信号处理, 2022, 38(5): 1088-1097. |
ZHAO L, ZHAO Y, JIN J. A self-supervised monocular depth estimation algorithm based on local attention and iterative pose refinement[J]. Journal of Signal Processing, 2022, 38(5): 1088-1097 (in Chinese). | |
[4] | SONG M, LIM S, KIM W. Monocular depth estimation using Laplacian pyramid-based depth residuals[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(11): 4381-4393. |
[5] | FU H, GONG M M, WANG C H, et al. Deep ordinal regression network for monocular depth estimation[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 2002-2011. |
[6] | YANG M K, YU K, ZHANG C, et al. DenseASPP for semantic segmentation in street scenes[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 3684-3692. |
[7] | 张涛, 张晓利, 任彦. Transformer与CNN融合的单目图像深度估计[J]. 哈尔滨理工大学学报, 2022, 27(6): 88-94. |
ZHANG T, ZHANG X L, REN Y. Monocular image depth estimation based on the fusion of transformer and CNN[J]. Journal of Harbin University of Science and Technology, 2022, 27(6): 88-94 (in Chinese). | |
[8] | RANFTL R, BOCHKOVSKIY A, KOLTUN V. Vision transformers for dense prediction[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 12179-12188. |
[9] | WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2018: 3-19. |
[10] | HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 13713-13722. |
[11] | ZHANG Q L, YANG Y B. SA-net: shuffle attention for deep convolutional neural networks[C]// 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. New York: IEEE Press, 2021: 2235-2239. |
[12] | XIE S N, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 1492-1500. |
[13] | HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2261-2269. |
[14] | RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. |
[15] | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778. |
[16] | SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 2818-2826. |
[17] | EIGEN D, PUHRSCH C, FERGUS R. Depth map prediction from a single image using a multi-scale deep network[EB/OL]. [2022-06-15]. https://arxiv.org/abs/1406.2283. |
[18] | LEE J H, HAN M K, KO D W, et al. From big to small: multi-scale local planar guidance for monocular depth estimation[EB/OL]. [2022-06-15]. https://arxiv.org/abs/1907.10326. |
[19] | UHRIG J, SCHNEIDER N, SCHNEIDER L, et al. Sparsity invariant CNNs[C]// 2017 International Conference on 3D Vision (3DV). New York: IEEE Press, 2017: 11-20. |
[20] | PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library[EB/OL]. [2022-06-15]. https://arxiv.org/abs/1912.01703. |
[21] | LOSHCHILOV I, HUTTER F. Decoupled weight decay regularization[EB/OL]. [2022-06-15]. https://arxiv.org/abs/1711.05101. |
[22] | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2022-06-15]. https://arxiv.org/abs/1409.1556. |
[23] | HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 4700-4708. |
[24] | HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// 2018 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141. |
[25] | MISRA D, NALAMADA T, ARASANIPALAI A U, et al. Rotate to attend: convolutional triplet attention module[C]// 2021 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2021: 3139-3148. |
[26] | WANG Z, SIMONCELLI E P, BOVIK A C. Multiscale structural similarity for image quality assessment[C]// The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers. New York: IEEE Press, 2004: 1398-1402. |