Journal of Graphics ›› 2023, Vol. 44 ›› Issue (4): 728-738. DOI: 10.11996/JG.j.2095-302X.2023040728
Monocular depth estimation based on Laplacian pyramid with attention fusion
YU Wei-qun, LIU Jia-tao, ZHANG Ya-ping
Received: 2022-11-22
Accepted: 2023-03-27
Online: 2023-08-31
Published: 2023-08-16
Contact: ZHANG Ya-ping (1979-), professor, Ph.D. Her main research interests cover computer vision and computer graphics.
About author: YU Wei-qun (1998-), master student. His main research interests cover computer vision and image processing. E-mail: yudalao888@163.com
YU Wei-qun, LIU Jia-tao, ZHANG Ya-ping. Monocular depth estimation based on Laplacian pyramid with attention fusion[J]. Journal of Graphics, 2023, 44(4): 728-738.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023040728
Encoder

| Block | Filter | Stride | Channel | In | Out | Input |
|---|---|---|---|---|---|---|
| layer1 | 7×7 | 2 | 3/64 | S | S/2 | Input RGB |
| Maxpool | 3×3 | 2 | 64/64 | S/2 | S/4 | F(layer1) |
| layer2 | 3×3 | 2 | 64/256 | S/4 | S/4 | F(Maxpool) |
| layer3 | 3×3 | 2 | 256/512 | S/8 | S/8 | F(layer2) |
| layer4 | 3×3 | 2 | 512/1024 | S/16 | S/16 | F(layer3) |

Decoder

| Block | Filter size | Up | Channel | In | Out | Input | Level |
|---|---|---|---|---|---|---|---|
| reduction | 1×1 | 1 | 1024/512 | S/16 | S/16 | F(layer4) | - |
| ASPP | 3×3 | 1 | 512/512 | S/16 | S/16 | F(reduction) | - |
| sa | 1×1 | 1 | 512/512 | S/16 | S/16 | F(ASPP) | - |
| dec5 | 3×3 | 1 | 512/1 | S/16 | S/16 | F(sa) | 5th |
| dec4up | 3×3 | 2 | 512/256 | S/16 | S/8 | F(sa) | 4th |
| dec4ca | 1×1 | 2 | 1024/512 | S/16 | S/8 | F(UP(CA(layer4)) © layer3) | 4th |
| dec4reduc | 1×1 | 1 | 768/252 | S/8 | S/8 | F(dec4ca © dec4up) | 4th |
| dec4upr | 3×3 | 2 | 2/1 | S/16 | S/8 | F(UP(R5) © UP(CA(R5))) | 4th |
| dec4bneck | 3×3 | 1 | 256/256 | S/8 | S/8 | F(dec4reduc © dec4upr © L4) | 4th |
| dec4 | 3×3 | 1 | 256/1 | S/8 | S/8 | F(dec4bneck) | 4th |
| dec3up | 3×3 | 2 | 256/128 | S/8 | S/4 | F(dec4bneck) | 3rd |
| dec3ca | 1×1 | 2 | 512/128 | S/8 | S/4 | F(UP(CA(layer3)) © layer2) | 3rd |
| dec3reduc | 1×1 | 1 | 384/124 | S/4 | S/4 | F(dec3ca © dec3up) | 3rd |
| dec3upr | 3×3 | 2 | 2/1 | S/8 | S/4 | F(UP(R4) © UP(CA(R4))) | 3rd |
| dec3bneck | 3×3 | 1 | 128/128 | S/4 | S/4 | F(dec3reduc © dec3upr © L3) | 3rd |
| dec3 | 3×3 | 1 | 128/1 | S/4 | S/4 | F(dec3bneck) | 3rd |
| dec2up | 3×3 | 2 | 128/64 | S/4 | S/2 | F(dec3bneck) | 2nd |
| dec2ca | 1×1 | 2 | 128/64 | S/4 | S/2 | F(UP(CA(layer2)) © Maxpool) | 2nd |
| dec2reduc | 1×1 | 1 | 128/60 | S/2 | S/2 | F(dec2ca © dec2up) | 2nd |
| dec2upr | 3×3 | 2 | 2/1 | S/4 | S/2 | F(UP(R3) © UP(CA(R3))) | 2nd |
| dec2bneck | 3×3 | 1 | 64/64 | S/2 | S/2 | F(dec2reduc © dec2upr © L2) | 2nd |
| dec2 | 3×3 | 1 | 64/1 | S/2 | S/2 | F(dec2bneck) | 2nd |
| dec1up | 3×3 | 2 | 64/60 | S/2 | S | F(dec2bneck) | 1st |
| dec1upr | 3×3 | 2 | 2/1 | S/2 | S | F(UP(R2) © UP(CA(R2))) | 1st |
| dec1bneck | 3×3 | 1 | 64/64 | S | S | F(dec1reduc © dec1upr © L1) | 1st |
| dec1 | 3×3 | 1 | 64/1 | S | S | F(dec1bneck) | 1st |
Table 1 Detailed structure of the network
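Reading note for Table 1: we read "©" as channel concatenation, UP(·) as 2× upsampling, CA(·) as the coordinate attention module of Ref. [10], and R5-R2 as the per-level depth residuals. The PyTorch-style sketch below wires up one pyramid level (the 4th-level rows, dec4up through dec4) the way we read the table; the module names, the nn.Identity stand-ins for CA, and the channel bookkeeping for the dec4ca branch are our assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def up2(x):
    # 2x bilinear upsampling, used for both feature maps and depth residuals
    return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)


class DecoderLevel4(nn.Module):
    """One decoder pyramid level as we read the 4th-level rows of Table 1."""

    def __init__(self):
        super().__init__()
        self.ca_feat = nn.Identity()                        # placeholder for CA on the encoder-side skip branch
        self.ca_res = nn.Identity()                         # placeholder for CA on the depth-residual branch
        self.dec4up = nn.Conv2d(512, 256, 3, padding=1)     # 512/256, applied after 2x upsampling
        self.dec4ca = nn.Conv2d(1024, 512, 1)               # on concat(UP(CA(.)), layer3): 512 + 512 channels
        self.dec4reduc = nn.Conv2d(768, 252, 1)             # 512 + 256 -> 252
        self.dec4upr = nn.Conv2d(2, 1, 3, padding=1)        # concat(UP(R5), UP(CA(R5))) -> 1
        self.dec4bneck = nn.Conv2d(256, 256, 3, padding=1)  # 252 + 1 + 3 -> 256
        self.dec4 = nn.Conv2d(256, 1, 3, padding=1)         # per-level depth output

    def forward(self, sa_feat, feat16, layer3, r5, lap4):
        # sa_feat, feat16: 512 ch at S/16; layer3: 512 ch at S/8; r5: 1 ch at S/16; lap4: 3 ch at S/8.
        # The table writes "layer4" in the dec4ca row; we pass a 512-channel S/16 feature (feat16)
        # so that the listed 1024-channel input works out after concatenation with layer3.
        f_up = self.dec4up(up2(sa_feat))
        f_ca = self.dec4ca(torch.cat([up2(self.ca_feat(feat16)), layer3], dim=1))
        f_red = self.dec4reduc(torch.cat([f_ca, f_up], dim=1))
        r_up = self.dec4upr(torch.cat([up2(r5), up2(self.ca_res(r5))], dim=1))
        fused = torch.cat([f_red, r_up, lap4], dim=1)        # "dec4reduc © dec4upr © L4"
        feat = self.dec4bneck(fused)
        return feat, self.dec4(feat)                         # features for the next level, depth at this level
```

The 3rd, 2nd, and 1st levels repeat the same pattern with the channel counts listed in the table.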
Fig. 6 Comparison of predicted depth maps for multiple methods ((a) The input RGB image; (b) The ground truth; (c) Ref. [5]; (d) Ref. [18]; (e) Ref. [4]; (f) Ours)
Fig. 7 Comparison of depth maps under different depth caps ((a) Input RGB image; (b) Ref. [4] (cap = 50 m); (c) Ref. [4] (cap = 80 m); (d) Ours (cap = 50 m); (e) Ours (cap = 80 m))
(↑: higher value is better; ↓: lower value is better)

| Cap | Method | δ<1.25 ↑ | δ<1.25² ↑ | δ<1.25³ ↑ | Abs Rel ↓ | Sq Rel ↓ | RMSE ↓ | RMSE log ↓ | Total_iter (M) |
|---|---|---|---|---|---|---|---|---|---|
| Cap=80 m | Ref. [ ] | 0.916 | 0.980 | 0.994 | 0.085 | 0.584 | 3.938 | 0.135 | - |
| | Ref. [ ] | 0.932 | 0.984 | 0.994 | 0.072 | 0.307 | 2.727 | 0.120 | - |
| | Ref. [ ] | 0.950 | 0.993 | 0.999 | 0.064 | 0.254 | 2.815 | 0.100 | - |
| | Ref. [ ] | 0.962 | 0.994 | 0.999 | 0.059 | 0.212 | 2.446 | 0.091 | 0.734 |
| | Ours | 0.963 | 0.995 | 0.999 | 0.058 | 0.199 | 2.328 | 0.088 | 0.470 |
| Cap=50 m | Ref. [ ] | 0.861 | 0.949 | 0.976 | 0.114 | 0.898 | 4.935 | 0.206 | - |
| | Ref. [ ] | 0.936 | 0.985 | 0.995 | 0.071 | 0.268 | 2.271 | 0.116 | - |
| | Ref. [ ] | 0.959 | 0.994 | 0.999 | 0.060 | 0.182 | 2.005 | 0.092 | - |
| | Ref. [ ] | 0.967 | 0.995 | 0.999 | 0.056 | 0.161 | 1.830 | 0.086 | 0.734 |
| | Ours | 0.967 | 0.995 | 0.999 | 0.056 | 0.156 | 1.768 | 0.084 | 0.470 |
Table 2 Quantitative comparison of prediction results with other models
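The accuracy (δ) and error columns in Tables 2-5 are the standard single-image depth metrics introduced by Eigen et al. [17]. A minimal NumPy sketch of how they are conventionally computed under a depth cap is given below; the masking convention and variable names are our assumptions, not the paper's evaluation script.

```python
import numpy as np


def depth_metrics(pred, gt, cap=80.0, min_depth=1e-3):
    """Standard monocular depth metrics under a depth cap.

    pred, gt: arrays of predicted / ground-truth depth in metres.
    Only pixels with valid ground truth inside (min_depth, cap] are evaluated.
    """
    mask = (gt > min_depth) & (gt <= cap)
    pred = np.clip(pred[mask], min_depth, cap)
    gt = gt[mask]

    # threshold accuracies: fraction of pixels with max(gt/pred, pred/gt) below 1.25^k
    thresh = np.maximum(gt / pred, pred / gt)
    d1 = np.mean(thresh < 1.25)
    d2 = np.mean(thresh < 1.25 ** 2)
    d3 = np.mean(thresh < 1.25 ** 3)

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))

    return d1, d2, d3, abs_rel, sq_rel, rmse, rmse_log
```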
| Method | Param (M) | FLOPs (B) | δ<1.25 ↑ | δ<1.25² ↑ | δ<1.25³ ↑ | Abs Rel ↓ | Sq Rel ↓ | RMSE ↓ | RMSE log ↓ |
|---|---|---|---|---|---|---|---|---|---|
| InceptionV3 [ ] | 18.13 | 30.25 | 0.936 | 0.990 | 0.997 | 0.074 | 0.302 | 2.922 | 0.114 |
| ResNet101 [ ] | 44.11 | 98.60 | 0.960 | 0.993 | 0.999 | 0.063 | 0.203 | 2.424 | 0.095 |
| VGG19 [ ] | 14.75 | 104.30 | 0.959 | 0.994 | 0.999 | 0.060 | 0.202 | 2.361 | 0.092 |
| DenseNet161 [ ] | 34.19 | 104.59 | 0.960 | 0.995 | 0.999 | 0.059 | 0.202 | 2.374 | 0.090 |
| ResNeXt101 [ ] | 74.14 | 134.76 | 0.963 | 0.995 | 0.999 | 0.058 | 0.199 | 2.328 | 0.088 |
Table 3 Comparison of experimental results of various encoders (Cap=80 m)
Fig. 8 Ablation study on the location of the CA attention mechanism ((a) Feeding Li+1 to CA and then upsampling before fusing with Li; (b) Upsampling Li+1, feeding it to CA, then fusing with Li; (c) Upsampling Li+1 and fusing it with Li after Li passes through CA; (d) Upsampling Li+1, fusing it with Li, then feeding the result to CA; (e) Upsampling Li+1 and feeding it to CA, then fusing with Li after Li passes through CA)
Fig. 9 Ablation study on applying CA attention to depth maps versus depth residuals ((a) CA attention applied to the depth map Di; (b) CA attention applied to the depth residual Ri)
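To make the Fig. 9 comparison concrete, the snippet below contrasts the two placements. nn.Identity() replaces the coordinate attention module of Ref. [10] so the sketch is self-contained, and the tensor shapes are illustrative only; variant (b) matches the F(UP(R·) © UP(CA(R·))) inputs of the dec*upr rows in Table 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

coord_att = nn.Identity()   # placeholder for the coordinate attention (CA) module of Ref. [10]


def up2(x):
    return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)


d_i = torch.rand(1, 1, 22, 76)   # level-i depth map D_i (toy shape)
r_i = torch.rand(1, 1, 22, 76)   # level-i depth residual R_i (toy shape)

# Fig. 9(a): CA applied to the depth map D_i before upsampling and fusion
depth_branch = torch.cat([up2(d_i), up2(coord_att(d_i))], dim=1)

# Fig. 9(b): CA applied to the depth residual R_i, as in the dec*upr rows of Table 1
resid_branch = torch.cat([up2(r_i), up2(coord_att(r_i))], dim=1)
```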
| Method | Param (M) | FLOPs (B) | δ<1.25 ↑ | δ<1.25² ↑ | δ<1.25³ ↑ | Abs Rel ↓ | Sq Rel ↓ | RMSE ↓ | RMSE log ↓ |
|---|---|---|---|---|---|---|---|---|---|
| | 73.14 | 127.40 | 0.960 | 0.994 | 0.999 | 0.060 | 0.207 | 2.421 | 0.091 |
| | 73.14 | 127.40 | 0.961 | 0.994 | 0.999 | 0.059 | 0.198 | 2.371 | 0.091 |
| | 74.14 | 134.76 | 0.963 | 0.995 | 0.999 | 0.058 | 0.199 | 2.328 | 0.088 |
| | 74.40 | 134.79 | 0.963 | 0.994 | 0.999 | 0.058 | 0.202 | 2.334 | 0.089 |
| | 74.21 | 134.76 | 0.960 | 0.995 | 0.999 | 0.059 | 0.200 | 2.341 | 0.090 |
| | 74.21 | 134.76 | 0.963 | 0.995 | 0.999 | 0.059 | 0.207 | 2.347 | 0.089 |
| | 74.40 | 134.80 | 0.962 | 0.995 | 0.999 | 0.059 | 0.205 | 2.343 | 0.090 |
| Ours | 74.14 | 134.76 | 0.963 | 0.995 | 0.999 | 0.058 | 0.199 | 2.328 | 0.088 |
Table 4 Comparison of experimental results on the location of the attention mechanism in the Laplacian pyramid (Cap=80 m)
| Method | Param (M) | FLOPs (B) | δ<1.25 ↑ | δ<1.25² ↑ | δ<1.25³ ↑ | Abs Rel ↓ | Sq Rel ↓ | RMSE ↓ | RMSE log ↓ |
|---|---|---|---|---|---|---|---|---|---|
| SE [ ] | 76.02 | 130.44 | 0.964 | 0.995 | 0.999 | 0.058 | 0.198 | 2.372 | 0.089 |
| SE+CA | 77.09 | 137.81 | 0.962 | 0.995 | 0.999 | 0.059 | 0.202 | 2.376 | 0.090 |
| CA [ ] | 74.45 | 134.78 | 0.961 | 0.994 | 0.999 | 0.059 | 0.209 | 2.353 | 0.090 |
| SA [ ] | 73.14 | 127.39 | 0.961 | 0.993 | 0.999 | 0.060 | 0.216 | 2.368 | 0.090 |
| Triplet [ ] | 73.13 | 127.40 | 0.961 | 0.994 | 0.999 | 0.061 | 0.210 | 2.393 | 0.092 |
| Triplet+CA | 74.40 | 134.78 | 0.962 | 0.994 | 0.999 | 0.060 | 0.215 | 2.360 | 0.090 |
| Ours | 74.14 | 134.76 | 0.963 | 0.995 | 0.999 | 0.058 | 0.199 | 2.328 | 0.088 |
Table 5 Comparison of ablation results for different attention mechanism types (Cap=80 m)
[1] GODARD C, MAC AODHA O, BROSTOW G J. Unsupervised monocular depth estimation with left-right consistency[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 270-279.
[2] PU Z D, CHEN S, ZOU B J, et al. A self-supervised monocular depth estimation method based on high resolution convolutional neural network[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(1): 118-127 (in Chinese).
[3] ZHAO L, ZHAO Y, JIN J. A self-supervised monocular depth estimation algorithm based on local attention and iterative pose refinement[J]. Journal of Signal Processing, 2022, 38(5): 1088-1097 (in Chinese).
[4] SONG M, LIM S, KIM W. Monocular depth estimation using Laplacian pyramid-based depth residuals[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(11): 4381-4393.
[5] FU H, GONG M M, WANG C H, et al. Deep ordinal regression network for monocular depth estimation[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 2002-2011.
[6] YANG M K, YU K, ZHANG C, et al. DenseASPP for semantic segmentation in street scenes[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 3684-3692.
[7] ZHANG T, ZHANG X L, REN Y. Monocular image depth estimation based on the fusion of transformer and CNN[J]. Journal of Harbin University of Science and Technology, 2022, 27(6): 88-94 (in Chinese).
[8] RANFTL R, BOCHKOVSKIY A, KOLTUN V. Vision transformers for dense prediction[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 12179-12188.
[9] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// European Conference on Computer Vision. Cham: Springer International Publishing, 2018: 3-19.
[10] HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 13713-13722.
[11] ZHANG Q L, YANG Y B. SA-net: shuffle attention for deep convolutional neural networks[C]// 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. New York: IEEE Press, 2021: 2235-2239.
[12] XIE S N, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 1492-1500.
[13] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2261-2269.
[14] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[15] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[16] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 2818-2826.
[17] EIGEN D, PUHRSCH C, FERGUS R. Depth map prediction from a single image using a multi-scale deep network[EB/OL]. [2022-06-15]. https://arxiv.org/abs/1406.2283.
[18] LEE J H, HAN M K, KO D W, et al. From big to small: multi-scale local planar guidance for monocular depth estimation[EB/OL]. [2022-06-15]. https://arxiv.org/abs/1907.10326.
[19] UHRIG J, SCHNEIDER N, SCHNEIDER L, et al. Sparsity invariant CNNs[C]// 2017 International Conference on 3D Vision (3DV). New York: IEEE Press, 2017: 11-20.
[20] PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library[EB/OL]. [2022-06-15]. https://arxiv.org/abs/1912.01703.
[21] LOSHCHILOV I, HUTTER F. Decoupled weight decay regularization[EB/OL]. [2022-06-15]. https://arxiv.org/abs/1711.05101.
[22] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2022-06-15]. https://arxiv.org/abs/1409.1556.
[23] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 4700-4708.
[24] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// 2018 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141.
[25] MISRA D, NALAMADA T, ARASANIPALAI A U, et al. Rotate to attend: convolutional triplet attention module[C]// 2021 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2021: 3139-3148.
[26] WANG Z, SIMONCELLI E P, BOVIK A C. Multiscale structural similarity for image quality assessment[C]// The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers. New York: IEEE Press, 2004: 1398-1402.