Journal of Graphics, 2024, Vol. 45, Issue (3): 464-471. DOI: 10.11996/JG.j.2095-302X.2024030464
ZHU Guanghui, MIAO Jun, HU Hongli, SHEN Ji, DU Ronghua
Received: 2023-09-20
Accepted: 2024-03-04
Online: 2024-06-30
Published: 2024-06-06
Contact: MIAO Jun (1979-), associate professor, Ph.D. His main research interests cover image processing, pattern recognition, and 3D scene reconstruction. E-mail:
About author: ZHU Guanghui (1997-), master's student. His main research interest is computer image processing. E-mail: 2314045303@qq.com
ZHU Guanghui, MIAO Jun, HU Hongli, SHEN Ji, DU Ronghua. 3D piece-wise planar reconstruction from a single indoor image based on self-augmented attention mechanism[J]. Journal of Graphics, 2024, 45(3): 464-471.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024030464
Method | Rel↓ | Rel(sqr)↓ | Log10↓ | RMSE(lin)↓ | RMSE(log)↓ | δ<1.25 (%)↑ | δ<1.25² (%)↑ | δ<1.25³ (%)↑ |
---|---|---|---|---|---|---|---|---|
PlaneNet | 0.142 | 0.107 | 0.060 | 0.514 | 0.179 | 81.2 | 95.7 | 98.9 |
PlaneAE | 0.141 | 0.107 | 0.061 | 0.529 | 0.184 | 81.0 | 95.7 | 99.0 |
PlaneRecNet | 0.138 | 0.099 | 0.058 | 0.512 | 0.179 | 82.0 | 95.9 | 99.0 |
InvPT | 0.136 | 0.099 | 0.059 | 0.518 | 0.179 | 82.0 | 95.9 | 99.0 |
Ours | 0.134 | 0.098 | 0.058 | 0.506 | 0.177 | 82.1 | 96.2 | 99.0 |
Table 1 Comparison of depth accuracy based on NYU-V2 dataset
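The depth columns in Table 1 follow the usual single-image depth-evaluation metrics. As a rough illustration only, the NumPy sketch below computes them under the conventional definitions: mean absolute relative error (Rel), squared relative error (Rel(sqr)), mean log10 error, linear and log RMSE, and the threshold accuracy δ < 1.25^k reported in percent. It assumes that pixels with ground truth ≤ 0 are invalid and ignored. The function name `depth_metrics` and the `eps` clamp are hypothetical; the authors' own crop, depth cap, and masking are not stated here, so the sketch may not reproduce the table values exactly.

```python
import numpy as np


def depth_metrics(pred, gt, eps=1e-8):
    """Conventional single-image depth metrics (assumed definitions).

    pred, gt: arrays of the same shape holding depths (e.g. metres).
    Pixels with gt <= 0 are treated as missing ground truth and ignored.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0
    p = np.clip(pred[valid], eps, None)  # hypothetical guard against log(0) and division by 0
    g = gt[valid]

    diff = p - g
    ratio = np.maximum(p / g, g / p)  # symmetric ratio used by the threshold accuracies
    metrics = {
        "rel": np.mean(np.abs(diff) / g),                            # Rel
        "rel_sqr": np.mean(diff ** 2 / g),                           # Rel(sqr)
        "log10": np.mean(np.abs(np.log10(p) - np.log10(g))),         # Log10
        "rmse_lin": np.sqrt(np.mean(diff ** 2)),                     # RMSE(lin)
        "rmse_log": np.sqrt(np.mean((np.log(p) - np.log(g)) ** 2)),  # RMSE(log)
    }
    for k in (1, 2, 3):
        # share of pixels with max(p/g, g/p) < 1.25**k, in percent as in Table 1
        metrics[f"delta_{k}"] = 100.0 * np.mean(ratio < 1.25 ** k)
    return {name: float(value) for name, value in metrics.items()}


# Toy example: one pixel is off by a factor of 2, the other is exact.
m = depth_metrics([2.0, 1.0], [1.0, 1.0])
print(m["rel"], m["delta_1"])  # 0.5 50.0
```

The first five metrics are errors (lower is better), while the three δ accuracies are percentages of pixels within a ratio threshold (higher is better), which matches the arrows in the table header.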
Method | ||||||
---|---|---|---|---|---|---|
PlaneAE | 45.40 | 57.62 | 46.26 | 49.52 | 61.77 | 49.83 |
PlaneRCNN | 62.71 | 75.18 | 69.26 | 65.13 | 76.27 | 71.09 |
PlaneRecNet | 63.88 | 78.95 | 73.17 | 64.93 | 79.00 | 69.80 |
InvPT | 64.22 | 79.05 | 72.60 | 64.86 | 78.82 | 71.36 |
Ours | 65.11 | 79.98 | 65.62 | 66.24 | 81.10 | 72.37 |
Table 2 Comparison of segmentation accuracy based on NYU-V2 dataset
Method | RI↑ | VI↓ | SC↑ |
---|---|---|---|
Baseline | 0.888 | 1.380 | 0.519 |
SAA | 0.893 | 1.322 | 0.546 |
LED | 0.891 | 1.361 | 0.530 |
SAA+LED | 0.911 | 1.304 | 0.551 |
Table 3 Comparison of segmentation accuracy in ablation experiments
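Tables 3 and 4 score plane segmentation with the Rand index (RI), variation of information (VI), and segmentation covering (SC). The sketch below gives textbook versions of all three, computed from a contingency table of two integer label maps. It is an illustration under stated assumptions, not the authors' evaluation code. RI counts unordered pixel pairs, VI uses natural logarithms, and SC is the area-weighted best IoU of each ground-truth region against the predicted regions. Published implementations differ in these conventions and in how non-planar pixels are masked, so absolute values need not match the tables. All function names are hypothetical.

```python
import numpy as np


def contingency(seg_a, seg_b):
    """Joint pixel counts: entry (i, j) = #pixels with label i in seg_a and label j in seg_b."""
    _, ai = np.unique(np.asarray(seg_a).ravel(), return_inverse=True)
    _, bi = np.unique(np.asarray(seg_b).ravel(), return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(table, (ai, bi), 1)
    return table


def _pairs(counts):
    """Number of unordered pairs, C(n, 2), summed over all cells."""
    c = np.asarray(counts, dtype=np.float64)
    return np.sum(c * (c - 1) / 2)


def rand_index(seg_a, seg_b):
    """Fraction of pixel pairs on which both segmentations agree (same/same or diff/diff)."""
    t = contingency(seg_a, seg_b)
    total = _pairs(t.sum())
    agree = total + 2 * _pairs(t) - _pairs(t.sum(axis=1)) - _pairs(t.sum(axis=0))
    return float(agree / total)


def _entropy(counts, n):
    p = counts[counts > 0] / n
    return -np.sum(p * np.log(p))


def variation_of_information(seg_a, seg_b):
    """VI = H(A|B) + H(B|A) = 2 H(A,B) - H(A) - H(B), natural log (assumed)."""
    t = contingency(seg_a, seg_b).astype(np.float64)
    n = t.sum()
    h_ab = _entropy(t.ravel(), n)
    return float(2 * h_ab - _entropy(t.sum(axis=1), n) - _entropy(t.sum(axis=0), n))


def segmentation_covering(pred, gt):
    """Covering of the ground-truth regions by the predicted ones (area-weighted best IoU)."""
    t = contingency(gt, pred).astype(np.float64)  # rows: GT regions, columns: predicted regions
    area_gt = t.sum(axis=1, keepdims=True)
    area_pred = t.sum(axis=0, keepdims=True)
    iou = t / (area_gt + area_pred - t)           # union is never 0 because every GT row is non-empty
    return float(np.sum(area_gt.ravel() * iou.max(axis=1)) / t.sum())


# Toy example: two ground-truth planes merged into one predicted plane.
gt = np.array([0, 0, 1, 1])
pred = np.array([0, 0, 0, 0])
print(rand_index(pred, gt), variation_of_information(pred, gt), segmentation_covering(pred, gt))
```

The metrics depend only on how pixels are grouped, not on the label values, which is why the labels are first mapped through `np.unique`. Higher RI and SC and lower VI indicate a closer match to the ground truth.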
Number | RI↑ | VI↓ | SC↑ |
---|---|---|---|
1 | 0.890 | 1.378 | 0.530 |
2 | 0.895 | 1.322 | 0.546 |
3 | 0.911 | 1.304 | 0.551 |
Table 4 Comparison of segmentation accuracy using different numbers of self-attention enhancement modules
[1] | LIU C, YANG J M, CEYLAN D, et al. PlaneNet: piece-wise planar reconstruction from a single RGB image[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 2579-2588. |
[2] | YANG F T, ZHOU Z H. Recovering 3D planes from a single image via convolutional neural networks[C]// European Conference on Computer Vision. Cham: Springer, 2018: 87-103. |
[3] | YE H R, XU D. Inverted pyramid multi-task transformer for dense scene understanding[C]// European Conference on Computer Vision. Cham: Springer, 2022: 514-530. |
[4] | YU Z H, ZHENG J, LIAN D Z, et al. Single-image piece-wise planar 3D reconstruction via associative embedding[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 1029-1037. |
[5] | LIU C, KIM K, GU J W, et al. PlaneRCNN: 3D plane detection and reconstruction from a single image[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4445-4454. |
[6] | HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 2980-2988. |
[7] | QIAN Y M, FURUKAWA Y. Learning pairwise inter-plane relations for piecewise planar reconstruction[C]// European Conference on Computer Vision. Cham: Springer, 2020: 330-345. |
[8] | XI W J, CHEN X J. Reconstructing piecewise planar scenes with multi-view regularization[J]. Computational Visual Media, 2019, 5(4): 337-345. |
[9] | XIE Y X, RAMBACH J, SHU F W, et al. PlaneSegNet: fast and robust plane estimation using a single-stage instance segmentation CNN[C]// 2021 IEEE International Conference on Robotics and Automation. New York: IEEE Press, 2021: 13574-13580. |
[10] | TAN B, XUE N, BAI S, et al. PlaneTR: structure-guided transformers for 3D plane recovery[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 4166-4175. |
[11] | ZHANG W D, ZHANG Y M, SONG R, et al. 3D layout estimation via weakly supervised learning of plane parameters from 2D segmentation[J]. IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society, 2022, 31: 868-879. |
[12] | LIU J C, JI P, BANSAL N, et al. PlaneMVS: 3D plane reconstruction from multi-view stereo[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 8655-8665. |
[13] | JIANG R X, MIAO J, CHU J, et al. 3D planar reconstruction of urban scene from single image based on multi-scale focusing network[J]. Journal of Chinese Computer Systems, 2022, 43(8): 1718-1724 (in Chinese). |
[14] | XIE Y X, SHU F W, RAMBACH J, et al. PlaneRecNet: multi-task learning with cross-task consistency for piece-wise plane detection and reconstruction from a single RGB image[EB/OL]. (2021-10-21) [2023-08-22]. http://arxiv.org/abs/2110.11219. |
[15] | HERMANN K M, KOČISKÝ T, GREFENSTETTE E, et al. Teaching machines to read and comprehend[EB/OL]. (2015-11-19) [2023-07-18]. http://arxiv.org/abs/1506.03340. |
[16] | GAO Z L, XIE J T, WANG Q L, et al. Global second-order pooling convolutional networks[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 3019-3028. |
[17] | LEE H, KIM H E, NAM H. SRM: a style-based recalibration module for convolutional neural networks[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 1854-1862. |
[18] | HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141. |
[19] | MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[EB/OL]. (2014-06-24) [2023-06-18]. http://arxiv.org/abs/1406.6247. |
[20] | HU J, SHEN L, ALBANIE S, et al. Gather-excite: exploiting feature context in convolutional neural networks[EB/OL]. (2019-01-12) [2023-07-23]. http://arxiv.org/abs/1810.12348. |
[21] | WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// European Conference on Computer Vision. Cham: Springer, 2018: 3-19. |
[22] | WANG Z W, SHE Q, SMOLIC A. ACTION-net: multipath excitation for action recognition[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 13209-13218. |
[23] | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. (2017-06-12) [2023-07-13]. http://arxiv.org/abs/1706.03762. |
[24] | YANG B S, WANG L Y, WONG D, et al. Convolutional self-attention networks[EB/OL]. [2023-07-13]. http://arxiv.org/abs/1904.03107. |