Dense point cloud reconstruction network based on adaptive aggregation recurrent recursion

doi:10.11996/JG.j.2095-302X.2024010230

Abstract

To address difficult reconstruction of weakly textured regions, high resource consumption, and long reconstruction times, a multi-stage dense point cloud reconstruction network based on adaptive aggregation recurrent recursive convolution was proposed, namely A²R²-MVSNet (adaptive aggregation recurrent recursive multi-view stereo net). The method first introduced a feature extraction module based on multi-scale recurrent recursive residuals to aggregate contextual semantic information, which eased feature extraction in weakly textured and textureless regions. For cost volume regularization, a residual regularization module was proposed; it strengthened the ability of the 3D CNN to extract and aggregate contextual semantics at the cost of only a slight increase in memory consumption. Experimental results showed that the proposed method ranked among the top methods in the overall metric on the DTU dataset and reconstructed fine details well. It also generated good depth maps and point clouds on the BlendedMVS dataset. In addition, the generalization of the network was tested on self-collected large-scale, high-resolution datasets. Thanks to the coarse-to-fine multi-stage design and the proposed modules, the network generated accurate and complete depth maps and performed high-resolution reconstruction suitable for practical applications.

Key words: deep learning, computer vision, 3D reconstruction, dense reconstruction, multi-view stereo, recurrent neural network

CLC Number: TP391

WANG Jiang’an, HUANG Le, PANG Dawei, QIN Linzhen, LIANG Wenqian. Dense point cloud reconstruction network based on adaptive aggregation recurrent recursion[J]. Journal of Graphics, 2024, 45(1): 230-239.

References 56

[1]	AANAES H, JENSEN R R, VOGIATZIS G, et al. Large-scale data for multiple-view stereopsis[J]. International Journal of Computer Vision, 2016, 120(2): 153-168.
[2]	FURUKAWA Y, HERNÁNDEZ C. Multi-view stereo: a tutorial[J]. Foundations and Trends® in Computer Graphics and Vision, 2015, 9(1-2): 1-148.
[3]	王思启, 张家强, 李丽圆, 等. MVSNet在空间目标三维重建中的应用[J]. 中国激光, 2022, 49(23): 176-185.
	WANG S Q, ZHANG J Q, LI L Y, et al. Application of MVSNet in 3D reconstruction of space objects[J]. Chinese Journal of Lasers, 2022, 49(23): 176-185 (in Chinese).
[4]	SCHÖNBERGER J L, FRAHM J M. Structure-from-motion revisited[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4104-4113.
[5]	KANG S B, SZELISKI R, CHAI J X. Handling occlusions in dense multi-view stereo[C]// The 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2003: I:103-I:110.
[6]	SCHÖNBERGER J L, ZHENG E L, FRAHM J M, et al. Pixelwise view selection for unstructured multi-view stereo[C]// European Conference on Computer Vision. Cham: Springer, 2016: 501-518.
[7]	刘万军, 王俊恺, 曲海成. 多尺度代价体信息共享的多视角立体重建网络[J]. 中国图象图形学报, 2022, 27(11): 3331-3342.
	LIU W J, WANG J K, QU H C. Multi-scale cost volumes information sharing based multi-view stereo reconstructed model[J]. Journal of Image and Graphics, 2022, 27(11): 3331-3342 (in Chinese).
[8]	王江安, 庞大为, 黄乐, 等. 基于多尺度特征递归卷积的稠密点云重建网络[J]. 图学学报, 2022, 43(5): 875-883.
	WANG J A, PANG D W, HUANG L, et al. Dense point cloud reconstruction network using multi-scale feature recursive convolution[J]. Journal of Graphics, 2022, 43(5): 875-883 (in Chinese).
[9]	NIRKIN Y, WOLF L, HASSNER T. HyperSeg: patch-wise hypernetwork for real-time semantic segmentation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 4060-4069.
[10]	罗旭东, 吴一全, 陈金林. 无人机航拍影像目标检测与语义分割的深度学习方法研究进展[J/OL]. 航空学报, 2023: 1-33. [2023-06-12]. https://kns.cnki.net/kcms/detail/11.1929.V.20230609.1350.008.html.
	LUO X D, WU Y Q, CHEN J L. Research progress on deep learning methods for object detection and semantic segmentation in UAV aerial images[J/OL]. Acta Aeronautica et Astronautica Sinica, 2023: 1-33. [2023-06-12]. https://kns.cnki.net/kcms/detail/11.1929.V.20230609.1350.008.html. (in Chinese).
[11]	王艺娴, 胡雨凡, 孔庆群, 等. 三维点云语义分割:现状与挑战[J]. 工程科学学报, 2023, 45(10): 1653-1665.
	WANG Y X, HU Y F, KONG Q Q, et al. 3D point cloud semantic segmentation: state of the art and challenges[J]. Chinese Journal of Engineering, 2023, 45(10): 1653-1665 (in Chinese).
[12]	HAMID M S, MANAP N A, HAMZAH R A, et al. Stereo matching algorithm based on deep learning: a survey[J]. Journal of King Saud University - Computer and Information Sciences, 2022, 34(5): 1663-1673.
[13]	张新钰, 高洪波, 赵建辉, 等. 基于深度学习的自动驾驶技术综述[J]. 清华大学学报: 自然科学版, 2018, 58(4): 438-444.
	ZHANG X Y, GAO H B, ZHAO J H, et al. Overview of deep learning intelligent driving methods[J]. Journal of Tsinghua University: Science and Technology, 2018, 58(4): 438-444 (in Chinese).
[14]	KNAPITSCH A, PARK J, ZHOU Q Y, et al. Tanks and temples: benchmarking large-scale scene reconstruction[J]. ACM Transactions on Graphics, 2017, 36(4): 78:1-78:13.
[15]	ZHU Q T, MIN C, WEI Z Z, et al. Deep learning for multi-view stereo via plane sweep: a survey[EB/OL]. [2023-06-22]. http://arxiv.org/abs/2106.15328v2.
[16]	许允波, 张建兵, 谭宁生. 基于平面扫描的线状缓冲区生成的改进算法[J]. 计算机应用研究, 2012, 29(11): 4364-4366, 4389.
	XU Y B, ZHANG J B, TAN N S. Improved algorithm for line buffering based on plane sweep technique[J]. Application Research of Computers, 2012, 29(11): 4364-4366, 4389 (in Chinese).
[17]	YAO Y, LUO Z X, LI S W, et al. MVSNet: depth inference for unstructured multi-view stereo[C]// European Conference on Computer Vision. Cham: Springer, 2018: 785-801.
[18]	YAO Y, LUO Z X, LI S W, et al. Recurrent MVSNet for high-resolution multi-view stereo depth inference[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 5520-5529.
[19]	YU Z H, GAO S H. Fast-MVSNet: sparse-to-dense multi-view stereo with learned propagation and gauss-newton refinement[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 1946-1955.
[20]	汤建龙, 解佳龙, 薛成均. 利用高斯牛顿迭代的时频差无源定位算法[J]. 西安电子科技大学学报, 2023, 50(1): 19-28, 47.
	TANG J L, XIE J L, XUE C J. TDOA-FDOA passive location algorithm using gauss-newton iteration[J]. Journal of Xidian University, 2023, 50(1): 19-28, 47 (in Chinese).
[21]	ZHANG J Y, YAO Y, LI S W, et al. Visibility-aware multi-view stereo network[EB/OL]. [2023-06-22]. https://arxiv.org/abs/2008.07928.pdf.
[22]	YAN J F, WEI Z Z, YI H W, et al. Dense hybrid recurrent multi-view stereo net with dynamic consistency checking[C]// European Conference on Computer Vision. Cham: Springer, 2020: 674-689.
[23]	WEI Z Z, ZHU Q T, MIN C, et al. AA-RMVSNet: adaptive aggregation recurrent multi-view stereo network[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 6167-6176.
[24]	SHI X J, CHEN Z R, WANG H, et al. Convolutional LSTM network: a machine learning approach for precipitation nowcasting[C]// The 28th International Conference on Neural Information Processing Systems - Volume 1. New York: ACM, 2015: 802-810.
[25]	GU X D, FAN Z W, ZHU S Y, et al. Cascade cost volume for high-resolution multi-view stereo and stereo matching[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 2492-2501.
[26]	YANG J Y, MAO W, ALVAREZ J M, et al. Cost volume pyramid based depth inference for multi-view stereo[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 4876-4885.
[27]	ZHANG X D, HU Y T, WANG H C, et al. Long-range attention network for multi-view stereo[C]// 2021 IEEE Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2021: 3781-3790.
[28]	WANG F, GALLIANI S, VOGEL C, et al. PatchmatchNet: learned multi-view patchmatch stereo[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 14189-14198.
[29]	MA X J, GONG Y, WANG Q R, et al. EPP-MVSNet: epipolar-assembling based depth prediction for multi-view stereo[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2022: 5712-5720.
[30]	WANG F, GALLIANI S, VOGEL C, et al. IterMVS: iterative probability estimation for efficient multi-view stereo[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 8596-8605.
[31]	CHO K, VAN MERRIENBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[EB/OL]. [2023-06-22]. https://arxiv.org/abs/1406.1078.pdf.
[32]	PENG R, WANG R J, WANG Z Y, et al. Rethinking depth estimation for multi-view stereo: a unified representation[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 8635-8644.
[33]	XI J H, SHI Y F, WANG Y J, et al. RayMVSNet: learning ray-based 1D implicit fields for accurate multi-view stereo[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 8585-8595.
[34]	DING Y K, YUAN W T, ZHU Q T, et al. TransMVSNet: global context-aware multi-view stereo network with transformers[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 8575-8584.
[35]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
[36]	MI Z X, DI C, XU D. Generalized binary search network for highly-efficient multi-view stereo[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 12981-12990.
[37]	YAMASHITA K, ENYO Y, NOBUHARA S, et al. nLMVS-net: deep non-lambertian multi-view stereo[C]// 2023 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2023: 3036-3045.
[38]	CHIU C Y, WU Y T, SHEN I C, et al. 360MVSNet: deep multi-view stereo network with 360° images for indoor scene reconstruction[C]// 2023 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2023: 3056-3065.
[39]	ZHANG X D, YANG F Z, CHANG M, et al. MG-MVSNet: multiple granularities feature fusion network for multi-view stereo[J]. Neurocomputing, 2023, 528: 35-47.
[40]	ZHANG Y S, ZHU J K, LIN L X. Multi-view stereo representation revist: region-aware MVSNet[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 17376-17385.
[41]	QIAO S Y, CHEN L C, YUILLE A. DetectoRS: detecting objects with recursive feature pyramid and switchable atrous convolution[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 10208-10219.
[42]	HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2261-2269.
[43]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[44]	鄢化彪, 徐方奇, 黄绿娥, 等. 基于深度学习的多视图立体重建方法综述[J]. 光学精密工程, 2023, 31(16): 2444-2464.
	YAN H B, XU F Q, HUANG L E, et al. Review of multi-view stereo reconstruction methods based on deep learning[J]. Optics and Precision Engineering, 2023, 31(16): 2444-2464 (in Chinese).
[45]	RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation[M]// Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015: 234-241.
[46]	杨航, 陈瑞, 安仕鹏, 等. 深度学习背景下的图像三维重建技术进展综述[J]. 中国图象图形学报, 2023, 28(8): 2396-2409.
	YANG H, CHEN R, AN S P, et al. The growth of image-related three dimensional reconstruction techniques in deep learning-driven era: a critical summary[J]. Journal of Image and Graphics, 2023, 28(8): 2396-2409 (in Chinese).
[47]	IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]// The 32nd International Conference on Machine Learning - Volume 37. New York: ACM, 2015: 448-456.
[48]	GLOROT X, BORDES A, BENGIO Y. Deep sparse rectifier neural networks[C]// Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 2011: 315-323.
[49]	WU Y X, HE K M. Group normalization[C]// European Conference on Computer Vision. Cham: Springer, 2018: 3-19.
[50]	XU B, WANG N Y, CHEN T Q, et al. Empirical evaluation of rectified activations in convolutional network[EB/OL]. [2023-06-22]. https://arxiv.org/abs/1505.00853.pdf.
[51]	许彪, 董友强, 张力, 等. 分区优化混合SfM方法[J]. 测绘学报, 2022, 51(1): 115-126.
	XU B, DONG Y Q, ZHANG L, et al. A hybrid SfM method based on partition optimization[J]. Acta Geodaetica et Cartographica Sinica, 2022, 51(1): 115-126 (in Chinese).
[52]	袁艺天, 林春雨, 赵耀, 等. 基于边缘校正的深度图像上采样后处理算法[J]. 铁道学报, 2015, 37(12): 67-73.
	YUAN Y T, LIN C Y, ZHAO Y, et al. A post processing algorithm for upsampling depth image based on boundary correction[J]. Journal of the China Railway Society, 2015, 37(12): 67-73 (in Chinese).
[53]	YAO Y, LUO Z X, LI S W, et al. BlendedMVS: a large-scale dataset for generalized multi-view stereo networks[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 1787-1796.
[54]	GALLIANI S, LASINGER K, SCHINDLER K. Massively parallel multiview stereopsis by surface normal diffusion[C]// 2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2016: 873-881.
[55]	FURUKAWA Y, PONCE J. Accurate, dense, and robust multiview stereopsis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(8): 1362-1376.
[56]	CHENG S, XU Z X, ZHU S L, et al. Deep stereo using adaptive thin volume representation with uncertainty awareness[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 2521-2531.

Input size	Layer	Output size
H×W×3	Conv+GN+LeakyReLU, 3×3, stride=1	H×W×8
H×W×8	Conv+GN+LeakyReLU, 3×3, stride=1	H×W×8
H×W×8	Conv+GN+LeakyReLU, 3×3, stride=2	H/2×W/2×16
H/2×W/2×16	Conv+GN+LeakyReLU, 3×3, stride=1	H/2×W/2×16
H/2×W/2×16	Conv+GN+LeakyReLU, 3×3, stride=2	H/4×W/4×32
H/4×W/4×32	Conv+GN+LeakyReLU, 3×3, stride=1	H/4×W/4×32
H/4×W/4×32	Conv+GN+LeakyReLU, 3×3, stride=2	H/8×W/8×64
H/8×W/8×64	Conv+GN+LeakyReLU, 3×3, stride=1	H/8×W/8×64
H/8×W/8×64	Conv+GN+LeakyReLU, 3×3, stride=1	H/8×W/8×64
H/4×W/4×96	Conv+GN+LeakyReLU, 3×3, stride=1	H/4×W/4×32
H/2×W/2×48	Conv+GN+LeakyReLU, 3×3, stride=1	H/2×W/2×16
H×W×24	Conv+GN+LeakyReLU, 3×3, stride=1	H×W×8
H/4×W/4×32	Conv+GN+LeakyReLU, 3×3, stride=1	H/4×W/4×16
H/2×W/2×16	Conv+GN+LeakyReLU, 3×3, stride=1	H/2×W/2×16
H×W×8	Conv+GN+LeakyReLU, 3×3, stride=1	H×W×16
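
The resolution and channel bookkeeping in the table above can be checked with a small shape trace. This is only a sketch under assumptions that the table does not state: every 3×3 convolution is assumed to use padding 1, and the 96-, 48- and 24-channel inputs are assumed to arise from concatenating a 2× upsampled deeper feature map with the encoder feature map of the same resolution. The names `conv_out`, `ENCODER`, `trace_encoder` and `decoder_inputs` are illustrative and do not come from the paper.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


# (output channels, stride) for the first nine rows of the table.
ENCODER = [(8, 1), (8, 1), (16, 2), (16, 1), (32, 2),
           (32, 1), (64, 2), (64, 1), (64, 1)]


def trace_encoder(height, width):
    """Return the (H, W, C) shape produced by each encoder row."""
    shapes = []
    for out_channels, stride in ENCODER:
        height = conv_out(height, stride=stride)
        width = conv_out(width, stride=stride)
        shapes.append((height, width, out_channels))
    return shapes


def decoder_inputs(shapes):
    """Input shapes of rows 10-12, assuming upsample-and-concatenate fusion."""
    skips = [shapes[5], shapes[3], shapes[1]]  # H/4 x 32, H/2 x 16, H x 8
    fused_channels = [32, 16, 8]               # output channels of rows 10-12
    deeper = shapes[-1][2]                     # 64 channels at H/8
    inputs = []
    for (height, width, channels), out_channels in zip(skips, fused_channels):
        inputs.append((height, width, deeper + channels))
        deeper = out_channels                  # next fusion starts from this output
    return inputs


if __name__ == "__main__":
    encoder_shapes = trace_encoder(512, 640)
    print(encoder_shapes[-1])
    print(decoder_inputs(encoder_shapes))
```

For a 512×640 input, the trace gives a 64×80×64 map at the coarsest level and decoder inputs with 96, 48 and 24 channels, which agrees with the input sizes listed in rows 10 to 12.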

Method	Acc/mm	Comp/mm	Overall/mm
Furu^[55]	0.613	0.941	0.777
Gipuma^[54]	0.283	0.873	0.578
COLMAP^[6]	0.400	0.664	0.532
MVSNet^[17]	0.396	0.527	0.462
R-MVSNet^[18]	0.383	0.452	0.417
D2HC-RMVSNet^[22]	0.395	0.378	0.386
IterMVS^[30]	0.373	0.354	0.363
EPP-MVSNet^[29]	0.413	0.296	0.355
Cas-MVSNet^[25]	0.325	0.385	0.355
PatchmatchNet^[28]	0.427	0.277	0.352
CVP-MVSNet^[26]	0.296	0.406	0.351
MG-MVSNet^[39]	0.358	0.338	0.348
UCSNet^[56]	0.338	0.349	0.344
LANet^[27]	0.320	0.349	0.335
UniMVSNet^[32]	0.352	0.278	0.315
Ours	0.321	0.346	0.334
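
The Overall column is consistent with the usual DTU convention of averaging accuracy and completeness. The sketch below recomputes it from the Acc and Comp values copied from the table and reports the largest gap to the reported figure; small gaps are expected because the table values are rounded to three decimals. The names `RESULTS`, `overall`, `max_rounding_gap` and `rank_by_overall` are illustrative.

```python
# (method, acc, comp, reported overall) copied from the table above.
RESULTS = [
    ("Furu", 0.613, 0.941, 0.777),
    ("Gipuma", 0.283, 0.873, 0.578),
    ("COLMAP", 0.400, 0.664, 0.532),
    ("MVSNet", 0.396, 0.527, 0.462),
    ("R-MVSNet", 0.383, 0.452, 0.417),
    ("D2HC-RMVSNet", 0.395, 0.378, 0.386),
    ("IterMVS", 0.373, 0.354, 0.363),
    ("EPP-MVSNet", 0.413, 0.296, 0.355),
    ("Cas-MVSNet", 0.325, 0.385, 0.355),
    ("PatchmatchNet", 0.427, 0.277, 0.352),
    ("CVP-MVSNet", 0.296, 0.406, 0.351),
    ("MG-MVSNet", 0.358, 0.338, 0.348),
    ("UCSNet", 0.338, 0.349, 0.344),
    ("LANet", 0.320, 0.349, 0.335),
    ("UniMVSNet", 0.352, 0.278, 0.315),
    ("Ours", 0.321, 0.346, 0.334),
]


def overall(acc, comp):
    """Arithmetic mean of accuracy and completeness (lower is better)."""
    return (acc + comp) / 2.0


def max_rounding_gap(rows=RESULTS):
    """Largest absolute gap between the recomputed and reported overall score."""
    return max(abs(overall(acc, comp) - reported)
               for _, acc, comp, reported in rows)


def rank_by_overall(rows=RESULTS):
    """Method names sorted by reported overall score, best first."""
    return [name for name, _, _, _ in sorted(rows, key=lambda row: row[3])]


if __name__ == "__main__":
    print(max_rounding_gap())
    print(rank_by_overall()[:3])
```

Sorting by the reported overall score places the proposed method second in this table, behind UniMVSNet and just ahead of LANet.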

Method	Params/M	GPU memory/GB	Runtime/s	Acc/mm	Comp/mm	Overall/mm
Baseline	0.44	7.545	2.872	0.348	0.357	0.353
Baseline+FPN	0.55	7.548	2.885	0.345	0.352	0.349
Baseline+A²R²CNN	0.56	7.548	3.165	0.337	0.342	0.340
Baseline+RU-Net	0.56	8.339	2.911	0.327	0.356	0.342
Ours	0.67	8.342	3.210	0.321	0.346	0.334
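
To read the trade-off in the ablation table more directly, the following sketch expresses each variant as a percentage change relative to the baseline. The values are copied from the table; the names `ABLATION`, `FIELDS`, `relative_change` and `compare_to_baseline` are illustrative, and the ASCII key `Baseline+A2R2CNN` stands for the Baseline+A²R²CNN row.

```python
# (params in M, GPU memory in GB, runtime in s, overall in mm) from the table.
ABLATION = {
    "Baseline": (0.44, 7.545, 2.872, 0.353),
    "Baseline+FPN": (0.55, 7.548, 2.885, 0.349),
    "Baseline+A2R2CNN": (0.56, 7.548, 3.165, 0.340),
    "Baseline+RU-Net": (0.56, 8.339, 2.911, 0.342),
    "Ours": (0.67, 8.342, 3.210, 0.334),
}
FIELDS = ("params", "gpu", "runtime", "overall")


def relative_change(new, base):
    """Percentage change of `new` with respect to `base`."""
    return (new - base) / base * 100.0


def compare_to_baseline(name, table=ABLATION):
    """Percentage change of every column of `name` relative to the baseline row."""
    base = table["Baseline"]
    return {field: relative_change(new, old)
            for field, new, old in zip(FIELDS, table[name], base)}


if __name__ == "__main__":
    for name in ABLATION:
        if name != "Baseline":
            changes = compare_to_baseline(name)
            print(name, {key: round(value, 1) for key, value in changes.items()})
```

With these numbers, the full model lowers the overall error by roughly 5% relative to the baseline, while GPU memory grows by about 11% and runtime by about 12%.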