ST-Rec3D: a structure and target-aware 3D reconstruction

doi:10.11996/JG.j.2095-302X.2022030469

Journal of Graphics, 2022, Vol. 43, Issue (3): 469-477. DOI: 10.11996/JG.j.2095-302X.2022030469

• Computer Graphics and Virtual Reality •

ST-Rec3D: a structure and target-aware 3D reconstruction

1. School of Computer Science and Engineering, North Minzu University, Yinchuan Ningxia 750021, China;
2. The Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission, Yinchuan Ningxia 750021, China

Online:2022-06-30 Published:2022-06-28
Supported by:
National Natural Science Foundation of China (61762003, 62162001); “Light of the West” Talent Training and Introduction Plan of
Chinese Academy of Sciences (JF2012c016-2); Ningxia Excellent Talents Support Program; Natural Science Foundation of Ningxia
Province of China (2022AAC02041)

Abstract:

View-based 3D reconstruction aims to recover the 3D shape of an object from one or more 2D images. Existing methods mainly rely on an encoder-decoder architecture trained with the binary cross-entropy loss or its variants, and achieve good reconstruction results. However, the encoder lacks structure awareness of the input views during encoding, which leads to inaccurate geometric details in the reconstructed 3D model. In addition, loss functions based on binary cross-entropy have weak target awareness when the voxel distribution is imbalanced, so the reconstructions suffer from incompleteness such as fractures and missing parts. To address these problems, a structure- and target-aware 3D reconstruction network, named ST-Rec3D, is proposed. It takes a single view or multiple views as input and reconstructs the 3D model in a coarse-to-fine manner. First, an encoder with spatial structure awareness, namely the structure-aware encoder, is designed by incorporating an attention mechanism; it captures the spatial structure information in the input views and effectively perceives the geometric details of the object to be reconstructed. Second, the IoU loss is introduced into 3D voxel reconstruction so that the target object is perceived accurately even when the voxel distribution is imbalanced, which ensures the completeness and accuracy of the reconstructed object. Comparative experiments on the ShapeNet and Pix3D datasets show that ST-Rec3D outperforms current methods in the completeness and accuracy of the 3D models reconstructed from both single and multiple views.

Key words: 3D reconstruction, structure-aware, target-aware, attention mechanism, IoU loss
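
The abstract argues that binary cross-entropy perceives the target poorly when occupied voxels are rare, and that an IoU loss helps. The sketch below illustrates that argument on a toy voxel grid. It is not the paper's implementation: the abstract does not give the exact form of the IoU loss, so a commonly used soft-IoU formulation is assumed here, and the function names `bce_loss` and `soft_iou_loss`, the grid size, and the probability values are all hypothetical. Plain Python lists are used only to keep the example self-contained; a real training loop would use tensor operations.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over all voxels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def soft_iou_loss(pred, target, eps=1e-7):
    """1 - soft IoU (assumed form): intersection and union are
    computed from predicted occupancy probabilities."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - inter / (union + eps)


# Toy imbalanced grid (hypothetical): 1000 voxels, only 10 occupied.
target = [1.0] * 10 + [0.0] * 990

# A prediction that misses the object entirely (everything almost empty).
empty_pred = [0.001] * 1000

# A prediction that recovers the object.
good_pred = [0.99] * 10 + [0.001] * 990

for name, pred in [("empty", empty_pred), ("good", good_pred)]:
    print(name,
          "BCE =", round(bce_loss(pred, target), 4),
          "IoU loss =", round(soft_iou_loss(pred, target), 4))
```

In this toy case the mean binary cross-entropy of the all-empty prediction stays small because the 990 background voxels dominate the average, whereas the soft-IoU loss is close to 1 for the same prediction and drops sharply once the object is recovered. This matches the motivation given in the abstract, under the assumptions stated above.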

CLC number:

TP 391

BAI Jing, MENG Qing-liang, XU Hao, FAN You-fu, YANG Zhan-yuan. ST-Rec3D: a structure and target-aware 3D reconstruction[J]. Journal of Graphics, 2022, 43(3): 469-477.
