Cloud Sphere: a 3D shape representation method via progressive deformation

doi:10.11996/JG.j.2095-302X.2024061375

Journal of Graphics, 2024, Vol. 45, Issue (6): 1375-1388.

• Computer Graphics and Virtual Reality •

WANG Zongji¹,², LIU Yunfei², LU Feng²

1. Key Laboratory of Target Cognition and Application Technology, Key Laboratory of Network Information System Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
2. State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China

Received: 2024-07-04 Accepted: 2024-09-24 Published: 2024-12-31 Online: 2024-12-24
Corresponding author: LU Feng (1985-), professor, Ph.D. His main research interests include computer vision, artificial intelligence, human-computer interaction, and virtual/augmented reality. E-mail: lufeng@buaa.edu.cn
First author: WANG Zongji (1991-), assistant researcher, Ph.D. His main research interests include 3D scene reconstruction and understanding. E-mail: wangzongji@aircas.ac.cn
Supported by: the "14th Five-Year Plan" Common Information System Equipment Pre-research Project (31511060301)

Abstract

As 3D data proliferates, 3D models exhibit increasingly diverse and complex shapes. To discover distinctive information in the shape formation process, this paper proposes a method that represents the shapes of 3D models uniformly through the progressive deformation of a spherical surface. For any input 3D model, a coarse-to-fine progressive deformation-based auto-encoder gradually deforms a spherical point cloud template to fit the input shape. Deep neural networks model the deformation process and extract distinctive shape features from its multiple stages, which avoids the reliance on manual annotation typical of task-driven learning methods. The deformation residuals produced during shape generation are encoded explicitly, so the representation captures not only the final shape but also the progressive deformation from the initial state to that shape. The network is trained with multi-stage supervision, which improves the accuracy of deformation reconstruction. Comparative experiments against representative state-of-the-art methods show that the proposed method reconstructs 3D shapes with high fidelity and fine detail while preserving consistent topology throughout the multi-stage deformation, and ablation experiments verify the effectiveness of the multi-stage supervision. The deformation representation applies to computer graphics applications such as model classification, shape transfer, and co-editing, and can serve as an underlying data representation for the automatic parsing and efficient editing of the geometric properties of 3D models.

Key words: 3D representation, 3D deformation, spherical point cloud template, auto-encoder, deep learning
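The deformation idea described in the abstract can be illustrated with a minimal sketch. It assumes a Fibonacci-lattice sphere as the template and hand-written per-point residuals; in the paper the residuals are predicted by the auto-encoder, and the function names `fibonacci_sphere` and `deform_progressively` are hypothetical, not taken from the authors' code.

```python
import math

def fibonacci_sphere(n):
    """Return n roughly evenly spaced points on the unit sphere (assumed template)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n            # height in (-1, 1)
        r = math.sqrt(max(0.0, 1.0 - z * z))     # radius of the circle at that height
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points

def deform_progressively(template, stage_residuals):
    """Add per-point residuals stage by stage; return the shape after every stage.

    Point i of every returned shape descends from template point i, so the
    stages share one point-to-point correspondence.
    """
    current = list(template)
    shapes = [current]
    for residual in stage_residuals:
        current = [(x + dx, y + dy, z + dz)
                   for (x, y, z), (dx, dy, dz) in zip(current, residual)]
        shapes.append(current)
    return shapes

# Placeholder residuals: a coarse stretch along x, then a small shift along y.
template = fibonacci_sphere(1024)
coarse = [(0.5 * x, 0.0, 0.0) for x, y, z in template]
fine = [(0.0, 0.1, 0.0)] * len(template)
stages = deform_progressively(template, [coarse, fine])
```

Storing the residual of each stage, rather than only the final coordinates, is what lets such a representation record how the shape was formed as well as where it ended up.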

CLC number: TP391

WANG Zongji, LIU Yunfei, LU Feng. Cloud Sphere: a 3D shape representation method via progressive deformation[J]. Journal of Graphics, 2024, 45(6): 1375-1388.

Figures and tables (13)

Fig. 1 Coarse-to-fine progressive deformation process for shape reconstruction from a spherical template

Fig. 2 Framework of the coarse-to-fine progressive deformation based auto-encoder

Fig. 3 Flowchart of the multi-level abstraction data preprocessing method

Fig. 4 Statistics of the Chair category in ShapeNetCore (v1)

Fig. 5 Four fine-grained subcategories selected from the Chair category

Table 1 Comparison of self-reconstruction accuracy (CD: ×1000, EMD: ×100, IoU computed at a voxel resolution of 32)

Method	LOGAN			AtlasNet			CFPDAE (ours)		
Category	CD↓	EMD↓	IoU↑	CD↓	EMD↓	IoU↑	CD↓	EMD↓	IoU↑
Airplane A	0.23	1.07	0.77	0.17	0.94	0.81	0.15	0.90	0.81
Airplane B	0.82	1.93	0.61	0.64	1.67	0.62	0.40	1.43	0.69
Airplane C	3.20	3.73	0.40	1.31	2.39	0.55	0.68	1.98	0.61
Airplane D	1.60	3.18	0.48	0.92	2.29	0.51	0.78	2.30	0.53
Average (Airplane)	1.46	2.48	0.56	0.76	1.82	0.62	0.50	1.65	0.66
Car A	0.62	1.83	0.56	0.70	1.57	0.66	0.40	1.53	0.67
Car B	0.37	1.50	0.64	0.85	1.82	0.61	0.33	1.42	0.69
Car C	0.99	2.33	0.45	7.46	3.52	0.51	0.62	1.82	0.59
Car D	0.76	2.04	0.54	0.76	1.91	0.55	0.47	1.69	0.63
Average (Car)	0.69	1.92	0.55	2.44	2.21	0.58	0.45	1.62	0.65
Chair A	1.59	3.16	0.46	0.92	2.43	0.54	0.51	1.68	0.71
Chair B	5.04	5.15	0.32	2.00	3.43	0.43	1.41	2.71	0.56
Chair C	4.61	5.00	0.29	2.09	3.35	0.43	1.32	2.69	0.59
Chair D	4.50	5.18	0.32	1.86	2.97	0.49	1.66	2.93	0.52
Average (Chair)	3.94	4.62	0.35	1.72	3.04	0.47	1.22	2.50	0.59
Table A	7.41	6.34	0.25	2.11	3.38	0.41	1.64	2.90	0.50
Table B	5.38	5.64	0.25	2.22	3.54	0.36	1.05	2.41	0.55
Table C	1.57	2.92	0.41	2.21	2.88	0.43	0.53	1.84	0.67
Table D	4.44	4.96	0.32	2.56	3.56	0.40	0.89	2.17	0.62
Average (Table)	4.70	4.96	0.31	2.28	3.34	0.40	1.03	2.33	0.59
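For readers unfamiliar with the metrics in Table 1, the sketch below shows one common way to compute Chamfer distance (CD) and voxel IoU between two point clouds. This page does not give the paper's exact definitions, so the squared-distance form of CD, the [-1, 1] bounding cube, and the point-occupancy voxelization are all assumptions; EMD is omitted because it requires an optimal matching step.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both directions."""
    def one_way(src, dst):
        total = 0.0
        for p in src:
            total += min(sum((pi - qi) ** 2 for pi, qi in zip(p, q)) for q in dst)
        return total / len(src)
    return one_way(a, b) + one_way(b, a)

def voxelize(points, resolution=32, lo=-1.0, hi=1.0):
    """Return the set of occupied cells of a resolution^3 grid over [lo, hi]^3 (assumed bounds)."""
    scale = resolution / (hi - lo)
    cells = set()
    for p in points:
        # Clamp so that points on or outside the boundary land in an edge cell.
        cells.add(tuple(min(resolution - 1, max(0, int((c - lo) * scale))) for c in p))
    return cells

def voxel_iou(a, b, resolution=32):
    """Intersection over union of the occupied voxel sets of two point clouds."""
    va, vb = voxelize(a, resolution), voxelize(b, resolution)
    union = va | vb
    return len(va & vb) / len(union) if union else 1.0

cloud = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
moved = [(0.5, 0.0, 0.0), (0.5, 0.5, 0.5)]
cd_scaled = 1000 * chamfer_distance(cloud, moved)  # Table 1 reports CD multiplied by 1000
iou = voxel_iou(cloud, moved)
```

Lower CD and EMD indicate a closer geometric fit, while higher IoU indicates greater volumetric overlap, which matches the arrows in the table header.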

Fig. 6 Qualitative comparison of self-reconstruction results with representative state-of-the-art methods ((a) AtlasNet; (b) LOGAN; (c) CFPDAE (ours); (d) Ground truth)

Fig. 7 Qualitative analysis of correspondence matching results ((a) Color coding; (b) CFPDAE (ours); (c) AtlasNet; (d) ShapeFlow)

Table 2 Quantitative comparison of correspondence matching results

Method	AtlasNet		CFPDAE (ours)	
Category	$\overline{\text{spread}}\downarrow$	$\overline{\text{shift}}\downarrow$	$\overline{\text{spread}}\downarrow$	$\overline{\text{shift}}\downarrow$
Airplane A	0.40	1.21	0.29	0.79
Airplane B	0.17	1.13	0.19	0.61
Airplane C	0.48	0.98	0.23	0.68
Airplane D	0.30	1.07	0.16	0.60
Average	0.34	1.10	0.22	0.67

Table 3 Ablation study of the effect of multi-stage supervision on network learning performance (CD: ×1000, EMD: ×100)

Supervised stages	CD↓	EMD↓	IoU↑
{}	1.52	2.73	0.48
{16}	0.96	2.47	0.57
{16,64}	0.78	2.15	0.62
{16,256}	0.76	2.06	0.63
{16,64,256}	0.64	1.97	0.65
{16,64,256,1024}	0.51	1.68	0.71
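One possible reading of the ablation settings in Table 3 is sketched below. It assumes that each number in a set such as {16, 64, 256} is the point count of an intermediate target against which the corresponding stage output is supervised, that targets are built by farthest point sampling, and that stage losses are summed Chamfer distances. None of this is confirmed on this page, and the helper names are hypothetical.

```python
def squared_dist(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def chamfer_distance(a, b):
    """Symmetric mean squared nearest-neighbour distance between two point lists."""
    def one_way(src, dst):
        return sum(min(squared_dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)

def farthest_point_sampling(points, k):
    """Greedy subsample of k points spread over the cloud, starting from index 0."""
    chosen = [0]
    dist = [squared_dist(p, points[0]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, squared_dist(p, points[nxt])) for d, p in zip(dist, points)]
    return [points[i] for i in chosen]

def multi_stage_loss(stage_outputs, target, supervised=(16, 64, 256, 1024)):
    """Sum Chamfer distances over the supervised stages only (assumed loss form).

    stage_outputs maps a point count n to the network output of that stage;
    each output is compared with an n-point subsample of the full target.
    """
    loss = 0.0
    for n in supervised:
        loss += chamfer_distance(stage_outputs[n], farthest_point_sampling(target, n))
    return loss
```

Under this reading, moving down the rows of Table 3 adds loss terms for more stages, and every additional supervised stage in the table coincides with lower CD and EMD and higher IoU.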

Fig. 8 Examples of category-specific salient region localization

Fig. 9 Examples of shape geometry transfer ((a) Straight chair to armchair; (b) Straight chair to swivel chair; (c) Sofa to swivel chair)

Fig. 10 Example of co-editing similar 3D models (armrests are added to a set of straight chairs) ((a) Source model; (b) Target model)
