3D model reconstruction based on retrieval and deformation techniques

Journal of Graphics, 2026, Vol. 47, Issue (2): 368-379. DOI: 10.11996/JG.j.2095-302X.2026020368

• Computer Graphics and Virtual Reality •

PANG Min¹,², LI Zhentang¹,², ZHANG Yuan¹,², CUI Xiaokang¹,², XIONG Fengguang¹,²

¹ School of Computer Science and Technology, North University of China, Taiyuan, Shanxi 030051, China
² Shanxi Key Laboratory of Machine Vision & Virtual Reality, Taiyuan, Shanxi 030051, China

Received: 2025-06-16 Accepted: 2025-11-04 Published: 2026-04-30 Online: 2026-05-20
Contact: XIONG Fengguang, E-mail: hopenxfg@nuc.edu.cn
Supported by:
National Natural Science Foundation of China (62272426); Shanxi Province Science and Technology Major Special Project (202201150401021); Youth Fund of Shanxi Province (202303021212189); Youth Fund of Shanxi Province (202303021212206)

Abstract:

With the rapid development of virtual reality (VR) and augmented reality (AR), demand for high-quality 3D models continues to grow. Traditional modeling methods are slow and adapt poorly to complex shapes. This paper therefore proposes a 3D model construction method based on retrieval and deformation. First, a 3D model retrieval framework based on semantic keypoints is constructed: a deformation-aware embedding space is built from the sparse, semantically consistent keypoints of each model, enabling dynamic aggregation of global and local features. Adaptive Global Channel Attention (AGCA) is embedded into a Transformer to form a joint attention mechanism that improves the model's expressiveness and retrieval accuracy. Second, a DGCNN-based, keypoint-driven neural cage deformation algorithm is designed for the retrieved models. A self-attention mechanism computes the influence weights of keypoints on the vertices within their local support regions, and the deformation mapping between feature keypoints and the neural cage structure drives the cage deformation, giving fine-grained, constrained control over the shape. Finally, the loss function is improved by combining Chamfer distance and Earth Mover's Distance (EMD) constraints, so that geometric details are aligned more accurately while local feature differences are still taken into account, yielding more precise 3D model reconstruction. Experiments on the open-source PartNet and Scan2CAD datasets compare the proposed method with U-RED, ShapeFlow, and KP-RED. The results show that the proposed method handles noise and occlusion effectively: the average loss value is reduced by 33.33% and 41.67% on the PartNet dataset, and by 3.6% relative to the baseline on the Scan2CAD dataset.

Key words: 3D model retrieval, deep learning, neural cage deformation, self-attention mechanism, loss function
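The abstract describes a loss that combines Chamfer distance with an EMD constraint but gives no formula on this page. The following is a minimal sketch of one way such a combination could look, not the authors' implementation. The function names, the squared-distance form of the Chamfer term, the brute-force matching used for EMD, and the weight `lam` are all assumptions made for illustration.

```python
import itertools
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbour
    distance from a to b plus the same quantity from b to a."""
    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def emd_bruteforce(a: Sequence[Point], b: Sequence[Point]) -> float:
    """EMD for two equal-sized point sets: mean distance under the best
    one-to-one matching. Tries every permutation, so tiny sets only."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-sized point sets")
    best = min(
        sum(math.dist(p, b[j]) for p, j in zip(a, perm))
        for perm in itertools.permutations(range(len(b)))
    )
    return best / len(a)


def combined_loss(a: Sequence[Point], b: Sequence[Point], lam: float = 0.5) -> float:
    """Hypothetical weighted sum; `lam` is an illustrative weight,
    not a value taken from the paper."""
    return chamfer_distance(a, b) + lam * emd_bruteforce(a, b)


if __name__ == "__main__":
    source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    target = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(combined_loss(source, target))
```

The Chamfer term lets each point choose its nearest neighbour freely, so several points may collapse onto the same target. The EMD term forces a one-to-one matching, which penalises uneven point distributions. Practical systems replace the brute-force matching with an approximate solver.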


PANG Min, LI Zhentang, ZHANG Yuan, CUI Xiaokang, XIONG Fengguang. 3D model reconstruction based on retrieval and deformation techniques[J]. Journal of Graphics, 2026, 47(2): 368-379.

Figures/Tables 13

Fig. 1 The proposed overall network architecture

Fig. 2 The TRA network

Fig. 3 AGCA module ((a) AGCA; (b) AGCM)

Fig. 4 DG-Cage module

Fig. 5 DGCNN network architecture

Fig. 6 Illustration of influence weights of cage vertices

Table 1 Loss function metrics of joint retrieval-deformation results on complete shapes in the PartNet dataset

Method	Chair	Table	Cabinet	Average
Ref. [20]	0.638	0.629	0.688	0.637
U-RED [13]	0.834	0.326	0.474	0.551
ShapeFlow [14]	0.238	0.400	0.514	0.340
KP-RED [15]	0.122	0.163	0.141	0.142
Ours	0.095	0.123	0.148	0.122

Fig. 7 Experimental results on complete shapes

Table 2 Loss function metrics of joint retrieval-deformation results on partial shapes in the PartNet dataset

Method	Chair	Table	Cabinet	Average
Ref. [20]	0.158	0.190	0.676	0.210
U-RED [13]	0.227	0.132	0.316	0.207
ShapeFlow [14]	0.230	0.302	0.345	0.265
KP-RED [15]	0.093	0.110	0.069	0.091
Ours	0.062	0.089	0.064	0.072

Fig. 8 Experimental results on partial shapes

Table 3 Loss function metrics of joint retrieval-deformation results on complete shapes in the Scan2CAD dataset

Method	Chair	Table	Cabinet	Average
U-RED [13]	0.221	0.207	0.211	0.213
ShapeFlow [14]	0.205	0.193	0.196	0.198
KP-RED [15]	0.173	0.164	0.165	0.167
Ours	0.152	0.161	0.169	0.161
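
The 3.6% reduction on Scan2CAD quoted in the abstract can be checked against the average column of Table 3. The sketch below assumes that the baseline is KP-RED [15]. This is an inference from the numbers, since the page does not name the baseline. The helper `relative_reduction` is a hypothetical name introduced here.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    return (baseline - ours) / baseline * 100.0


# Average loss values copied from Table 3 (Scan2CAD, complete shapes).
scan2cad_average = {
    "U-RED": 0.213,
    "ShapeFlow": 0.198,
    "KP-RED": 0.167,
    "Ours": 0.161,
}

# Reduction achieved by the proposed method relative to each compared method.
reductions = {
    name: relative_reduction(value, scan2cad_average["Ours"])
    for name, value in scan2cad_average.items()
    if name != "Ours"
}

if __name__ == "__main__":
    for name, pct in reductions.items():
        print(f"{name}: {pct:.1f}%")
```

With KP-RED as the baseline the result rounds to 3.6%, which agrees with the figure stated in the abstract.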

Table 4 Ablation study on complete shapes (√ = component enabled, - = component disabled)

No.	TRA	DG-Cage	EMD-CD	Chair	Table	Cabinet	Average
1	-	-	-	0.122	0.163	0.148	0.144
2	√	-	-	0.115	0.139	0.140	0.131
3	-	√	-	0.104	0.147	0.141	0.130
4	-	-	√	0.122	0.146	0.149	0.139
5	√	√	-	0.103	0.139	0.139	0.127
6	√	√	√	0.095	0.123	0.141	0.120

Table 5 Ablation study on partial shapes (√ = component enabled, - = component disabled)

No.	TRA	DG-Cage	EMD-CD	Chair	Table	Cabinet	Average
1	-	-	-	0.093	0.110	0.069	0.091
2	√	-	-	0.073	0.089	0.068	0.077
3	-	√	-	0.086	0.085	0.069	0.080
4	-	-	√	0.072	0.093	0.067	0.077
5	√	√	-	0.063	0.082	0.067	0.071
6	√	√	√	0.062	0.089	0.064	0.072

References 20

[1]	ZHOU W, CANG M N, CHENG H Z. Research on the method of 3D image reconstruction for cultural relics based on AR technology[J]. Journal of Graphics, 2025, 46(2): 369-381 (in Chinese).
[2]	WANG Y, SUN Y B, LIU Z W, et al. Dynamic graph CNN for learning on point clouds[J]. ACM Transactions on Graphics, 2019, 38(5): 146.
[3]	WANG Y, ZHENG B W, ZHANG X. 3D model retrieval algorithm based on multimodal fusion[J]. Application Research of Computers, 2021, 38(3): 685-688, 695 (in Chinese).
[4]	ZHAO Y X, JIAO J C, LI N, et al. MANet: multimodal attention network based point-view fusion for 3D shape recognition[C]// The 25th International Conference on Pattern Recognition. New York: IEEE Press, 2021: 134-141.
[5]	SU H, MAJI S, KALOGERAKIS E, et al. Multi-view convolutional neural networks for 3D shape recognition[C]// 2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 945-953.
[6]	GAO Z, LI Y M, WAN S H. Exploring deep learning for view-based 3D model retrieval[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2020, 16(1): 18.
[7]	CHARLES R Q, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]// The 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 5105-5114.
[8]	CHEN S Y, HE H. 3D face point cloud model reconstruction based on dynamic selection of feature points[J]. Application Research of Computers, 2024, 41(2): 629-634 (in Chinese).
[9]	SORKINE O, ALEXA M. As-rigid-as-possible surface modeling[C]// The 5th Eurographics Symposium on Geometry Processing. Goslar: Eurographics Association, 2007: 109-116.
[10]	PANG M, HE L G, XIONG F G, et al. Developing an image-based 3D model editing method[J]. IEEE Access, 2020, 8: 167950-167964.
[11]	DENG Y, YANG J L, TONG X. Deformed implicit field: modeling 3D shapes with learned dense correspondence[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 10281-10291.
[12]	CHEN H L, LIU L, KANG X H, et al. Human body parameterized three-dimensional model deformation method based on neural network joint point estimation: CN202011046379.1[P]. 2021-01-15 (in Chinese).
[13]	DI Y, ZHANG C Y, ZHANG R D, et al. U-RED: unsupervised 3D shape retrieval and deformation for partial point clouds[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 8850-8861.
[14]	JIANG C M, HUANG J W, TAGLIASACCHI A, et al. ShapeFlow: learnable deformations among 3D shapes[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 817.
[15]	ZHANG R D, ZHANG C Y G, DI Y, et al. KP-RED: exploiting semantic keypoints for joint 3D shape retrieval and deformation[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 20540-20550.
[16]	CHARLES R Q, SU H, MO K C, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 77-85.
[17]	LIU M Y, TUZEL O, VEERARAGHAVAN A, et al. Fast directional chamfer matching[C]// 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2010: 1696-1703.
[18]	MO K C, ZHU S L, CHANG A X, et al. PartNet: a large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 909-918.
[19]	AVETISYAN A, DAHNERT M, DAI A, et al. Scan2CAD: learning CAD model alignment in RGB-D scans[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 2609-2618.
[20]	UY M A, KIM V G, SUNG M, et al. Joint learning of 3D shape retrieval and deformation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 11708-11717.
