Classification and segmentation network based on Transformer for triangular mesh

doi:10.11996/JG.j.2095-302X.2024010078

Abstract

Triangular meshes are an important geometric data structure that can effectively express the shape details of 3D models. However, the faces of a triangular mesh are irregularly distributed, which makes it difficult to apply existing deep neural networks to meshes directly. To address this irregular structure, a deep neural network that applies the Transformer to triangular meshes is proposed, using the mesh faces directly as tokens. Firstly, the barycenter coordinates or spectral-domain features of each face are taken as position information and fused with the face's intrinsic features to form the input features, to which position embedding is then applied. Secondly, a self-attention module extracts global features and a face convolution module extracts local features, which strengthens the network's ability to capture local features. Finally, the local and global features are fused to build deep neural networks for classification and segmentation on triangular meshes. Experimental results on the SHREC classification dataset and the COSEG segmentation dataset show that the proposed method achieves high accuracy and effectively improves training speed.

Professor WANG Hui of Shijiazhuang Tiedao University, together with his student LI Jiaqi and co-authors, proposes a classification and segmentation network that applies the Transformer to triangular meshes. The network takes the barycenter coordinates or spectral-domain features of the faces as position information, fuses them with intrinsic features as the input, and applies position embedding; it then extracts global features with a self-attention module and local features with a face convolution module, and fuses the two. The experimental results show that the method is accurate and effectively speeds up training.

Key words: geometric deep learning, Transformer, triangular mesh, 3D shape classification, 3D shape segmentation
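The input stage described in the abstract (one token per face, with the face barycenter as position information combined with per-face intrinsic features) can be sketched as follows. This is a minimal sketch under assumptions: the text here does not list which intrinsic features are used, so face area stands in for them, and `face_tokens` is a hypothetical helper rather than the authors' code.

```python
import math

def face_tokens(vertices, faces):
    """Build one token per triangular face: barycenter (position) + area (feature)."""
    tokens = []
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        # Barycenter: mean of the three corner positions.
        center = [(a[d] + b[d] + c[d]) / 3.0 for d in range(3)]
        # Face area: half the norm of the cross product of two edge vectors.
        u = [b[d] - a[d] for d in range(3)]
        v = [c[d] - a[d] for d in range(3)]
        cross = [
            u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0],
        ]
        area = 0.5 * math.sqrt(sum(x * x for x in cross))
        # Assumption: area is only a stand-in for the paper's intrinsic features.
        tokens.append(center + [area])
    return tokens

# Two triangles forming a unit square in the z = 0 plane.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
tokens = face_tokens(vertices, faces)
```

Each token is a four-element list `[x, y, z, area]`; a real network would map such tokens to a higher-dimensional embedding before the attention layers.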

CLC number: TP391

LI Jiaqi, WANG Hui, GUO Yu. Classification and segmentation network based on Transformer for triangular mesh[J]. Journal of Graphics, 2024, 45(1): 78-89 (in Chinese).

Figures/Tables 28

Fig. 1 Architecture of the Transformer-based classification and segmentation network

Fig. 2 Visualization of the feature channels of different positional encodings on the same model ((a) XYZ; (b) HKS)

Fig. 3 Visualization of the X-coordinate feature channel of different models ((a) Model 1; (b) Model 2; (c) Model 3)

Fig. 4 Visualization of position information based on the heat kernel signature for different models ((a) Model 1; (b) Model 2; (c) Model 3)
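For reference, the heat kernel signature (HKS) used as position information in Fig. 4 has the standard spectral form HKS(x, t) = sum_i exp(-lambda_i t) phi_i(x)^2, where (lambda_i, phi_i) are eigenpairs of the Laplace-Beltrami operator. The sketch below evaluates that formula from eigenpairs that are already given. The toy spectrum is made up for illustration, and how the paper discretizes the operator or chooses the time samples is not stated here.

```python
import math

def heat_kernel_signature(eigenvalues, eigenvectors, times):
    """HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2 for each element x and time t.

    eigenvectors[i][x] is the value of the i-th eigenfunction at element x.
    Returns one row per element and one column per time sample.
    """
    n = len(eigenvectors[0])
    return [
        [
            sum(math.exp(-lam * t) * phi[x] ** 2
                for lam, phi in zip(eigenvalues, eigenvectors))
            for t in times
        ]
        for x in range(n)
    ]

# Toy spectrum (made-up numbers, not from a real mesh): 2 eigenpairs over 3 elements.
eigenvalues = [0.0, 1.0]
eigenvectors = [[1.0, 1.0, 1.0], [1.0, 0.0, -1.0]]
hks = heat_kernel_signature(eigenvalues, eigenvectors, times=[0.0, 1.0])
```

Each time sample yields one feature channel per element, so the number of time samples sets the width of the HKS feature.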

Fig. 5 Diagram of face convolution
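The idea of gathering local features over neighboring faces, as in Fig. 5, can be illustrated with edge-adjacent faces. The sketch below is an assumption-laden stand-in: it averages neighbor features instead of applying learned convolution weights, and `face_adjacency` and `aggregate_local` are hypothetical names, not functions from Face-based CNN [32].

```python
from collections import defaultdict

def face_adjacency(faces):
    """Map each face index to the faces that share an edge with it."""
    edge_to_faces = defaultdict(list)
    for f, (i, j, k) in enumerate(faces):
        for a, b in ((i, j), (j, k), (k, i)):
            # Store edges with sorted endpoints so orientation does not matter.
            edge_to_faces[(min(a, b), max(a, b))].append(f)
    neighbors = [[] for _ in faces]
    for shared in edge_to_faces.values():
        for f in shared:
            neighbors[f].extend(g for g in shared if g != f)
    return neighbors

def aggregate_local(features, neighbors):
    """Concatenate each face's feature with the mean of its neighbors' features."""
    out = []
    for f, feat in enumerate(features):
        nbrs = neighbors[f]
        if nbrs:
            mean = [sum(features[g][d] for g in nbrs) / len(nbrs)
                    for d in range(len(feat))]
        else:
            mean = [0.0] * len(feat)  # isolated face: no neighbors to average
        out.append(list(feat) + mean)
    return out

faces = [(0, 1, 2), (0, 2, 3), (0, 3, 4)]  # a small triangle fan
features = [[1.0], [2.0], [4.0]]
neighbors = face_adjacency(faces)
local = aggregate_local(features, neighbors)
```

On a closed manifold mesh every face has exactly three edge-adjacent neighbors; the fan above has boundary edges, so its outer faces have only one.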

Fig. 6 Framework of the self-attention module
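As background for Fig. 6, the following is a minimal sketch of single-head scaled dot-product self-attention over face tokens, written with plain lists so that it runs without any tensor library. It illustrates the generic mechanism only: the projection matrices are identity placeholders, and the paper's module may differ (for example in the number of heads, normalization, or residual connections).

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Softmax of one score row; subtracting the max keeps exp() stable."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over tokens x (n rows, d columns)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights

# Placeholder projections (identity); a trained network would learn these.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three face tokens, two channels
out, weights = self_attention(x, identity, identity, identity)
```

Row `i` of `weights` says how strongly face `i` attends to every other face, which is the kind of quantity an attention-weight visualization displays for a chosen query face.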

Fig. 7 Visualization of the Cube Engraving dataset

Fig. 8 Visualization of the SHREC’11 dataset ((a) Rabbit; (b) Shark; (c) Two balls)

Fig. 9 Visualization of the COSEG dataset ((a) Tele-aliens; (b) Vases; (c) Chairs)

Fig. 10 Visualization of the human body segmentation dataset

Table 1 Classification results on the SHREC’11 and Cube Engraving datasets/%

Method	Split10	Cube
MDC-GCN^[23]	99.2	95.0
MeshNet++^[29]	99.8	98.5
LaplacianNet^[24]	90.3	-
HodgeNet^[21]	94.7	-
FPCNN^[26]	97.1	97.1
SCSL^[27]	97.7	97.0
DiffusionNet^[22]	99.7	85.6
SubdivNet^[31]	100	100
Face-based CNN^[32]	100	99.4
Ours-XYZ	99.7	97.1
Ours-HKS	100	98.5

Fig. 11 Visualization of attention weights ((a) HKS; (b) XYZ)

Table 2 Segmentation results on the COSEG dataset/%

Method	Vases	Chairs	Tele-aliens
MeshCNN^[30]	92.4	93.0	96.3
PD-MeshNet^[28]	95.4	97.2	98.1
HodgeNet^[21]	90.3	95.7	96.0
DiffusionNet^[22]	-	96.8	-
Face-based CNN^[32]	95.9	99.2	97.8
Ours-XYZ	95.9	99.1	96.7
Ours-HKS	94.9	97.8	94.1

Fig. 12 Visualization of segmentation results on the COSEG chairs dataset

Fig. 13 Visualization of attention weights for different positional encodings ((a) HKS; (b) XYZ)

Fig. 14 Visualization of attention weights for different query faces

Table 3 Segmentation results on the original meshes/%

Method	Tele-aliens
MeshCNN^[30]	94.4
PD-MeshNet^[28]	89.0
LaplacianNet^[24]	93.9
NGD-Transformer^[33]	94.3
SubdivNet^[31]	97.3
Face-based CNN^[32]	96.0
Ours-XYZ	95.5

Fig. 15 Visualization of segmentation results projected onto the original mesh ((a) Simplified mesh; (b) Original mesh; (c) Ground truth)
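Fig. 15 shows labels predicted on a simplified mesh carried over to the original mesh. The text here does not say how that projection is done; one common and simple choice, shown below purely as an assumption, is to give each original face the label of the simplified face whose barycenter is nearest. `project_labels` is a hypothetical helper.

```python
def project_labels(simplified_centers, simplified_labels, original_centers):
    """Give each original face the label of the nearest simplified-face barycenter."""
    projected = []
    for p in original_centers:
        # Brute-force nearest neighbor by squared Euclidean distance.
        nearest = min(
            range(len(simplified_centers)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, simplified_centers[i])),
        )
        projected.append(simplified_labels[nearest])
    return projected

simplified_centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
simplified_labels = [0, 1]
original_centers = [(0.1, 0.0, 0.0), (0.2, 0.1, 0.0), (0.9, 0.0, 0.0)]
labels = project_labels(simplified_centers, simplified_labels, original_centers)
```

The brute-force search costs time proportional to the product of the two face counts; for large meshes a spatial index such as a k-d tree would be the usual replacement.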

Table 4 Results on the human body segmentation dataset/%

Method	Human
MeshCNN^[30]	85.4
PD-MeshNet^[28]	85.6
HodgeNet^[21]	85.0
MDC-GCN^[23]	94.0
DiffusionNet^[22]	85.0*
SubdivNet^[31]	91.7
Face-based CNN^[32]	87.4
Ours-XYZ	85.0
Ours-HKS	84.5

Table 5 Mean training time/ms

Method	SHREC’11	Human
Face-based CNN^[32]	56	170
DiffusionNet^[22]	31	62
Ours	13	46

Table 6 CPU memory usage/MB

Method	SHREC’11	Human
Face-based CNN^[32]	875	1249
DiffusionNet^[22]	561	593
Ours	893	1001

Table 7 Size of network model parameters/MB

Method	SHREC’11	Human
Face-based CNN^[32]	0.312	8.572
DiffusionNet^[22]	0.119	0.464
Ours	1.642	1.637

Table 8 Preprocessing methods for 3D coordinates/%

Method	Vases	Chairs	Tele-aliens
Face feature preprocessing	95.38	92.65	90.00
None	95.13	94.79	93.05
Normalization	95.85	99.06	96.68
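Table 8 indicates that normalizing the 3D coordinates gives the best accuracy of the three options on all three COSEG categories. The exact normalization is not specified here; the sketch below shows one common variant as an assumption: translate the points to zero mean and scale them so the farthest point lies on the unit sphere. `normalize_points` is a hypothetical helper.

```python
import math

def normalize_points(points):
    """Center points at their mean and scale so the farthest lies on the unit sphere."""
    n = len(points)
    mean = [sum(p[d] for p in points) / n for d in range(3)]
    centered = [[p[d] - mean[d] for d in range(3)] for p in points]
    radius = max(math.sqrt(sum(c * c for c in p)) for p in centered)
    if radius == 0.0:
        return centered  # all points coincide; nothing to scale
    return [[c / radius for c in p] for p in centered]

points = [(2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (3.0, 1.0, 0.0), (3.0, -1.0, 0.0)]
normalized = normalize_points(points)
```

Normalizing this way removes differences in global position and scale between models, so that raw XYZ values are comparable across shapes.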

Table 9 Preprocessing methods for the heat kernel signature (HKS)/%

Feature channels	Normalized	Not normalized
3	94.78	94.85
8	94.35	94.52
16	94.35	94.67

Table 10 Positional encoding fusion experiment/%

Positional encoding	Vases
XYZ	95.85
HKS	94.85
XYZ+HKS	95.12

Table 11 Accuracy comparison of different position embeddings

Position embedding	Training epochs	Vases/%
KNN	600	95.00
DiffusionNet^[22]	-	-
Face convolution	200	95.85

Table 12 Accuracy for different numbers of self-attention modules and output feature channels/%

Number of self-attention modules	1/4	1
1	95.01	-
2	95.85	94.26
3	95.28	93.98
4	95.10	-

Table 13 Accuracy of different feature fusion methods/%

Feature fusion	Vases
Max pooling^[11] (global)	94.14
Max pooling + average pooling^[16] (global)	95.00
Face convolution (local)	95.85
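The global fusion baselines in Table 13 rely on pooling over all faces. A minimal sketch of the max plus average pooling variant is given below, under the assumption that the pooled vectors are simply concatenated onto every per-face feature; the layout actually used in the paper is not stated here, and `fuse_global` is a hypothetical name.

```python
def fuse_global(features):
    """Append channel-wise max and mean over all faces to every per-face feature."""
    channels = list(zip(*features))  # one tuple per feature channel
    global_max = [max(c) for c in channels]
    global_mean = [sum(c) / len(c) for c in channels]
    return [list(f) + global_max + global_mean for f in features]

features = [[1.0, 4.0], [3.0, 0.0]]  # two faces, two channels
fused = fuse_global(features)
```

Every face receives the same pooled vector, so this kind of fusion adds shape-level context but no neighborhood-specific information.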

References 36

[1]	WANG J X, FU L J, YIN P B, et al. Medical image segmentation based on CNN and transformer[J]. Computer Systems and Applications, 2023, 32(4): 141-148 (in Chinese).
[2]	ZHAO W, WANG W J, REN Y N, et al. A generative abstractive summarization method based on the improved Transformer[J]. Journal of Chongqing University of Posts and Telecommunications: Natural Science Edition, 2023, 35(1): 185-192 (in Chinese).
[3]	CHEN W, YANG X. The integration of computer graphics and artificial intelligence leads the new transformation of 3D digital China[J]. Communications of the CCF, 2021, 17(11): 8-10 (in Chinese).
[4]	BRONSTEIN M M, BRUNA J, LECUN Y, et al. Geometric deep learning: going beyond euclidean data[J]. IEEE Signal Processing Magazine, 2017, 34(4): 18-42.
[5]	CHEN Y, ZHAO J Y, QIU Q L. A transformer-based capsule network for 3D part-whole relationship learning[J]. Entropy, 2022, 24(5): 678.
[6]	LIAN Z, GODIL A, BUSTOS B, et al. SHREC’11 track: shape retrieval on non-rigid 3D watertight meshes[EB/OL]. [2023-05-18]. http://reuter.mit.edu/blue/papers/shrec11/shrec11.pdf.
[7]	GONG S J, HE J Z, CHEN X D. 3D mesh segmentation based on energy optimization[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(1): 11-18 (in Chinese).
[8]	LIU Y, GAO L, HAN X G. 3D geometry learning[J]. Communications of the CCF, 2021, 17(11): 53-62 (in Chinese).
[9]	XIAO Y P, LAI Y K, ZHANG F L, et al. A survey on deep geometry learning: from a representation perspective[J]. Computational Visual Media, 2020, 6(2): 113-133.
[10]	WU Z R, SONG S R, KHOSLA A, et al. 3D ShapeNets: a deep representation for volumetric shapes[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1912-1920.
[11]	SU H, MAJI S, KALOGERAKIS E, et al. Multi-view convolutional neural networks for 3D shape recognition[C]// 2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 945-953.
[12]	QI C R, SU H, NIEßNER M, et al. Volumetric and multi-view CNNs for object classification on 3D data[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 5648-5656.
[13]	QI C R, SU H, MO K C, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 77-85.
[14]	QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]// The 31st International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2017: 5105-5114.
[15]	LI Y, BU R, SUN M, et al. PointCNN: convolution on X-transformed points[EB/OL]. [2023-05-18]. https://www.semanticscholar.org/reader/6400c36efdb8a66b401b6aef26c057227266fddd.
[16]	THOMAS H, QI C R, DESCHAUD J E, et al. KPConv: flexible and deformable convolution for point clouds[C]// 2019 IEEE International Conference on Computer Vision. New York: IEEE Press, 2019: 6410-6419.
[17]	GUO M H, CAI J X, LIU Z N, et al. PCT: point cloud transformer[J]. Computational Visual Media, 2021, 7(2): 187-199.
[18]	YANG S Q, YANG J T, LI Z, et al. Human action recognition based on LSTM neural network[J]. Journal of Graphics, 2021, 42(2): 174-181 (in Chinese).
[19]	WANG H, SONG J H, DING B X, et al. Human action recognition of triangle mesh sequence representation[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(11): 1723-1730 (in Chinese).
[20]	LAHAV A, TAL A. MeshWalker: deep mesh understanding by random walks[J]. ACM Transactions on Graphics, 2020, 39(6): 263:1-263:13.
[21]	SMIRNOV D, SOLOMON J. HodgeNet: learning spectral geometry on triangle meshes[J]. ACM Transactions on Graphics, 2021, 40(4): 166:1-166:11.
[22]	SHARP N, ATTAIKI S, CRANE K, et al. DiffusionNet: discretization agnostic learning on surfaces[J]. ACM Transactions on Graphics, 2022, 41(3): 27:1-27:16.
[23]	QIAO Y L, GAO L, YANG J, et al. Learning on 3D meshes with Laplacian encoding and pooling[J]. IEEE Transactions on Visualization and Computer Graphics, 2022, 28(2): 1317-1327.
[24]	MARON H, GALUN M, AIGERMAN N, et al. Convolutional neural networks on surfaces via seamless toric covers[J]. ACM Transactions on Graphics, 2017, 36(4): 71:1-71:10.
[25]	EZUZ D, SOLOMON J, KIM V G, et al. GWCNN: a metric alignment layer for deep shape analysis[J]. Computer Graphics Forum, 2017, 36(5): 49-57.
[26]	MITCHEL T W, KIM V G, KAZHDAN M. Field convolutions for surface CNNs[C]// 2021 IEEE International Conference on Computer Vision. New York: IEEE Press, 2022: 9981-9991.
[27]	TANG W M, QIU G P. Dense graph convolutional neural networks on 3D meshes for 3D object segmentation and classification[J]. Image and Vision Computing, 2021, 114: 104265:1-104265:12.
[28]	FENG Y T, FENG Y F, YOU H X, et al. MeshNet: mesh neural network for 3D shape representation[J]. The 33rd AAAI Conference on Artificial Intelligence, 2019, 33(1): 8279-8286.
[29]	SINGH V V, SHESHAPPANAVAR S V, KAMBHAMETTU C. MeshNet++: a network with a face[C]// The 29th ACM International Conference on Multimedia. New York: ACM, 2021: 4883-4891.
[30]	HANOCKA R, HERTZ A, FISH N, et al. MeshCNN: a network with an edge[J]. ACM Transactions on Graphics, 2019, 38(4): 90:1-90:12.
[31]	HU S M, LIU Z N, GUO M H, et al. Subdivision-based mesh convolution networks[J]. ACM Transactions on Graphics, 2022, 41(3): 25:1-25:16.
[32]	WANG H, GUO Y, WANG Z Y. Face-based CNN on triangular mesh with arbitrary connectivity[J]. Electronics, 2022, 11(15): 2466.
[33]	ZHUANG J F, LIU X F, ZHUANG W. NGD-transformer: navigation geodesic distance positional encoding with self-attention pooling for graph transformer on 3D triangle mesh[J]. Symmetry, 2022, 14(10): 2050:1-2050:15.
[34]	CHEN Y, ZHAO J Y, HUANG L F, et al. 3D mesh transformer: a hierarchical neural network with local shape tokens[J]. Neurocomputing, 2022, 514: 328-340.
[35]	LIU C M, LUAN W N, FU R H, et al. Attention-embedding mesh saliency[J]. The Visual Computer, 2023, 39(5): 1783-1795.
[36]	MILANO F, LOQUERCIO A, ROSINOL A, et al. Primal-dual mesh convolutional neural networks[C]// The 34th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2020: 952-963.