Journal of Graphics ›› 2026, Vol. 47 ›› Issue (1): 78-89. DOI: 10.11996/JG.j.2095-302X.2026010078
• Image Processing and Computer Vision •
ZHU Chenxi1, LU Yinan1, WU Tieru2, GONG Wenyong3, MA Rui2
Received: 2025-06-30
Accepted: 2025-08-23
Online: 2026-02-28
Published: 2026-03-16
Contact: MA Rui
ZHU Chenxi, LU Yinan, WU Tieru, GONG Wenyong, MA Rui. Deep fusion of multimodal features for few-shot class-incremental 3D point cloud classification[J]. Journal of Graphics, 2026, 47(1): 78-89.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2026010078
| Method | Acc↑ S1 | S2 | S3 | S4 | S5 | S6 | S7 | Δ↓ | Δ′↓ |
|---|---|---|---|---|---|---|---|---|---|
| ULIP | 86.3 | 83.3 | 80.3 | 75.8 | 72.8 | 62.9 | 65.1 | 24.6 | 9.9 |
| FACT | 82.6 | 77.0 | 72.4 | 69.8 | 68.4 | 67.7 | 67.3 | 18.5 | 5.6 |
| Microshape | 86.9 | 84.6 | 82.8 | 78.3 | 78.5 | 71.5 | 68.6 | 21.1 | 7.0 |
| FILP-3D | 90.5 | 87.1 | 84.5 | 81.8 | 80.9 | 80.2 | 77.6 | 14.3 | 3.4 |
| Ours | 90.7 | 88.8 | 87.0 | 84.8 | 84.7 | 83.6 | 82.6 | 8.9 | 2.2 |
Table 1 Analysis of experimental results from ShapeNet to ModelNet dataset
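A note on the metrics (an inference from the tabulated values, not a definition quoted from the paper): the Acc↑ columns S1-S7 report classification accuracy after each incremental session, and the Δ↓ column tracks the relative accuracy drop from the first to the last session, while Δ′↓ appears to be a complementary drop measure; lower is better for both.

$$\Delta = \frac{\mathrm{Acc}_{1}-\mathrm{Acc}_{T}}{\mathrm{Acc}_{1}}\times 100\%, \qquad \text{e.g. } \frac{90.7-82.6}{90.7}\times 100\% \approx 8.9$$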
| Method | Acc↑ S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 | Δ↓ | Δ′↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ULIP | 86.3 | 85.6 | 81.7 | 74.0 | 71.7 | 68.1 | 67.6 | 64.5 | 59.5 | 58.4 | 55.2 | 57.5 | 28.8 | 7.7 |
| FACT | 82.4 | 77.2 | 74.5 | 73.1 | 71.3 | 70.4 | 67.2 | 65.2 | 63.8 | 61.8 | 59.9 | 59.8 | 27.4 | 5.2 |
| Microshape | 85.2 | 78.6 | 71.0 | 72.0 | 75.2 | 68.8 | 56.1 | 58.5 | 62.9 | 59.1 | 52.2 | 59.4 | 30.3 | 12.1 |
| FILP-3D | 89.9 | 84.9 | 84.9 | 83.2 | 81.8 | 80.6 | 78.6 | 77.1 | 76.1 | 74.8 | 73.5 | 72.2 | 19.7 | 5.0 |
| Ours | 90.5 | 88.0 | 86.7 | 84.4 | 83.9 | 82.1 | 79.4 | 78.5 | 77.7 | 76.6 | 74.8 | 74.0 | 18.2 | 2.7 |
Table 2 Analysis of experimental results from ShapeNet to CO3D dataset
| Method | Acc↑ S1 | S2 | S3 | S4 | S5 | S6 | S7 | Δ↓ | Δ′↓ |
|---|---|---|---|---|---|---|---|---|---|
| w/o adaptive adapter | 90.5 | 87.9 | 85.5 | 83.0 | 82.7 | 81.9 | 80.8 | 10.7 | 2.6 |
| w/o attention fusion | 90.8 | 87.8 | 85.7 | 83.0 | 82.6 | 77.8 | 77.1 | 15.0 | 4.8 |
| w/o both | 90.5 | 87.1 | 84.5 | 81.8 | 80.9 | 80.2 | 77.6 | 14.3 | 3.4 |
| Ours | 90.7 | 88.8 | 87.0 | 84.8 | 84.7 | 83.6 | 82.6 | 8.9 | 2.2 |
Table 3 Module effectiveness ablation results
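For context on the components ablated in Table 3, the sketch below shows one common way to implement attention-based fusion of per-modality global embeddings (point cloud, rendered image, text). The module name, dimensions, and structure are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch (illustrative assumption, not the paper's exact architecture):
# fuse per-modality global embeddings with learned softmax attention weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusion(nn.Module):
    """Fuse point-cloud / image / text embeddings into a single vector."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # scores each modality embedding

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, dim)
        weights = F.softmax(self.score(feats), dim=1)  # (batch, M, 1)
        return (weights * feats).sum(dim=1)            # (batch, dim)

# Example: three hypothetical 512-d embeddings (point cloud, image, text)
fused = AttentionFusion(512)(torch.randn(4, 3, 512))
print(fused.shape)  # torch.Size([4, 512])
```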
| Method | Acc↑ S1 | S2 | S3 | S4 | S5 | S6 | S7 | Δ↓ | Δ′↓ |
|---|---|---|---|---|---|---|---|---|---|
| w/o GLU gating mechanism | 90.2 | 81.2 | 84.5 | 83.3 | 82.0 | 79.2 | 77.9 | 12.3 | 3.3 |
| w/o residual blocks | 90.6 | 89.0 | 86.0 | 83.9 | 83.7 | 82.9 | 81.4 | 9.2 | 3.0 |
| Ours | 90.7 | 88.8 | 87.0 | 84.8 | 84.7 | 83.6 | 82.6 | 8.9 | 2.2 |
Table 4 Effectiveness ablation results of GLU gating mechanism and residual blocks
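Table 4 isolates the GLU gating mechanism and the residual blocks inside the adapter. For reference, a generic GLU-gated residual adapter block is sketched below, assuming the standard Gated Linear Unit formulation (a value branch multiplied by a sigmoid gate) wrapped in a skip connection; layer sizes are illustrative, not the paper's settings.

```python
# Sketch of a GLU-gated residual adapter block (assumption: standard Gated
# Linear Unit, h = value * sigmoid(gate), plus a residual skip connection).
import torch
import torch.nn as nn

class GLUResidualAdapter(nn.Module):
    def __init__(self, dim: int = 512, hidden: int = 256):
        super().__init__()
        self.down = nn.Linear(dim, hidden * 2)  # produces value and gate halves
        self.up = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.down(x).chunk(2, dim=-1)
        h = value * torch.sigmoid(gate)  # GLU gating
        return x + self.up(h)            # residual connection

print(GLUResidualAdapter()(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```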
| Method | Acc↑ S1 | S2 | S3 | S4 | S5 | S6 | S7 | Δ↓ | Δ′↓ |
|---|---|---|---|---|---|---|---|---|---|
| w/o contrastive learning | 90.5 | 86.2 | 83.1 | 78.6 | 76.7 | 76.2 | 75.8 | 14.7 | 4.5 |
| Ours | 90.7 | 88.8 | 87.0 | 84.8 | 84.7 | 83.6 | 82.6 | 8.9 | 2.2 |
Table 5 Effectiveness ablation results of contrastive learning
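Table 5 quantifies the contribution of the contrastive objective. A compact InfoNCE-style loss in the spirit of reference [43] is sketched below; whether the paper uses exactly this form and this temperature is an assumption.

```python
# InfoNCE-style contrastive loss between two aligned embedding sets, e.g.
# point-cloud features vs. text/image features (following [43] in spirit;
# the temperature value 0.07 is an illustrative assumption).
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                    # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 512), torch.randn(8, 512))
```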
| Method | Acc↑ S1 | S2 | S3 | S4 | S5 | S6 | S7 | Δ↓ | Δ′↓ |
|---|---|---|---|---|---|---|---|---|---|
| w/o memory replay | 90.7 | 84.7 | 79.2 | 79.2 | 77.9 | 76.9 | 69.5 | 21.2 | 7.4 |
| Ours | 90.7 | 88.8 | 87.0 | 84.8 | 84.7 | 83.6 | 82.6 | 8.9 | 2.2 |
Table 6 Effectiveness ablation results of memory replay
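Table 6 ablates memory replay. A minimal exemplar-replay buffer for class-incremental training is sketched below; the per-class exemplar count and the random selection strategy are illustrative assumptions rather than the paper's settings.

```python
# Minimal exemplar memory for class-incremental replay: keep a few exemplars
# per seen class and mix them into each new session's few-shot batch.
import random
from collections import defaultdict

class ReplayMemory:
    def __init__(self, per_class: int = 5):
        self.per_class = per_class
        self.store = defaultdict(list)  # class id -> list of stored samples

    def add(self, sample, label) -> None:
        bucket = self.store[label]
        bucket.append(sample)
        if len(bucket) > self.per_class:  # keep at most `per_class` exemplars
            bucket.pop(random.randrange(len(bucket)))

    def sample(self, k: int):
        pool = [(s, c) for c, items in self.store.items() for s in items]
        return random.sample(pool, min(k, len(pool)))
```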
| [1] | CHANG A X, FUNKHOUSER T, GUIBAS L, et al. ShapeNet: an information-rich 3D model repository[EB/OL]. [2025-04-30]. https://arxiv.org/abs/1512.03012.pdf. |
| [2] | DEITKE M, SCHWENK D, SALVADOR J, et al. Objaverse: a universe of annotated 3D objects[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 13142-13153. |
| [3] | REIZENSTEIN J, SHAPOVALOV R, HENZLER P, et al. Common objects in 3D: large-scale learning and evaluation of real-life 3D category reconstruction[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 10881-10891. |
| [4] | UY M A, PHAM Q H, HUA B S, et al. Revisiting point cloud classification: a new benchmark dataset and classification model on real-world data[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 1588-1597. |
| [5] | WU T, ZHANG J R, FU X, et al. OmniObject3D: large-vocabulary 3D object dataset for realistic perception, reconstruction and generation[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 803-814. |
| [6] | QI C R, SU H, MO K C, et al. PointNet: deep learning on point sets for 3D classification and segmentation[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 77-85. |
| [7] | QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space[C]// The 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 5105-5114. |
| [8] | TAO X Y, HONG X P, CHANG X Y, et al. Few-shot class-incremental learning[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 12180-12189. |
| [9] | QIN C W, JOTY S. Continual few-shot relation learning via embedding space regularization and data augmentation[C]// The 60th Annual Meeting of the Association for Computational Linguistics. New York: Association for Computational Linguistics, 2022: 2776-2789. |
| [10] | ZHOU D W, CAI Z W, YE H J, et al. Revisiting class-incremental learning with pre-trained models: generalizability and adaptivity are all you need[J]. International Journal of Computer Vision, 2025, 133(3): 1012-1032. |
| [11] | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// 2019 Conference of the North American Chapter of the Association for Computational Linguistics. New York: Association for Computational Linguistics, 2019: 4171-4186. |
| [12] | HE K M, CHEN X L, XIE S N, et al. Masked autoencoders are scalable vision learners[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 15979-15988. |
| [13] | RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL]. [2025-04-30]. http://proceedings.mlr.press/v139/radford21a.html. |
| [14] | HUANG T Y, DONG B W, YANG Y H, et al. CLIP2Point: transfer CLIP to point cloud classification with image-depth pre-training[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 22100-22110. |
| [15] | ZENG Y H, JIANG C H, MAO J G, et al. CLIP2: contrastive language-image-point pretraining from real-world point cloud data[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 15244-15253. |
| [16] | ZHANG R R, GUO Z Y, ZHANG W, et al. PointCLIP: point cloud understanding by CLIP[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 8542-8552. |
| [17] | CHOWDHURY T, CHERAGHIAN A, RAMASINGHE S, et al. Few-shot class-incremental learning for 3D point cloud objects[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 204-220. |
| [18] | XU W, HUANG T Y, QU T Y, et al. FILP-3D: enhancing 3D few-shot class-incremental learning with pre-trained vision-language models[J]. Pattern Recognition, 2025, 165: 111558. |
| [19] | LI Y Y, BU R, SUN M C, et al. PointCNN: convolution on X-transformed points[C]// The 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018: 828-838. |
| [20] | LIU Y C, FAN B, XIANG S M, et al. Relation-shape convolutional neural network for point cloud analysis[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 8887-8896. |
| [21] | POULENARD A, RAKOTOSAONA M J, PONTY Y, et al. Effective rotation-invariant point CNN with spherical harmonics kernels[C]// 2019 International Conference on 3D Vision. New York: IEEE Press, 2019: 47-56. |
| [22] | RAO Y M, LU J W, ZHOU J. Spherical fractal convolutional neural networks for point cloud recognition[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 452-460. |
| [23] | WU W X, QI Z G, LI F X. PointConv: deep convolutional networks on 3D point clouds[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 9613-9622. |
| [24] | XU Y F, FAN T Q, XU M Y, et al. SpiderCNN: deep learning on point sets with parameterized convolutional filters[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 90-105. |
| [25] | LI G C, MÜLLER M, THABET A, et al. DeepGCNs: can GCNs go as deep as CNNs?[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 9266-9275. |
| [26] | WANG Y, SUN Y B, LIU Z W, et al. Dynamic graph CNN for learning on point clouds[J]. ACM Transactions on Graphics, 2019, 38(5): 146. |
| [27] | YU X M, TANG L L, RAO Y M, et al. Point-BERT: pre-training 3D point cloud transformers with masked point modeling[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 19291-19300. |
| [28] | PANG Y T, WANG W X, TAY F E H, et al. Masked autoencoders for point cloud self-supervised learning[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 604-621. |
| [29] | ZHANG R R, GUO Z Y, FANG R Y, et al. Point-M2AE: multi-scale masked autoencoders for hierarchical point cloud pre-training[C]// The 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 1962. |
| [30] | ZHAO H S, JIANG L, JIA J Y, et al. Point transformer[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 16239-16248. |
| [31] | CHEN K L, LEE C G. Incremental few-shot learning via vector quantization in deep embedded space[EB/OL]. [2025-04-30]. https://dblp.org/db/conf/iclr/iclr2021.html#ChenL21. |
| [32] | CHERAGHIAN A, RAHMAN S, FANG P F, et al. Semantic-aware knowledge distillation for few-shot class-incremental learning[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 2534-2543. |
| [33] | MAZUMDER P, SINGH P, RAI P. Few-shot lifelong learning[C]// The 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 2337-2345. |
| [34] | PENG C, ZHAO K, WANG T R, et al. Few-shot class-incremental learning from an open-set perspective[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 382-397. |
| [35] | XIANG X, TAN Y W, WAN Q, et al. Coarse-to-fine incremental few-shot learning[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 205-222. |
| [36] | ZHANG C, SONG N, LIN G S, et al. Few-shot incremental learning with continually evolved classifiers[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 12450-12459. |
| [37] | ZHOU D W, WANG F Y, YE H J, et al. Forward compatible few-shot class-incremental learning[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 9036-9046. |
| [38] | HERSCHE M, KARUNARATNE G, CHERUBINI G, et al. Constrained few-shot class-incremental learning[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 9047-9057. |
| [39] | LIU H, GU L, CHI Z X, et al. Few-shot class-incremental learning via entropy-regularized data-free replay[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 146-162. |
| [40] | WANG R Q, DUAN X Y, KANG G L, et al. AttriCLIP: a non-incremental learner for incremental knowledge learning[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 3654-3663. |
| [41] | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2025-04-22]. https://arxiv.org/abs/2010.11929.pdf. |
| [42] | WANG H C, LIU Q, YUE X Y, et al. Unsupervised point cloud pre-training via occlusion completion[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 9762-9772. |
| [43] | VAN DEN OORD A, LI Y Z, VINYALS O. Representation learning with contrastive predictive coding[EB/OL]. [2025-03-10]. https://arxiv.org/abs/1807.03748.pdf. |
| [44] | WU Z R, SONG S R, KHOSLA A, et al. 3D ShapeNets: a deep representation for volumetric shapes[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1912-1920. |
| [45] | LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// The 13th European Conference on Computer Vision. Cham: Springer, 2014: 740-755. |
| [46] | LOSHCHILOV I, HUTTER F. Decoupled weight decay regularization[EB/OL]. [2025-02-14]. https://arxiv.org/abs/1711.05101.pdf. |
| [47] | XUE L, GAO M F, XING C, et al. ULIP: learning a unified representation of language, images, and point clouds for 3D understanding[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 1179-1189. |