Journal of Graphics ›› 2025, Vol. 46 ›› Issue (6): 1292-1303. DOI: 10.11996/JG.j.2095-302X.2025061292
Corresponding author: LI Zongmin (1965-), male, professor, Ph.D. His main research interests cover computer vision and graphics and image processing. E-mail: lizongmin@upc.edu.cn
Test-time adaptation algorithm based on trusted pseudo-label fine-tuning
LI Xingchen1, LI Zongmin1,2, YANG Chaozhi1
Received:2025-02-17
Accepted:2025-04-23
Published:2025-12-30
Online:2025-12-27
First author: LI Xingchen (2003-), undergraduate student. His main research interests cover computer vision. E-mail: 17852021063@163.com
Abstract: The distribution gap between training and test sets challenges the generalization ability of deep learning models. Systematic analysis reveals two key problems awaiting solutions: insufficiently optimized knowledge transfer from training data to test data, and the impact of class imbalance within datasets. To address these challenges, a novel test-time adaptation algorithm, trusted pseudo-label fine-tuning (FTP), is proposed. By optimizing the sample-selection process, FTP screens out low-entropy test samples to build a fine-tuning set and combines it with the original training set to fine-tune the model, significantly improving the generalization performance of image classification models on the test set. Extensive experiments on the MNIST, Fashion-MNIST, and CIFAR-10 datasets show that image classification models equipped with FTP generally gain performance on the test set, with accuracy improved by up to about 3% and F1 scores raised accordingly, outperforming commonly used test-time adaptation methods such as TENT, CoTTA, EATA, and OSTTA. Moreover, gradient-based visualization analysis demonstrates that models fine-tuned with FTP retain good interpretability while maintaining high prediction accuracy, providing a reliable guarantee for practical applications.
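The core of FTP, as described above, is entropy-based screening of trusted test samples, with a per-class variant evaluated in Tables 3-4. Below is a minimal PyTorch sketch of that selection step; it is an illustration inferred from the abstract and table captions, not the authors' released code, and the names `select_trusted_samples` and `keep_ratio`, as well as the loader interface, are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_trusted_samples(model, test_loader, keep_ratio=0.4, device="cuda"):
    """Keep the test samples whose prediction entropy falls in the lowest
    `keep_ratio` fraction within each predicted class (per-class sampling,
    cf. Tables 3-4, entropy threshold 40%)."""
    model.eval()
    xs, pseudo, ents = [], [], []
    for x, _ in test_loader:                       # test labels are never used
        logits = model(x.to(device))
        p = F.softmax(logits, dim=1)
        ent = -(p * p.clamp_min(1e-12).log()).sum(dim=1)   # Shannon entropy
        xs.append(x.cpu())
        pseudo.append(p.argmax(dim=1).cpu())
        ents.append(ent.cpu())
    xs, pseudo, ents = torch.cat(xs), torch.cat(pseudo), torch.cat(ents)

    keep = torch.zeros_like(pseudo, dtype=torch.bool)
    for c in pseudo.unique():                      # per-class entropy threshold
        idx = (pseudo == c).nonzero(as_tuple=True)[0]
        k = max(1, int(keep_ratio * idx.numel()))
        keep[idx[ents[idx].argsort()[:k]]] = True  # lowest-entropy k samples
    return xs[keep], pseudo[keep]                  # trusted pseudo-labeled set
```

The returned pairs serve as the trusted pseudo-labeled fine-tuning set; Table 4 combines them with the original training set (see the sketch after Table 4).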
LI Xingchen, LI Zongmin, YANG Chaozhi. Test-time adaptation algorithm based on trusted pseudo-label fine-tuning[J]. Journal of Graphics, 2025, 46(6): 1292-1303.
Fig. 2 Sample visualization of the three datasets, with 20 randomly drawn samples per dataset ((a) MNIST; (b) Fashion-MNIST; (c) CIFAR-10)
Table 1 Results before enhancement

| Methods | Acc-Top1 (MNIST) | Acc-Top1 (Fashion) | F1-Score (MNIST) | F1-Score (Fashion) |
|---|---|---|---|---|
| VGG | 98.64 | 90.23 | 98.62 | 90.15 |
| ResNet | 97.98 | 90.26 | 97.97 | 90.16 |
| DenseNet | 97.79 | 87.21 | 97.75 | 86.72 |
| ViT | 71.18 | 73.38 | 70.24 | 70.96 |
| Swin-Transformer | 81.74 | 71.72 | 80.13 | 68.75 |
Table 2 Combined sampling (entropy threshold of 40%)

| Methods | Acc-Top1 (MNIST) | Acc-Top1 (Fashion) | F1-Score (MNIST) | F1-Score (Fashion) |
|---|---|---|---|---|
| VGG | 98.90 | 88.56 | 98.88 | 88.09 |
| ResNet | 98.24 | 85.97 | 98.22 | 84.49 |
| DenseNet | 98.06 | 86.47 | 98.02 | 85.71 |
| ViT | 26.49 | 26.78 | 21.16 | 18.15 |
| Swin-Transformer | 64.80 | 20.69 | 56.69 | 13.78 |
Table 3 Sampling by category (entropy threshold of 40%)

| Methods | Acc-Top1 (MNIST) | Acc-Top1 (Fashion) | F1-Score (MNIST) | F1-Score (Fashion) |
|---|---|---|---|---|
| VGG | 98.89 | 90.12 | 98.88 | 90.08 |
| ResNet | 98.32 | 90.15 | 98.30 | 90.13 |
| DenseNet | 98.39 | 87.79 | 98.39 | 87.40 |
| ViT | 68.72 | 70.10 | 68.55 | 70.10 |
| Swin-Transformer | 74.95 | 63.76 | 71.28 | 58.64 |
Table 4 Sampling by category (entropy threshold of 40%) + original training set

| Methods | Acc-Top1 (MNIST) | Acc-Top1 (Fashion) | F1-Score (MNIST) | F1-Score (Fashion) |
|---|---|---|---|---|
| VGG | 99.30 | 91.99 | 99.29 | 91.99 |
| ResNet | 98.75 | 91.06 | 98.73 | 91.11 |
| DenseNet | 98.40 | 90.35 | 98.39 | 90.26 |
| ViT | 69.87 | 73.20 | 69.56 | 71.67 |
| Swin-Transformer | 11.65 | 39.41 | 7.28 | 34.38 |
Fig. 8 Accuracy (Acc) under different entropy thresholds for the ResNet model (blue: results on MNIST; red: results on Fashion-MNIST)
Fig. 9 Accuracy (Acc) under different entropy thresholds for the ViT model (blue: results on MNIST; red: results on Fashion-MNIST)
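Figures 8 and 9 sweep the entropy threshold and record the resulting accuracy. A hedged sketch of how such a sweep could be scripted, reusing the two helpers above, is given below; `evaluate_accuracy` is an assumed evaluation helper, not from the paper.

```python
import copy

def sweep_entropy_thresholds(base_model, train_set, test_loader,
                             ratios=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)):
    """For each entropy keep-ratio, rebuild the trusted set, fine-tune a
    fresh copy of the source model, and record test accuracy (Figs. 8-9)."""
    results = {}
    for r in ratios:
        model = copy.deepcopy(base_model)      # restart from source weights
        xs, ys = select_trusted_samples(model, test_loader, keep_ratio=r)
        model = finetune_with_trusted(model, train_set, xs, ys)
        results[r] = evaluate_accuracy(model, test_loader)  # assumed helper
    return results
```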
Fig. 12 Confusion matrices of different models on multiple datasets ((a) MNIST; (b) Fashion-MNIST)
Fig. 13 Confusion matrices before and after FTP processing ((a) before FTP; (b) after FTP)
Fig. 15 t-SNE visualization before and after FTP processing ((a) before FTP; (b) after FTP)
Fig. 16 t-SNE comparison of the DenseNet model on Fashion-MNIST before and after FTP processing (blue: before FTP; green: after FTP)
Fig. 17 Gradient-based interpretability results visualized with CAM on the MNIST and Fashion-MNIST datasets ((a) MNIST; (b) Fashion-MNIST)
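The gradient-based maps in Fig. 17 are CAM-style visualizations (Grad-CAM [24]). A self-contained sketch of Grad-CAM using only PyTorch hooks is given below for reference; it follows the published algorithm, while the layer choice and preprocessing are assumptions.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, class_idx=None):
    """Minimal Grad-CAM: weight the target layer's activations by the
    spatially averaged gradients of the target class score."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.__setitem__("a", o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.__setitem__("g", go[0]))
    try:
        model.eval()
        logits = model(x)                          # x: (1, C, H, W)
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, class_idx].backward()
        w = grads["g"].mean(dim=(2, 3), keepdim=True)      # GAP of gradients
        cam = F.relu((w * acts["a"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear",
                            align_corners=False)
        return (cam / cam.max().clamp_min(1e-12)).detach().squeeze()
    finally:
        h1.remove()
        h2.remove()
```

For a torchvision ResNet, `target_layer` would typically be the last convolutional block, e.g. `model.layer4[-1]`.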
References
[1] RAWAT W, WANG Z H. Deep convolutional neural networks for image classification: a comprehensive review[J]. Neural Computation, 2017, 29(9): 2352-2449.
[2] VOULODIMOS A, DOULAMIS N, DOULAMIS A, et al. Deep learning for computer vision: a brief review[J]. Computational Intelligence and Neuroscience, 2018, 2018(1): 7068349.
[3] ZHUANG F Z, QI Z Y, DUAN K Y, et al. A comprehensive survey on transfer learning[J]. Proceedings of the IEEE, 2021, 109(1): 43-76.
[4] ZHANG M H, NIU Y Y, DU Y L, et al. Hyperspectral image classification based on residual 3DCNN and 3D Gabor filter[J]. Journal of Graphics, 2021, 42(5): 729-737 (in Chinese).
[5] BIAN K, LIANG H. Research progress of pattern classification based on machine learning[J]. Journal of Graphics, 2023, 44(3): 415-426 (in Chinese).
[6] ZHANG X X, CUI P, XU R Z, et al. Deep stable learning for out-of-distribution generalization[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 5368-5378.
[7] TANG Z C, CHEN G X, YANG H L, et al. DSIL-DDI: a domain-invariant substructure interaction learning for generalizable drug-drug interaction prediction[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(8): 10552-10560.
[8] TANG Z C, HUANG J H, CHEN G X, et al. Comprehensive view embedding learning for single-cell multimodal integration[C]// The 38th AAAI Conference on Artificial Intelligence. Washington: AAAI Press, 2024: 15292-15300.
[9] LI Y K, WANG S X, TAN G. ID-NeRF: indirect diffusion-guided neural radiance fields for generalizable view synthesis[J]. Expert Systems with Applications, 2025, 266: 126068.
[10] TANG Z C, CHEN G X, CHEN S Z, et al. Modal-nexus auto-encoder for multi-modality cellular data integration and imputation[J]. Nature Communications, 2024, 15(1): 9021.
[11] ZHOU R C, TIAN J, YAN F T, et al. Point cloud classification model incorporating external attention and graph convolution[J]. Journal of Graphics, 2023, 44(6): 1162-1172 (in Chinese).
[12] LIN Z M, AKIN H, RAO R O S, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model[J]. Science, 2023, 379(6637): 1123-1130.
[13] HE H H, CHEN G X, TANG Z C, et al. Dual modality feature fused neural network integrating binding site information for drug target affinity prediction[J]. NPJ Digital Medicine, 2025, 8(1): 67.
[14] SCHÖLKOPF B, LOCATELLO F, BAUER S, et al. Toward causal representation learning[J]. Proceedings of the IEEE, 2021, 109(5): 612-634.
[15] LIANG J, HE R, TAN T N. A comprehensive survey on test-time adaptation under distribution shifts[J]. International Journal of Computer Vision, 2025, 133(1): 31-64.
[16] PARK H, GUPTA A, WONG A. Test-time adaptation for depth completion[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 20519-20529.
[17] SU Y Y, XU X, JIA K. Towards real-world test-time adaptation: tri-net self-training with balanced normalization[C]// The 38th AAAI Conference on Artificial Intelligence. Washington: AAAI Press, 2024: 15126-15135.
[18] SHORTEN C, KHOSHGOFTAAR T M. A survey on image data augmentation for deep learning[J]. Journal of Big Data, 2019, 6(1): 60.
[19] GANAIE M A, HU M H, MALIK A K, et al. Ensemble deep learning: a review[J]. Engineering Applications of Artificial Intelligence, 2022, 115: 105151.
[20] NIU S C, WU J X, ZHANG Y F, et al. Efficient test-time model adaptation without forgetting[EB/OL]. [2025-02-16]. https://proceedings.mlr.press/v162/niu22a.html.
[21] LEE J, JUNG D, LEE S, et al. Entropy is not enough for test-time adaptation: from the perspective of disentangled factors[EB/OL]. [2025-02-16]. https://openreview.net/forum?id=9w3iw8wDuE.
[22] MA W J, LU J Y, WU H. Cellcano: supervised cell type identification for single cell ATAC-seq data[J]. Nature Communications, 2023, 14(1): 1864.
[23] TANG Z C, CHEN G X, CHEN S Z, et al. Knowledge-based inductive bias and domain adaptation for cell type annotation[J]. Communications Biology, 2024, 7(1): 1440.
[24] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 618-626.
[25] O'SHEA K, NASH P. An introduction to convolutional neural networks[EB/OL]. [2025-02-16]. https://arxiv.org/abs/1511.08458.
[26] KHAN S, NASEER M, HAYAT M, et al. Transformers in vision: a survey[J]. ACM Computing Surveys, 2022, 54(10s): 200.
[27] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2025-02-16]. https://arxiv.org/abs/1409.1556.
[28] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[29] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2261-2269.
[30] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2025-02-16]. https://openreview.net/forum?id=YicbFdNTTy.
[31] LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 9992-10002.
[32] BREIMAN L. Bagging predictors[J]. Machine Learning, 1996, 24(2): 123-140.
[33] CHEN T Q, GUESTRIN C. XGBoost: a scalable tree boosting system[C]// The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016: 785-794.
[34] FREUND Y, SCHAPIRE R E. A decision-theoretic generalization of on-line learning and an application to boosting[J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139.
[35] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// 2009 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2009: 248-255.
[36] AMODEI D, OLAH C, STEINHARDT J, et al. Concrete problems in AI safety[EB/OL]. [2025-02-16]. https://arxiv.org/abs/1606.06565.
[37] GRAVES A. Generating sequences with recurrent neural networks[EB/OL]. [2025-02-16]. https://arxiv.org/abs/1308.0850.
[38] ZHONG Z, ZHENG L, KANG G L, et al. Random erasing data augmentation[C]// The 34th AAAI Conference on Artificial Intelligence. Washington: AAAI Press, 2020: 13001-13008.
[39] ZHANG H Y, CISSE M, DAUPHIN Y N, et al. Mixup: beyond empirical risk minimization[EB/OL]. [2025-02-16]. https://arxiv.org/abs/1710.09412.
[40] DEVRIES T, TAYLOR G W. Improved regularization of convolutional neural networks with cutout[EB/OL]. [2025-02-16]. https://arxiv.org/abs/1708.04552.
[41] HENDRYCKS D, MU N, CUBUK E D, et al. AugMix: a simple data processing method to improve robustness and uncertainty[EB/OL]. [2025-02-16]. https://openreview.net/forum?id=S1gmrxHFvB.
[42] LYZHOV A, MOLCHANOVA Y, ASHUKHA A, et al. Greedy policy search: a simple baseline for learnable test-time augmentation[EB/OL]. [2025-02-16]. https://proceedings.mlr.press/v124/lyzhov20a.html.
[43] KIMURA M. Understanding test-time augmentation[C]// The 28th International Conference on Neural Information Processing. Cham: Springer, 2021: 558-569.
[44] TURSUN O, DENMAN S, SRIDHARAN S, et al. Learning test-time augmentation for content-based image retrieval[J]. Computer Vision and Image Understanding, 2022, 222: 103494.
[45] CONDE P, PREMEBIDA C. Adaptive-TTA: accuracy-consistent weighted test time augmentation method for the uncertainty calibration of deep learning classifiers[EB/OL]. [2025-02-16]. https://bmvc2022.mpi-inf.mpg.de/0869.pdf.
[46] WANG D Q, SHELHAMER E, LIU S T, et al. Tent: fully test-time adaptation by entropy minimization[EB/OL]. [2025-02-16]. https://openreview.net/forum?id=uXl3bZLkr3c.
[47] WANG Q, FINK O, VAN GOOL L, et al. Continual test-time domain adaptation[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 7191-7201.
[48] SREENIVAS M, BISWAS S. Efficient open-world test time adaptation of vision language models[EB/OL]. [2025-02-16]. https://openreview.net/forum?id=lF9QXpfNHm.
[49] VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(86): 2579-2605.