Image classification method based on uncertainty-driven smart reinforcement active learning

Journal of Graphics, 2026, Vol. 47, Issue (1): 47-56. DOI: 10.11996/JG.j.2095-302X.2026010047

• Image Processing and Computer Vision •

JIU Mingyuan¹,²,³, WU Guowei¹, SONG Xuguang¹, LI Shupan¹,²,³, XU Mingliang¹,²,³

¹ School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001, China
² Engineering Research Center of Intelligent Swarm Systems, Ministry of Education, Zhengzhou University, Zhengzhou, Henan 450001, China
³ National Supercomputing Center in Zhengzhou, Zhengzhou, Henan 450001, China

Received: 2025-06-13 Accepted: 2025-10-10 Published: 2026-02-28 Online: 2026-03-16
Corresponding author: XU Mingliang, E-mail: iexumingliang@zzu.edu.cn
Supported by:
National Natural Science Foundation of China (62272422); National Natural Science Foundation of China (U22B2051); National Natural Science Foundation of China (62325602); Natural Science Foundation of Henan Province (252300421225); Organized Young Scientific Research Team Cultivation Foundation of Zhengzhou University (35220549)

Abstract:

With the rapid development of deep learning, remarkable results have been achieved in image classification and related tasks. However, the success of these models relies heavily on large amounts of high-quality labeled data. In real-world applications, labeled data are often scarce, and manual annotation is time-consuming, labor-intensive, and costly, which limits the scalability and deployment of deep learning models. In recent years, active learning has attracted significant attention because it can improve model performance under a limited annotation budget. Its core idea is to select the most valuable data for labeling according to criteria such as uncertainty, diversity, or representativeness. Traditional active learning methods often rely on manually designed heuristic sampling strategies, which struggle to adapt to different task scenarios and are difficult to optimize dynamically. To address these limitations, a Smart Reinforcement Active Learning (SRAL) approach for image classification is proposed. The sample selection process is modeled as a Markov decision process (MDP), and the adaptive policy optimization capability of reinforcement learning is used to guide the model in dynamically selecting the most valuable samples from the unlabeled data for labeling. In this framework, the state is composed of features extracted from the unlabeled samples, the action indicates whether a sample is selected for labeling, and the reward is defined as the change in model accuracy after the current sample is added to the training set. The Actor-Critic algorithm is adopted to optimize the sampling policy, and uncertainty-based heuristic ranking is incorporated as auxiliary information to improve learning efficiency.

Experimental results on CIFAR-10, SVHN, and Fashion-MNIST show that, under the same labeling budget, the proposed SRAL method significantly improves classification accuracy compared with other active learning approaches. Furthermore, SRAL exhibits good stability and generalization ability across these datasets, which confirms its effectiveness and advantages in enhancing the performance of image classification models.

Key words: deep learning, reinforcement learning, active learning, image classification, policy optimization
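The MDP formulation in the abstract (state = sample features, action = select or skip, reward = change in accuracy) can be illustrated with a one-step actor-critic update. The following is only a minimal sketch, not the authors' implementation: the linear actor and critic, the discount factor `gamma`, the learning rates, and all function names are assumptions made for illustration, since the network architecture and hyperparameters are not given in this excerpt.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def reward(acc_before, acc_after):
    """Reward as described in the abstract: accuracy change after adding the sample."""
    return acc_after - acc_before


def actor_critic_step(actor_w, critic_w, state, action, r, next_state,
                      gamma=0.9, lr_actor=0.1, lr_critic=0.1):
    """One TD(0) actor-critic update with linear function approximation.

    state / next_state: feature vectors of unlabeled samples (hypothetical).
    action: 1 = select the sample for labeling, 0 = skip it.
    All hyperparameters are illustrative assumptions.
    """
    v = dot(critic_w, state)
    v_next = dot(critic_w, next_state) if next_state is not None else 0.0
    td_error = r + gamma * v_next - v  # critic's estimate of the advantage

    # Critic: semi-gradient TD(0) update of the linear value function.
    new_critic = [w + lr_critic * td_error * x for w, x in zip(critic_w, state)]

    # Actor: Bernoulli policy pi(a=1|s) = sigmoid(w . s);
    # the gradient of log pi with respect to w is (a - p) * s.
    p_select = sigmoid(dot(actor_w, state))
    new_actor = [w + lr_actor * td_error * (action - p_select) * x
                 for w, x in zip(actor_w, state)]
    return new_actor, new_critic, td_error


if __name__ == "__main__":
    r = reward(0.50, 0.52)  # accuracy rose by two points after labeling the sample
    actor, critic, td = actor_critic_step([0.0, 0.0], [0.0, 0.0],
                                          [1.0, 0.5], 1, r, None)
    print(actor, critic, td)
```

A positive accuracy change pushes the policy toward selecting samples with similar features, while a negative change pushes it away. In practice the actor and critic would be neural networks and the reward would come from retraining and evaluating the classifier.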

JIU Mingyuan, WU Guowei, SONG Xuguang, LI Shupan, XU Mingliang. Image classification method based on uncertainty-driven smart reinforcement active learning[J]. Journal of Graphics, 2026, 47(1): 47-56.

Figures/Tables (8)

References (35)

[1]	CHEN L Y, LI S B, BAI Q, et al. Review of image classification algorithms based on convolutional neural networks[J]. Remote Sensing, 2021, 13(22): 4712.
[2]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. [2025-04-13]. https://dl.acm.org/doi/10.5555/3295222.3295349.
[3]	WANG M, MIN F, ZHANG Z H, et al. Active learning through density clustering[J]. Expert Systems with Applications, 2017, 85: 305-317.
[4]	LEWIS D D, GALE W A. A sequential algorithm for training text classifiers[M]//CROFT B W, RIJSBERGEN C J. SIGIR’94. London: Springer, 1994: 3-12.
[5]	KHAN A, HAQ I U, HUSSAIN T, et al. PMAL: a proxy model active learning approach for vision based industrial applications[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2022, 18(2s): 123.
[6]	TANG S G, YU X Y, CHEANG C F, et al. Transformer-based multi-task learning for classification and segmentation of gastrointestinal tract endoscopic images[J]. Computers in Biology and Medicine, 2023, 157: 106723.
[7]	DEMIR B, PERSELLO C, BRUZZONE L. Batch-mode active-learning methods for the interactive classification of remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2011, 49(3): 1014-1031.
[8]	YANG Y Z, LOOG M. Single shot active learning using pseudo annotators[J]. Pattern Recognition, 2019, 89: 22-31.
[9]	NGUYEN H T, SMEULDERS A. Active learning using pre-clustering[C]// The 21st International Conference on Machine Learning. New York: ACM, 2004: 79.
[10]	DASGUPTA S. Analysis of a greedy active learning strategy[EB/OL]. [2025-04-13]. https://dl.acm.org/doi/abs/10.5555/2976040.2976083.
[11]	DESCHAMPS S, SAHBI H. Reinforcement-based display selection for frugal learning[C]// 2022 26th International Conference on Pattern Recognition. New York: IEEE Press, 2022: 1186-1193.
[12]	CARAMALAU R, BHATTARAI B, KIM T K. Visual transformer for task-aware active learning[EB/OL]. [2025-04-13]. https://arxiv.org/abs/2106.03801.
[13]	GISSIN D, SHALEV-SHWARTZ S. Discriminative active learning[EB/OL]. [2025-04-13]. https://arxiv.org/abs/1907.06347.
[14]	HUANG S J, JIN R, ZHOU Z H. Active learning by querying informative and representative examples[EB/OL]. [2025-04-13]. https://dl.acm.org/doi/10.5555/2997189.2997289.
[15]	YIN C Y, CHEN S S, YIN Z C. Clustering-based active learning classification towards data stream[J]. ACM Transactions on Intelligent Systems and Technology, 2023, 14(2): 38.
[16]	SHEN W J, LI Y H, CHEN L, et al. Multiple-boundary clustering and prioritization to promote neural network retraining[C]// The 35th IEEE/ACM International Conference on Automated Software Engineering. New York: IEEE Press, 2020: 410-422.
[17]	DESCHAMPS S, SAHBI H. Reinforcement-based frugal learning for interactive satellite image change detection[C]// IGARSS 2022-2022 IEEE International Geoscience and Remote Sensing Symposium. New York: IEEE Press, 2022: 627-630.
[18]	MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[EB/OL]. [2025-04-13]. https://arxiv.org/abs/1312.5602.
[19]	VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]// The 30th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2016: 2094-2100.
[20]	SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[EB/OL]. [2025-04-13]. https://arxiv.org/abs/1511.05952.
[21]	FANG M, LI Y, COHN T. Learning how to active learn: a deep reinforcement learning approach[EB/OL]. [2025-04-13]. https://aclanthology.org/D17-1063/.
[22]	HAUSSMANN M, HAMPRECHT F, KANDEMIR M. Deep active learning with adaptive acquisition[EB/OL]. [2025-04-13]. https://dl.acm.org/doi/10.5555/3367243.3367383.
[23]	LIU Z M, WANG J Y, GONG S G, et al. Deep reinforcement active learning for human-in-the-loop person re-identification[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 6121-6130.
[24]	SUN L, GONG Y H. Active learning for image classification: a deep reinforcement learning approach[C]// The 2nd China Symposium on Cognitive Computing and Hybrid Intelligence. New York: IEEE Press, 2019: 71-76.
[25]	WANG J W, YAN Y G, ZHANG Y B, et al. Deep reinforcement active learning for medical image classification[C]// The 23rd International Conference on Medical Image Computing and Computer Assisted Intervention - MICCAI 2020. Cham: Springer, 2020: 33-42.
[26]	MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[27]	TANG X, WU S, CHEN G, et al. Learning to label with active learning and reinforcement learning[C]// The 26th International Conference on Database Systems for Advanced Applications. Cham: Springer, 2021: 549-557.
[28]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[29]	CARAMALAU R, BHATTARAI B, KIM T K. Sequential graph convolutional network for active learning[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 9578-9587.
[30]	PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library[EB/OL]. [2025-04-13]. https://dl.acm.org/doi/10.5555/3454287.3455008.
[31]	SHANNON C E. A mathematical theory of communication[J]. The Bell System Technical Journal, 1948, 27(3): 379-423.
[32]	SETTLES B. Active learning literature survey[R]. Madison: University of Wisconsin-Madison, 2009.
[33]	SCHEFFER T, DECOMAIN C, WROBEL S. Active hidden Markov models for information extraction[C]// The 4th International Conference on Advances in Intelligent Data Analysis. Cham: Springer, 2001: 309-318.
[34]	LI T, ZHOU P, HE Z B, et al. Friendly sharpness-aware minimization[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 5631-5640.
[35]	VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9: 2579-2605.

Method	Number of samples
	1 000	2 000	3 000	4 000	5 000	6 000	7 000
Random	49.44±0.86	56.90±0.49	61.60±0.28	64.73±0.20	69.96±0.39	67.76±0.32	69.25±0.15
EN [31]	50.02±0.19	57.29±0.19	61.58±0.17	64.55±0.20	68.99±0.33	67.39±0.20	68.80±0.19
LC [32]	49.50±0.43	57.77±0.26	61.35±0.28	64.61±0.47	69.72±0.20	67.75±0.31	69.19±0.24
MS [33]	49.94±0.39	57.23±0.64	61.42±0.40	64.50±0.23	70.25±0.07	68.42±0.21	69.72±0.12
DRAL [25]	50.21±0.19	57.77±0.40	61.64±0.42	65.18±0.010	70.10±0.22	68.54±0.17	69.75±0.15
SRAL+Random	50.40±0.04	57.83±0.36	61.83±0.07	65.60±0.49	69.90±0.67	68.62±0.17	70.00±0.11
SRAL	50.67±0.10	58.45±0.11	62.36±0.55	65.77±0.44	70.49±0.19	68.98±0.26	70.26±0.15
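
The EN, LC, and MS baselines in the table above are read here as entropy, least-confidence, and margin sampling, based on the references they cite ([31]-[33]). That reading is an assumption, as are the function names below. As a rough sketch, each heuristic scores an unlabeled sample from the classifier's predicted class probabilities and ranks the pool from most to least uncertain:

```python
import math


def entropy(probs):
    """Shannon entropy of the predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def least_confidence(probs):
    """One minus the top class probability (higher = more uncertain)."""
    return 1.0 - max(probs)


def margin_uncertainty(probs):
    """One minus the gap between the two most likely classes (higher = more uncertain)."""
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])


def rank_by_uncertainty(prob_rows, score_fn):
    """Return sample indices ordered from most to least uncertain."""
    scores = [score_fn(p) for p in prob_rows]
    return sorted(range(len(prob_rows)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical softmax outputs for three unlabeled samples.
    pool = [
        [0.90, 0.05, 0.05],
        [0.40, 0.35, 0.25],
        [0.60, 0.30, 0.10],
    ]
    for fn in (entropy, least_confidence, margin_uncertainty):
        print(fn.__name__, rank_by_uncertainty(pool, fn))
```

A ranking of this kind is the sort of uncertainty-based auxiliary signal the abstract mentions. How SRAL actually combines it with the learned policy is not specified in this excerpt.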

Method	Number of samples
	1 000	2 000	3 000	4 000	5 000	6 000	7 000
Random	73.65±0.17	85.06±0.70	87.44±0.05	88.92±0.13	89.46±0.58	90.23±0.13	90.63±0.41
EN [31]	73.62±1.07	84.55±0.48	88.55±0.14	89.99±0.46	91.24±0.14	92.50±0.09	92.93±0.21
LC [32]	75.02±0.53	85.40±0.26	88.14±0.19	90.08±0.28	91.32±0.06	92.16±0.29	93.11±0.16
MS [33]	72.60±0.45	85.28±0.90	88.51±0.31	90.55±0.19	91.52±0.19	92.41±0.14	93.13±0.05
DRAL [25]	74.63±0.93	86.23±0.40	89.19±0.19	91.00±0.18	91.56±0.12	92.92±0.10	93.44±0.08
SRAL+Random	75.21±0.16	86.58±0.37	89.40±0.14	90.96±0.23	91.36±0.20	92.72±0.29	93.15±0.37
SRAL	75.45±0.46	86.94±0.16	89.66±0.22	91.30±0.13	91.80±0.13	93.05±0.10	93.60±0.10

Method	Number of samples
	1 000	2 000	3 000	4 000	5 000	6 000	7 000
Random	76.28±0.81	78.30±0.60	79.64±0.82	79.84±0.95	81.64±0.68	81.71±0.71	82.84±0.07
EN [31]	77.51±1.85	82.77±0.43	84.26±0.67	86.16±0.51	87.14±0.18	87.78±0.55	88.17±0.46
LC [32]	77.22±0.94	82.96±0.27	84.90±0.18	86.38±0.07	86.95±0.29	87.97±0.42	88.72±0.38
MS [33]	78.33±1.32	83.54±0.54	88.56±0.28	86.84±0.40	87.35±0.53	88.10±0.43	88.79±0.23
DRAL [25]	79.81±0.31	84.79±0.29	86.25±0.17	86.97±0.23	87.55±0.43	88.67±0.15	88.93±0.24
SRAL+Random	80.07±0.25	84.44±0.17	86.24±0.24	86.78±0.26	87.67±0.16	88.25±0.26	88.89±0.24
SRAL	80.47±0.18	85.01±0.13	86.71±0.20	87.40±0.17	87.95±0.20	88.83±0.14	89.25±0.20

