Journal of Graphics ›› 2026, Vol. 47 ›› Issue (1): 47-56. DOI: 10.11996/JG.j.2095-302X.2026010047
• Image Processing and Computer Vision •
Image classification method based on uncertainty-driven smart reinforcement active learning

JIU Mingyuan1,2,3, WU Guowei1, SONG Xuguang1, LI Shupan1,2,3, XU Mingliang1,2,3
Received: 2025-06-13
Accepted: 2025-10-10
Online: 2026-02-28
Published: 2026-03-16
Contact: XU Mingliang
JIU Mingyuan, WU Guowei, SONG Xuguang, LI Shupan, XU Mingliang. Image classification method based on uncertainty-driven smart reinforcement active learning[J]. Journal of Graphics, 2026, 47(1): 47-56.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2026010047
| Method | 1 000 | 2 000 | 3 000 | 4 000 | 5 000 | 6 000 | 7 000 |
|---|---|---|---|---|---|---|---|
| Random | 49.44±0.86 | 56.90±0.49 | 61.60±0.28 | 64.73±0.20 | 69.96±0.39 | 67.76±0.32 | 69.25±0.15 |
| EN | 50.02±0.19 | 57.29±0.19 | 61.58±0.17 | 64.55±0.20 | 68.99±0.33 | 67.39±0.20 | 68.80±0.19 |
| LC | 49.50±0.43 | 57.77±0.26 | 61.35±0.28 | 64.61±0.47 | 69.72±0.20 | 67.75±0.31 | 69.19±0.24 |
| MS | 49.94±0.39 | 57.23±0.64 | 61.42±0.40 | 64.50±0.23 | 70.25±0.07 | 68.42±0.21 | 69.72±0.12 |
| DRAL | 50.21±0.19 | 57.77±0.40 | 61.64±0.42 | 65.18±0.10 | 70.10±0.22 | 68.54±0.17 | 69.75±0.15 |
| SRAL+Random | 50.40±0.04 | 57.83±0.36 | 61.83±0.07 | 65.60±0.49 | 69.90±0.67 | 68.62±0.17 | 70.00±0.11 |
| SRAL | 50.67±0.10 | 58.45±0.11 | 62.36±0.55 | 65.77±0.44 | 70.49±0.19 | 68.98±0.26 | 70.26±0.15 |
Table 1 Comparison of experimental results on the CIFAR-10 dataset by number of labeled samples (accuracy: mean ± standard deviation)
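The EN, LC, and MS baselines in the tables are the classical uncertainty measures: Shannon entropy [31], least confidence [4,32], and margin sampling [33]. A minimal sketch of these acquisition scores, assuming the classifier outputs softmax probabilities; the function names here are illustrative, not the paper's code:

```python
import numpy as np

def entropy_score(probs):
    # EN: Shannon entropy of the predictive distribution [31]
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def least_confidence_score(probs):
    # LC: one minus the top predicted probability [4]
    return 1.0 - probs.max(axis=1)

def margin_score(probs):
    # MS: negative margin between the two highest probabilities [33];
    # a small margin means high uncertainty, so negate for a "higher
    # is more uncertain" convention.
    part = np.partition(probs, -2, axis=1)
    return -(part[:, -1] - part[:, -2])

def select_batch(probs, k, score_fn):
    # Query the k unlabeled samples the current model is least certain about.
    return np.argsort(score_fn(probs))[-k:]
```

All three scores rank the same two-sample example identically: a near-uniform prediction scores as more uncertain than a confident one.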
| Method | 1 000 | 2 000 | 3 000 | 4 000 | 5 000 | 6 000 | 7 000 |
|---|---|---|---|---|---|---|---|
| Random | 73.65±0.17 | 85.06±0.70 | 87.44±0.05 | 88.92±0.13 | 89.46±0.58 | 90.23±0.13 | 90.63±0.41 |
| EN | 73.62±1.07 | 84.55±0.48 | 88.55±0.14 | 89.99±0.46 | 91.24±0.14 | 92.50±0.09 | 92.93±0.21 |
| LC | 75.02±0.53 | 85.40±0.26 | 88.14±0.19 | 90.08±0.28 | 91.32±0.06 | 92.16±0.29 | 93.11±0.16 |
| MS | 72.60±0.45 | 85.28±0.90 | 88.51±0.31 | 90.55±0.19 | 91.52±0.19 | 92.41±0.14 | 93.13±0.05 |
| DRAL | 74.63±0.93 | 86.23±0.40 | 89.19±0.19 | 91.00±0.18 | 91.56±0.12 | 92.92±0.10 | 93.44±0.08 |
| SRAL+Random | 75.21±0.16 | 86.58±0.37 | 89.40±0.14 | 90.96±0.23 | 91.36±0.20 | 92.72±0.29 | 93.15±0.37 |
| SRAL | 75.45±0.46 | 86.94±0.16 | 89.66±0.22 | 91.30±0.13 | 91.80±0.13 | 93.05±0.10 | 93.60±0.10 |
Table 2 Comparison of experimental results on the SVHN dataset by number of labeled samples (accuracy: mean ± standard deviation)
| Method | 1 000 | 2 000 | 3 000 | 4 000 | 5 000 | 6 000 | 7 000 |
|---|---|---|---|---|---|---|---|
| Random | 76.28±0.81 | 78.30±0.60 | 79.64±0.82 | 79.84±0.95 | 81.64±0.68 | 81.71±0.71 | 82.84±0.07 |
| EN | 77.51±1.85 | 82.77±0.43 | 84.26±0.67 | 86.16±0.51 | 87.14±0.18 | 87.78±0.55 | 88.17±0.46 |
| LC | 77.22±0.94 | 82.96±0.27 | 84.90±0.18 | 86.38±0.07 | 86.95±0.29 | 87.97±0.42 | 88.72±0.38 |
| MS | 78.33±1.32 | 83.54±0.54 | 88.56±0.28 | 86.84±0.40 | 87.35±0.53 | 88.10±0.43 | 88.79±0.23 |
| DRAL | 79.81±0.31 | 84.79±0.29 | 86.25±0.17 | 86.97±0.23 | 87.55±0.43 | 88.67±0.15 | 88.93±0.24 |
| SRAL+Random | 80.07±0.25 | 84.44±0.17 | 86.24±0.24 | 86.78±0.26 | 87.67±0.16 | 88.25±0.26 | 88.89±0.24 |
| SRAL | 80.47±0.18 | 85.01±0.13 | 86.71±0.20 | 87.40±0.17 | 87.95±0.20 | 88.83±0.14 | 89.25±0.20 |
Table 3 Comparison of experimental results on the FASHION-MNIST dataset by number of labeled samples (accuracy: mean ± standard deviation)
Fig. 4 Comparison of 2D t-SNE visualizations (1 000 images selected per iteration): (a) Random; (b) EN; (c) LC; (d) MS; (e) DRAL; (f) SRAL
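Visualizations like Fig. 4 embed high-dimensional sample features into 2D with t-SNE [35] so that the spread of each method's selected images can be compared by eye. A minimal sketch with scikit-learn, using random placeholder features; in the paper these would instead be the trained classifier's learned representations:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Placeholder feature vectors for 200 selected images (64-D each).
features = rng.normal(size=(200, 64))

# Embed into 2D; perplexity must be smaller than the number of samples.
embedded = TSNE(n_components=2, perplexity=30, init="pca",
                random_state=0).fit_transform(features)
# embedded now holds one 2D point per image, ready for a scatter plot.
```

The resulting (n_samples, 2) array is what gets scattered, typically colored by class label, to compare how well each selection strategy covers the feature space.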
[1] CHEN L Y, LI S B, BAI Q, et al. Review of image classification algorithms based on convolutional neural networks[J]. Remote Sensing, 2021, 13(22): 4712.
[2] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. [2025-04-13]. https://dl.acm.org/doi/10.5555/3295222.3295349.
[3] WANG M, MIN F, ZHANG Z H, et al. Active learning through density clustering[J]. Expert Systems with Applications, 2017, 85: 305-317.
[4] LEWIS D D, GALE W A. A sequential algorithm for training text classifiers[M]//CROFT B W, RIJSBERGEN C J. SIGIR'94. London: Springer, 1994: 3-12.
[5] KHAN A, HAQ I U, HUSSAIN T, et al. PMAL: a proxy model active learning approach for vision based industrial applications[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2022, 18(2s): 123.
[6] TANG S G, YU X Y, CHEANG C F, et al. Transformer-based multi-task learning for classification and segmentation of gastrointestinal tract endoscopic images[J]. Computers in Biology and Medicine, 2023, 157: 106723.
[7] DEMIR B, PERSELLO C, BRUZZONE L. Batch-mode active-learning methods for the interactive classification of remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2011, 49(3): 1014-1031.
[8] YANG Y Z, LOOG M. Single shot active learning using pseudo annotators[J]. Pattern Recognition, 2019, 89: 22-31.
[9] NGUYEN H T, SMEULDERS A. Active learning using pre-clustering[C]// The 21st International Conference on Machine Learning. New York: ACM, 2004: 79.
[10] DASGUPTA S. Analysis of a greedy active learning strategy[EB/OL]. [2025-04-13]. https://dl.acm.org/doi/abs/10.5555/2976040.2976083.
[11] DESCHAMPS S, SAHBI H. Reinforcement-based display selection for frugal learning[C]// 2022 26th International Conference on Pattern Recognition. New York: IEEE Press, 2022: 1186-1193.
[12] CARAMALAU R, BHATTARAI B, KIM T K. Visual transformer for task-aware active learning[EB/OL]. [2025-04-13]. https://arxiv.org/abs/2106.03801.
[13] GISSIN D, SHALEV-SHWARTZ S. Discriminative active learning[EB/OL]. [2025-04-13]. https://arxiv.org/abs/1907.06347.
[14] HUANG S J, JIN R, ZHOU Z H. Active learning by querying informative and representative examples[EB/OL]. [2025-04-13]. https://dl.acm.org/doi/10.5555/2997189.2997289.
[15] YIN C Y, CHEN S S, YIN Z C. Clustering-based active learning classification towards data stream[J]. ACM Transactions on Intelligent Systems and Technology, 2023, 14(2): 38.
[16] SHEN W J, LI Y H, CHEN L, et al. Multiple-boundary clustering and prioritization to promote neural network retraining[C]// The 35th IEEE/ACM International Conference on Automated Software Engineering. New York: IEEE Press, 2020: 410-422.
[17] DESCHAMPS S, SAHBI H. Reinforcement-based frugal learning for interactive satellite image change detection[C]// IGARSS 2022-2022 IEEE International Geoscience and Remote Sensing Symposium. New York: IEEE Press, 2022: 627-630.
[18] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[EB/OL]. [2025-04-13]. https://arxiv.org/abs/1312.5602.
[19] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]// The 30th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2016: 2094-2100.
[20] SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[EB/OL]. [2025-04-13]. https://arxiv.org/abs/1511.05952.
[21] FANG M, LI Y, COHN T. Learning how to active learn: a deep reinforcement learning approach[EB/OL]. [2025-04-13]. https://aclanthology.org/D17-1063/.
[22] HAUSSMANN M, HAMPRECHT F, KANDEMIR M. Deep active learning with adaptive acquisition[EB/OL]. [2025-04-13]. https://dl.acm.org/doi/10.5555/3367243.3367383.
[23] LIU Z M, WANG J Y, GONG S G, et al. Deep reinforcement active learning for human-in-the-loop person re-identification[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 6121-6130.
[24] SUN L, GONG Y H. Active learning for image classification: a deep reinforcement learning approach[C]// The 2nd China Symposium on Cognitive Computing and Hybrid Intelligence. New York: IEEE Press, 2019: 71-76.
[25] WANG J W, YAN Y G, ZHANG Y B, et al. Deep reinforcement active learning for medical image classification[C]// The 23rd International Conference on Medical Image Computing and Computer Assisted Intervention - MICCAI 2020. Cham: Springer, 2020: 33-42.
[26] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[27] TANG X, WU S, CHEN G, et al. Learning to label with active learning and reinforcement learning[C]// The 26th International Conference on Database Systems for Advanced Applications. Cham: Springer, 2021: 549-557.
[28] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[29] CARAMALAU R, BHATTARAI B, KIM T K. Sequential graph convolutional network for active learning[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 9578-9587.
[30] PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library[EB/OL]. [2025-04-13]. https://dl.acm.org/doi/10.5555/3454287.3455008.
[31] SHANNON C E. A mathematical theory of communication[J]. The Bell System Technical Journal, 1948, 27(3): 379-423.
[32] SETTLES B. Active learning literature survey[R]. Madison: University of Wisconsin-Madison, 2009.
[33] SCHEFFER T, DECOMAIN C, WROBEL S. Active hidden Markov models for information extraction[C]// The 4th International Conference on Advances in Intelligent Data Analysis. Cham: Springer, 2001: 309-318.
[34] LI T, ZHOU P, HE Z B, et al. Friendly sharpness-aware minimization[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 5631-5640.
[35] VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9: 2579-2605.