A review of image data augmentation based on generative models

doi:10.11996/JG.j.2095-302X.2026020235

Abstract

Deep learning has shown great potential in computer vision, but its performance in practical applications relies heavily on large amounts of high-quality labeled data. Generative models, which can produce diverse data, have become an effective remedy for data scarcity, offering an efficient way to supply training data for computer vision tasks. Consequently, image data augmentation based on generative models has become a popular research direction in recent years. To this end, a comprehensive literature review was conducted on image data augmentation methods based on generative models. Through a three-stage retrieval process, 37 relevant studies were collected. The methodological processes of these studies were summarized into four main steps, each of which was categorized and described in detail. First, the generative models suitable for image data augmentation were introduced, with a focus on model selection. Next, generative image data augmentation methods were classified, and the workflow, representative studies, existing challenges, and areas in need of optimization were elaborated for each category. Because generated data may contain noise, methods for selecting and processing generated data were also discussed so that the data can be better used in downstream tasks. Furthermore, evaluation methods were categorized and described to comprehensively verify the effectiveness and robustness of data augmentation approaches. Finally, the opportunities and challenges facing generative image data augmentation were discussed, including maintaining semantic consistency, ensuring diversity, improving generation efficiency, and applying the methods to black-box models, and potential directions for future exploration were identified.

Key words: data scarcity, generative model, data augmentation, generated image data processing, effect evaluation

XIANG Ting, TANG Zhuo, ZHENG Jiali, CHEN Changjian, LYU Fei, LI Kenli. A review of image data augmentation based on generative models[J]. Journal of Graphics, 2026, 47(2): 235-250.

Figures/Tables 12

References 114

[1]	DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// 2009 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2009: 248-255.
[2]	QI G J, LUO J B. Small data challenges in big data era: a survey of recent progress on unsupervised and semi-supervised methods[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 44(4): 2168-2187.
[3]	DEVRIES T, TAYLOR G W. Improved regularization of convolutional neural networks with cutout[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1708.04552.
[4]	ZHONG Z, ZHENG L, KANG G L, et al. Random erasing data augmentation[C]// The 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 13001-13008.
[5]	HENDRYCKS D, MU N, CUBUK E D, et al. AugMix: a simple data processing method to improve robustness and uncertainty[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2020.html#HendrycksMCZGL20.
[6]	ZHANG L J, DENG Z, KAWAGUCHI K, et al. How does mixup help with robustness and generalization?[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2010.04819.
[7]	CUBUK E D, ZOPH B, MANÉ D, et al. AutoAugment: learning augmentation strategies from data[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 113-123.
[8]	CUBUK E D, ZOPH B, SHLENS J, et al. Randaugment: practical automated data augmentation with a reduced search space[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2020: 3008-3017.
[9]	YUN S, HAN D, CHUN S, et al. CutMix: regularization strategy to train strong classifiers with localizable features[C]// The IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 6022-6031.
[10]	ZHANG Y F, ZHOU D Q, HOOI B, et al. Expanding small-scale datasets with guided imagination[C]// The 37th International Conference on Neural Information Processing Systems. New York: IEEE Press, 2023: 3346.
[11]	ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 10674-10685.
[12]	HUANG L H, CHEN D, LIU Y, et al. Composer: creative and controllable image synthesis with composable conditions[C]// The 40th International Conference on Machine Learning. New York: ACM, 2023: 558.
[13]	ZHENG C Y, WU G Q, LI C X. Toward understanding generative data augmentation[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2023: 2352.
[14]	杨锁荣, 杨洪朝, 申富饶, 等. 面向深度学习的图像数据增强综述[J]. 软件学报, 2025, 36(3): 1390-1412.
	YANG S R, YANG H C, SHEN F R, et al. Image data augmentation for deep learning: a survey[J]. Journal of Software, 2025, 36(3): 1390-1412 (in Chinese).
[15]	CHEN Y H, YAN Z H, ZHU Y J. A comprehensive survey for generative data augmentation[J]. Neurocomputing, 2024, 600: 128167.
[16]	KUMAR T, BRENNAN R, MILEO A, et al. Image data augmentation approaches: a comprehensive survey and future directions[J]. IEEE Access, 2024, 12: 187536-187571.
[17]	ZHOU Y, GUO C L, WANG X, et al. A survey on data augmentation in large model era[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2401.15422v2.
[18]	KINGMA D P, WELLING M. Auto-encoding variational Bayes[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1312.6114.
[19]	GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[EB/OL]. [2025-05-09]. https://arxiv.org/pdf/1406.02661.
[20]	RADFORD A, METZ L, CHINTALA S. Unsupervised representation learning with deep convolutional generative adversarial networks[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1511.06434.
[21]	KARRAS T, AILA T, LAINE S, et al. Progressive growing of GANs for improved quality, stability, and variation[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2018.html#KarrasALL18.
[22]	KARRAS T, LAINE S, AILA T. A style-based generator architecture for generative adversarial networks[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4396-4405.
[23]	SONG J M, MENG C L, ERMON S. Denoising diffusion implicit models[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2021.html#SongME21.
[24]	ZHANG G Q, NIWA K, KLEIJN W B. On accelerating diffusion-based sampling processes via improved integration approximation[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2024.html#0003NK24.
[25]	XIA M F, SHEN Y J, LEI C S, et al. Towards more accurate diffusion model acceleration with a timestep tuner[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 5736-5745.
[26]	SONG Y, DHARIWAL P, CHEN M, et al. Consistency models[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/icml/icml2023.html#SongD0S23.
[27]	PODELL D, ENGLISH Z, LACEY K, et al. SDXL: improving latent diffusion models for high-resolution image synthesis[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2024.html#PodellELBDMPR24.
[28]	BETKER J, GOH G, JING L, et al. Improving image generation with better captions[EB/OL]. [2025-05-09]. https://readwise-assets.s3.amazonaws.com/media/wisereads/articles/improving-image-generation-wit/Dalle3_DkCZRcG.pdf.
[29]	HO J, SALIMANS T. Classifier-free diffusion guidance[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2207.12598.
[30]	WALLACE B, GOKUL A, ERMON S, et al. End-to-end diffusion latent optimization improves classifier guidance[C]// The IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 7246-7256.
[31]	WU S, LIN Y T, ZHANG F H, et al. Direct3D: Scalable image-to-3D generation via 3D latent diffusion transformer[C]// The 38th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 3873.
[32]	WOLF R, SHI Y T, LIU S, et al. Diffusion models for robotic manipulation: a survey[J]. Frontiers in Robotics and AI, 2025, 12: 1606247.
[33]	PAN C Y, YI Z J, SHI G Y, et al. Model-based diffusion for trajectory optimization[C]// The 38th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 1846.
[34]	HO J, SALIMANS T, GRITSENKO A, et al. Video diffusion models[C]// The 36th International Conference on Neural Information Processing Systems. New York: ACM, 2022: 628.
[35]	HUANG Z Q, HE Y N, YU J S, et al. VBench: comprehensive benchmark suite for video generative models[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 21807-21818.
[36]	BLATTMANN A, ROMBACH R, LING H, et al. Align your latents: high-resolution video synthesis with latent diffusion models[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 22563-22575.
[37]	TIAN K Y, JIANG Y, YUAN Z H, et al. Visual autoregressive modeling: scalable image generation via next-scale prediction[C]// The 38th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 2694.
[38]	SAHARIA C, CHAN W, SAXENA S, et al. Photorealistic text-to-image diffusion models with deep language understanding[C]// The 36th International Conference on Neural Information Processing Systems. New York: ACM, 2022: 2643.
[39]	HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]// The 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 574.
[40]	TRABUCCO B, DOHERTY K, GURINAS M, et al. Effective data augmentation with diffusion models[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2024.html#TrabuccoDGS24.
[41]	HE R F, SUN S Y, YU X, et al. Is synthetic data from generative models ready for image recognition?[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2023.html#HeS0XZTBQ23.
[42]	TIAN Y L, FAN L J, ISOLA P, et al. StableRep: synthetic images from text-to-image models make strong visual representation learners[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2023: 2098.
[43]	LI B H, XU X, WANG X H, et al. Semantic-guided generative image augmentation method with diffusion models for image classification[C]// The 38th AAAI Conference on Artificial Intelligence. Washington: AAAI Press, 2024: 3018-3027.
[44]	SINGH K, NAVARATNAM T, HOLMER J, et al. Is synthetic data all we need? Benchmarking the robustness of models trained with synthetic images[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 2505-2515.
[45]	RAHAT F, HOSSAIN M S, AHMED M R, et al. Data augmentation for image classification using generative AI[C]// 2025 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2025: 4173-4182.
[46]	JUNG K, SEO Y, CHO S, et al. DALDA: data augmentation leveraging diffusion model and LLM with adaptive guidance scaling[C]// European Conference on Computer Vision. Cham: Springer, 2025: 182-200.
[47]	BENIGMIM Y, ROY S, ESSID S, et al. One-shot unsupervised domain adaptation with personalized diffusion models[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 698-708.
[48]	BANSAL H, GROVER A. Leaving reality to imagination: Robust classification via generated datasets[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2302.02503.
[49]	SARIYILDIZ M B, ALAHARI K, LARLUS D, et al. Fake it till you make it: learning transferable representations from synthetic ImageNet clones[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 8011-8021.
[50]	AZIZI S, KORNBLITH S, SAHARIA C, et al. Synthetic data from diffusion models improves ImageNet classification[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2304.08466.
[51]	SAMUEL D, BEN-ARI R, RAVIV S, et al. Generating images of rare concepts using pre-trained diffusion models[C]// The 38th AAAI Conference on Artificial Intelligence. Washington: AAAI Press, 2024: 4695-4703.
[52]	ZHU H W, YANG L, YONG J H, et al. Distribution-aware data expansion with diffusion models[C]// The 38th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 3264.
[53]	ISLAM K, ZAHEER M Z, MAHMOOD A, et al. Diffusemix: label-preserving data augmentation with diffusion models[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 27611-27620.
[54]	WANG Z C, WEI L H, WANG T, et al. Enhance image classification via inter-class image mixup with diffusion model[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 17223-17233.
[55]	SASTRY C S, DUMPALA S H, OORE S. DiffAug: a diffuse-and-denoise augmentation for training robust classifiers[C]// The 38th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 654.
[56]	ZHOU Y C, SAHAK H, BA J. Using synthetic data for data augmentation to improve classification accuracy[EB/OL]. [2025-05-09]. https://openreview.net/pdf?id=42xAKgIb2P.
[57]	FU Y X, CHEN C Q, QIAO Y, et al. DreamDA: generative data augmentation with diffusion models[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2403.12803.
[58]	DING L, ZHANG J, CLUNE J, et al. Quality diversity through human feedback: towards open-ended diversity-driven optimization[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/icml/icml2024.html#0010ZCSL24.
[59]	HÖFERLIN B, NETZEL R, HÖFERLIN M, et al. Inter-active learning of ad-hoc classifiers for video visual analytics[C]// 2012 IEEE Conference on Visual Analytics Science and Technology. New York: IEEE Press, 2012: 23-32.
[60]	CHEN C J, WANG Z W, WU J, et al. Interactive graph construction for graph-based semi-supervised learning[J]. IEEE Transactions on Visualization and Computer Graphics, 2021, 27(9): 3701-3716.
[61]	YANG W K, YE X, ZHANG X X, et al. Diagnosing ensemble few-shot classifiers[J]. IEEE Transactions on Visualization and Computer Graphics, 2022, 28(9): 3292-3306.
[62]	CHEN C J, WU J, WANG X H, et al. Towards better caption supervision for object detection[J]. IEEE Transactions on Visualization and Computer Graphics, 2022, 28(4): 1941-1954.
[63]	CHEN C J, CHEN J S, YANG W K, et al. Enhancing single-frame supervision for better temporal action localization[J]. IEEE Transactions on Visualization and Computer Graphics, 2024, 30(6): 2903-2915.
[64]	HOQUE M N, HE W B, SHEKAR A K, et al. Visual concept programming: a visual analytics approach to injecting human intelligence at scale[J]. IEEE Transactions on Visualization and Computer Graphics, 2023, 29(1): 74-83.
[65]	HE J B, WANG X B, WONG K K, et al. VideoPro: a visual analytics approach for interactive video programming[J]. IEEE Transactions on Visualization and Computer Graphics, 2024, 30(1): 87-97.
[66]	LI S S, LIU G Z, WEI T X, et al. EvoVis: a visual analytics method to understand the labeling iterations in data programming[J]. IEEE Transactions on Visualization and Computer Graphics, 2025, 31(3): 1802-1817.
[67]	FENG Y C J, WANG X B, WONG K K, et al. PromptMagician: interactive prompt engineering for text-to-image creation[J]. IEEE Transactions on Visualization and Computer Graphics, 2024, 30(1): 295-305.
[68]	WANG Y L, SHEN S Y, LIM B Y. RePrompt: automatic prompt editing to refine AI-generative art towards precise expressions[C]// The 2023 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2023: 22.
[69]	WANG Z J, HUANG Y H, SONG D, et al. PromptCharm: text-to-image generation through multi-modal prompting and refinement[C]// The 2024 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2024: 185.
[70]	BRADE S, WANG B, SOUSA M, et al. Promptify: text-to-image generation through interactive prompt exploration with large language models[C]// The 36th Annual ACM Symposium on User Interface Software and Technology. New York: ACM, 2023: 96.
[71]	CHUNG J J Y, ADAR E. PromptPaint: steering text-to-image generation through paint medium-like interactions[C]// The 36th Annual ACM Symposium on User Interface Software and Technology. New York: ACM, 2023: 6.
[72]	GUO Y H, SHAO H N, LIU C, et al. PrompTHis: visualizing the process and influence of prompt editing during text-to-image creation[J]. IEEE Transactions on Visualization and Computer Graphics, 2025, 31(9): 4547-4559.
[73]	GOU L, ZOU L C, LI N X, et al. VATLD: a visual analytics system to assess, understand and improve traffic light detection[J]. IEEE Transactions on Visualization and Computer Graphics, 2021, 27(2): 261-271.
[74]	ENDERT A, HAN C, MAITI D, et al. Observation-level interaction with statistical models for visual analytics[C]// 2011 IEEE Conference on Visual Analytics Science and Technology. New York: IEEE Press, 2011: 121-130.
[75]	TOUVRON H, LAVRIL T, IZACARD G, et al. LLaMA: open and efficient foundation language models[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2302.13971.
[76]	RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[EB/OL]. [2025-05-09]. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
[77]	RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[EB/OL]. [2025-05-09]. https://gwern.net/doc/ai/nn/transformer/gpt/2/2019-radford.pdf.
[78]	BROWN T B, MANN B, RYDER N, et al. Language models are few-shot learners[C]// The 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 159.
[79]	RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[J]. Journal of Machine Learning Research, 2020, 21(140): 1-67.
[80]	WANG R, LIU T, HSIEH C J, et al. On discrete prompt optimization for diffusion models[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/icml/icml2024.html#Wang0HG24.
[81]	MAHAJAN S, RAHMAN T, YI K M, et al. Prompting hard or hardly prompting: prompt inversion for text-to-image diffusion models[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 6808-6817.
[82]	HAO Y R, CHI Z W, DONG L, et al. Optimizing prompts for text-to-image generation[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2023: 2923.
[83]	LI J N, LI D X, XIONG C M, et al. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/icml/icml2022.html#0001LXH22.
[84]	GAL R, ALALUF Y, ATZMON Y, et al. An image is worth one word: personalizing text-to-image generation using textual inversion[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2023.html#GalAAPBCC23.
[85]	HERTZ A, MOKADY R, TENENBAUM J, et al. Prompt-to-prompt image editing with cross-attention control[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2023.html#HertzMTAPC23.
[86]	RUIZ N, LI Y Z, JAMPANI V, et al. DreamBooth: fine tuning text-to-image diffusion models for subject-driven generation[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 22500-22510.
[87]	LIU H T, LI C Y, WU Q Y, et al. Visual instruction tuning[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2023: 1516.
[88]	PENG Z L, WANG W H, DONG L, et al. Kosmos-2: grounding multimodal large language models to the world[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2306.14824.
[89]	WU C H, DE LA TORRE F. Unifying diffusion models’ latent space, with applications to CycleDiffusion and guidance[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2210.05559.
[90]	RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL]. [2025-05-09]. https://proceedings.mlr.press/v139/radford21a.
[91]	ASOKAN M, WU K B, ALBREIKI F. FineLIP: extending CLIP's reach via fine-grained alignment with longer text inputs[C]// The Computer Vision and Pattern Recognition Conference. New York: IEEE Press, 2025: 14495-14504.
[92]	YUAN J, CHEN C J, YANG W K, et al. A survey of visual analytics techniques for machine learning[J]. Computational Visual Media, 2021, 7(1): 3-36.
[93]	SETTLES B. Active learning literature survey[R]. Madison: University of Wisconsin-Madison, 2009.
[94]	ESSER P, KULAL S, BLATTMANN A, et al. Scaling rectified flow transformers for high-resolution image synthesis[EB/OL]. [2025-05-09]. https://icml.cc/virtual/2024/oral/35548.
[95]	BAKER J. Using style ambiguity loss to improve aesthetics of diffusion models[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2410.02055v1.
[96]	YAMAGUCHI S, FUKUDA T. On the limitation of diffusion models for synthesizing training datasets[EB/OL]. [2025-05-09]. https://nips.cc/virtual/2023/78391.
[97]	CHEN P G, LIU S, ZHAO H S, et al. GridMask data augmentation[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2001.04086.
[98]	SANCHO J C, BARKER K J, KERBYSON D J, et al. Quantifying the potential benefit of overlapping communication and computation in large-scale scientific applications[C]// The 2006 ACM/IEEE Conference on Supercomputing. New York: IEEE Press, 2006: 17.
[99]	PARKHI O M, VEDALDI A, ZISSERMAN A, et al. Cats and dogs[C]// The IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2012: 3498-3505.
[100]	NILSBACK M E, ZISSERMAN A. Automated flower classification over a large number of classes[C]// The 6th Indian Conference on Computer Vision, Graphics & Image Processing. New York: IEEE Press, 2008: 722-729.
[101]	KRIZHEVSKY A. Learning multiple layers of features from tiny images[R]. Toronto: University of Toronto, 2009.
[102]	BARRATT S, SHARMA R. A note on the inception score[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1801.01973.
[103]	HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6629-6640.
[104]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// The IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[105]	XIE S N, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]// The IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5987-5995.
[106]	ZAGORUYKO S, KOMODAKIS N. Wide residual networks[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1605.07146.
[107]	SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 4510-4520.
[108]	VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(86): 2579-2605.
[109]	TENENBAUM J B, DE SILVA V, LANGFORD J C. A global geometric framework for nonlinear dimensionality reduction[J]. Science, 2000, 290(5500): 2319-2323.
[110]	MCINNES L, HEALY J, MELVILLE J. UMAP: uniform manifold approximation and projection for dimension reduction[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1802.03426.
[111]	SHLENS J. A tutorial on principal component analysis[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1404.1100.
[112]	吕斐, 陈长建, 张嘉鹏, 等. 面向超级计算机系统的可视化综述[J]. 计算机辅助设计与图形学学报, 2024, 36(3): 321-335.
	LYU F, CHEN C J, ZHANG J P, et al. Visualization for supercomputer system: a survey[J]. Journal of Computer-Aided Design & Computer Graphics, 2024, 36(3): 321-335 (in Chinese).
[113]	XIE S R, XIAO Z S, KINGMA D P, et al. EM distillation for one-step diffusion models[C]// The 38th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 1432.
[114]	DON-YEHIYA S, CHOSHEN L, ABEND O. Human learning by model feedback: the dynamics of iterative prompting with midjourney[EB/OL]. [2025-05-09]. https://aclanthology.org/2023.emnlp-main.253/.

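Generative model	Applicable tasks	Computational efficiency	Accuracy
VAE	Simple data augmentation in low-resource scenarios	High (a single encode-decode pass)	Low; generated images lack clear detail
GAN	Natural image augmentation, style transfer	Relatively high; requires adversarial training, and convergence is unstable	High, but prone to mode collapse
Diffusion model	Augmentation of small-scale datasets, medical data augmentation	Low; requires multi-step sampling	Highest; achieves both semantic consistency and diversity of generated images
VAR	Real-time image generation	High; single-step sampling	Relatively high; close to the accuracy of diffusion models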

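Method	Characteristics	References
Generative image data augmentation based on prompt optimization	Uses large language models to rewrite prompts into diverse variants, thereby generating diverse images	[40-49]
Generative image data augmentation based on latent-space perturbation	Perturbs the latent space of the generative model while adding constraints (e.g., semantic consistency, distribution consistency) to keep generation controllable, thereby producing images that are diverse and of known class	[50-57]
Generative image data augmentation based on human-computer interaction	Introduces human feedback into the image generation process so that the generated images better match human visual requirements	[58-74]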
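
The latent-space perturbation row above can be illustrated with a minimal sketch. It is not taken from any of the cited studies: the names `perturb_latent` and `cosine_similarity`, the Gaussian noise model, and the cosine-similarity threshold are all assumptions chosen for illustration. Here cosine similarity between the original and perturbed latent vectors stands in for a semantic-consistency constraint; a real pipeline would more likely score decoded images with a classifier or an image-text model.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def perturb_latent(z, sigma=0.1, min_similarity=0.9, max_tries=100, rng=None):
    """Add Gaussian noise to latent z, keeping only a candidate that stays close to z.

    Hypothetical constraint: the perturbed latent must keep a cosine similarity
    of at least min_similarity with the original latent.
    """
    if rng is None:
        rng = random.Random(0)
    for _ in range(max_tries):
        candidate = [x + rng.gauss(0.0, sigma) for x in z]
        if cosine_similarity(z, candidate) >= min_similarity:
            return candidate
    # No candidate satisfied the constraint: fall back to the unperturbed latent.
    return list(z)
```

Raising `sigma` trades consistency for diversity: more candidates are rejected, and those that pass sit closer to the threshold.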

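Method	Brief description	References	Advantages and disadvantages
Distribution-alignment filtering	Filters images during generation to align the overall distribution of the generated data with that of the original data	[13,42,50,52,56]	Advantage: avoids distribution shift. Disadvantage: limits the diversity of generated data and struggles with marginal distributions
Semantic and class selection filtering	Filters images during or after generation so that the semantics of the generated images stay consistent with the original images	[10,40,41,43-49,51-55,57]	Advantage: ensures semantic consistency. Disadvantage: cannot handle linguistic ambiguity and is constrained by the token limit of the model used
Selection based on human-computer interaction	Selects and edits images after generation, refining them over multiple rounds through a visual human-computer interaction system to produce images that match human aesthetics	[58-74]	Advantage: results are highly controllable and better match human visual needs. Disadvantage: time and labor costs are too high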
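
The first two filtering rows above can be sketched as two small post-generation filters. This is an illustrative sketch under stated assumptions, not the procedure of any cited study: the function names, the dictionary keys (`intended`, `predicted`, `confidence`, `feature`), and the centroid-distance criterion are hypothetical. The predicted labels and feature vectors are assumed to come from some external classifier or encoder that is not shown.

```python
import math


def semantic_filter(samples, min_confidence=0.8):
    """Keep samples whose predicted label matches the intended label with enough confidence."""
    return [
        s for s in samples
        if s["predicted"] == s["intended"] and s["confidence"] >= min_confidence
    ]


def distribution_filter(samples, real_features, max_distance):
    """Keep samples whose feature lies within max_distance of the real-data centroid."""
    dim = len(real_features[0])
    # Centroid of the real features, computed one dimension at a time.
    centroid = [
        sum(f[i] for f in real_features) / len(real_features) for i in range(dim)
    ]
    return [s for s in samples if math.dist(s["feature"], centroid) <= max_distance]
```

Applying `semantic_filter` first and `distribution_filter` second mirrors the trade-off in the table: a small `max_distance` guards against distribution shift but discards more of the diverse samples.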