A review of image data augmentation based on generative models

doi:10.11996/JG.j.2095-302X.2026020235

Journal of Graphics, 2026, Vol. 47, Issue (2): 235-250. DOI: 10.11996/JG.j.2095-302X.2026020235

XIANG Ting¹,², TANG Zhuo¹,²,³, ZHENG Jiali⁴, CHEN Changjian¹,², LYU Fei¹,², LI Kenli¹,²

¹ College of Computer Science and Electronic Engineering, Hunan University, Changsha Hunan 410082, China
² The Ministry of Education Key Laboratory of Fusion Computing of Supercomputing and Artificial Intelligence, Changsha Hunan 410082, China
³ Shenzhen Research Institute, Hunan University, Shenzhen Guangdong 518000, China
⁴ National Key Laboratory of Complex System Control and Intelligent Agent Cooperation, Beijing 100074, China

Received: 2025-07-10; Accepted: 2025-10-20; Published: 2026-04-30; Online: 2026-05-20
Contact: CHEN Changjian, E-mail: changjianchen@hnu.edu.cn
Supported by:
National Natural Science Foundation of China (62225205); National Natural Science Foundation of China (62402167); Shenzhen Basic Research Project (JCYJ20210324140002006); Science and Technology Program of Changsha (kh2301011); Science and Technology Innovation Program of Hunan Province (2023ZJ1080); Hunan Natural Science Foundation (2025JJ60419); Major Science and Technology Research Projects of Hunan Province (2024QK2010); Major Science and Technology Research Projects of Hunan Province (2024QK2009); Yunnan Provincial Major Science and Technology Special Plan Projects (202502AD080009); Yuelushan Laboratory Breeding Program (YLS-2025-ZY01015)

Abstract:

Deep learning has shown great potential in computer vision, but its performance in practical applications relies heavily on large amounts of high-quality labeled data. Generative models, with their ability to generate diverse data, have become an effective way to address data scarcity by supplying training data for computer vision efficiently. Consequently, image data augmentation based on generative models has become an active research direction in recent years. To this end, a comprehensive literature review was conducted on image data augmentation methods based on generative models. Through a three-stage retrieval process, 37 relevant studies were collected. The workflow of these methods was summarized into four steps, and the approaches used in each step were categorized and described in detail. First, starting from model selection, the generative models that can be used for image data augmentation were introduced. Next, generative image data augmentation methods were classified, and for each category the workflow, representative studies, existing problems, and directions in urgent need of improvement were described. Because generated data may contain noise, methods for selecting and processing generated data were also discussed so that such data can be better used in downstream tasks. Furthermore, methods for validating the augmentation effect were categorized and described, so that the effectiveness and robustness of an augmentation approach can be verified comprehensively. Finally, the opportunities and challenges facing generative image data augmentation were discussed in detail, covering the semantic consistency and diversity of generated images, generation efficiency, and application to black-box models, and potential directions for future exploration were identified.

Key words: data scarcity, generative model, data augmentation, generated image data processing, effect evaluation

XIANG Ting, TANG Zhuo, ZHENG Jiali, CHEN Changjian, LYU Fei, LI Kenli. A review of image data augmentation based on generative models[J]. Journal of Graphics, 2026, 47(2): 235-250.

Figures/Tables: 12

References (114)

[1]	DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// 2009 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2009: 248-255.
[2]	QI G J, LUO J B. Small data challenges in big data era: a survey of recent progress on unsupervised and semi-supervised methods[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 44(4): 2168-2187.
[3]	DEVRIES T, TAYLOR G W. Improved regularization of convolutional neural networks with cutout[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1708.04552.
[4]	ZHONG Z, ZHENG L, KANG G L, et al. Random erasing data augmentation[C]// The 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 13001-13008.
[5]	HENDRYCKS D, MU N, CUBUK E D, et al. AugMix: a simple data processing method to improve robustness and uncertainty[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2020.html#HendrycksMCZGL20.
[6]	ZHANG L J, DENG Z, KAWAGUCHI K, et al. How does mixup help with robustness and generalization?[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2010.04819.
[7]	CUBUK E D, ZOPH B, MANÉ D, et al. AutoAugment: learning augmentation strategies from data[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 113-123.
[8]	CUBUK E D, ZOPH B, SHLENS J, et al. Randaugment: practical automated data augmentation with a reduced search space[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2020: 3008-3017.
[9]	YUN S, HAN D, CHUN S, et al. CutMix: regularization strategy to train strong classifiers with localizable features[C]// The IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 6022-6031.
[10]	ZHANG Y F, ZHOU D Q, HOOI B, et al. Expanding small-scale datasets with guided imagination[C]// The 37th International Conference on Neural Information Processing Systems. New York: IEEE Press, 2023: 3346.
[11]	ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 10674-10685.
[12]	HUANG L H, CHEN D, LIU Y, et al. Composer: creative and controllable image synthesis with composable conditions[C]// The 40th International Conference on Machine Learning. New York: ACM, 2023: 558.
[13]	ZHENG C Y, WU G Q, LI C X. Toward understanding generative data augmentation[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2023: 2352.
[14]	YANG S R, YANG H C, SHEN F R, et al. Image data augmentation for deep learning: a survey[J]. Journal of Software, 2025, 36(3): 1390-1412 (in Chinese).
[15]	CHEN Y H, YAN Z H, ZHU Y J. A comprehensive survey for generative data augmentation[J]. Neurocomputing, 2024, 600: 128167.
[16]	KUMAR T, BRENNAN R, MILEO A, et al. Image data augmentation approaches: a comprehensive survey and future directions[J]. IEEE Access, 2024, 12: 187536-187571.
[17]	ZHOU Y, GUO C L, WANG X, et al. A survey on data augmentation in large model era[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2401.15422v2.
[18]	KINGMA D P, WELLING M. Auto-encoding variational Bayes[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1312.6114.
[19]	GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[EB/OL]. [2025-05-09]. https://arxiv.org/pdf/1406.02661.
[20]	RADFORD A, METZ L, CHINTALA S. Unsupervised representation learning with deep convolutional generative adversarial networks[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1511.06434.
[21]	KARRAS T, AILA T, LAINE S, et al. Progressive growing of GANs for improved quality, stability, and variation[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2018.html#KarrasALL18.
[22]	KARRAS T, LAINE S, AILA T. A style-based generator architecture for generative adversarial networks[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 4396-4405.
[23]	SONG J M, MENG C L, ERMON S. Denoising diffusion implicit models[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2021.html#SongME21.
[24]	ZHANG G Q, NIWA K, KLEIJN W B. On accelerating diffusion-based sampling processes via improved integration approximation[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2024.html#0003NK24.
[25]	XIA M F, SHEN Y J, LEI C S, et al. Towards more accurate diffusion model acceleration with a timestep tuner[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 5736-5745.
[26]	SONG Y, DHARIWAL P, CHEN M, et al. Consistency models[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/icml/icml2023.html#SongD0S23.
[27]	PODELL D, ENGLISH Z, LACEY K, et al. SDXL: improving latent diffusion models for high-resolution image synthesis[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2024.html#PodellELBDMPR24.
[28]	BETKER J, GOH G, JING L, et al. Improving image generation with better captions[EB/OL]. [2025-05-09]. https://readwise-assets.s3.amazonaws.com/media/wisereads/articles/improving-image-generation-wit/Dalle3_DkCZRcG.pdf.
[29]	HO J, SALIMANS T. Classifier-free diffusion guidance[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2207.12598.
[30]	WALLACE B, GOKUL A, ERMON S, et al. End-to-end diffusion latent optimization improves classifier guidance[C]// The IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 7246-7256.
[31]	WU S, LIN Y T, ZHANG F H, et al. Direct3D: Scalable image-to-3D generation via 3D latent diffusion transformer[C]// The 38th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 3873.
[32]	WOLF R, SHI Y T, LIU S, et al. Diffusion models for robotic manipulation: a survey[J]. Frontiers in Robotics and AI, 2025, 12: 1606247.
[33]	PAN C Y, YI Z J, SHI G Y, et al. Model-based diffusion for trajectory optimization[C]// The 38th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 1846.
[34]	HO J, SALIMANS T, GRITSENKO A, et al. Video diffusion models[C]// The 36th International Conference on Neural Information Processing Systems. New York: ACM, 2022: 628.
[35]	HUANG Z Q, HE Y N, YU J S, et al. VBench: comprehensive benchmark suite for video generative models[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 21807-21818.
[36]	BLATTMANN A, ROMBACH R, LING H, et al. Align your latents: high-resolution video synthesis with latent diffusion models[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 22563-22575.
[37]	TIAN K Y, JIANG Y, YUAN Z H, et al. Visual autoregressive modeling: scalable image generation via next-scale prediction[C]// The 38th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 2694.
[38]	SAHARIA C, CHAN W, SAXENA S, et al. Photorealistic text-to-image diffusion models with deep language understanding[C]// The 36th International Conference on Neural Information Processing Systems. New York: ACM, 2022: 2643.
[39]	HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]// The 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 574.
[40]	TRABUCCO B, DOHERTY K, GURINAS M, et al. Effective data augmentation with diffusion models[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2024.html#TrabuccoDGS24.
[41]	HE R F, SUN S Y, YU X, et al. Is synthetic data from generative models ready for image recognition?[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2023.html#HeS0XZTBQ23.
[42]	TIAN Y L, FAN L J, ISOLA P, et al. StableRep: synthetic images from text-to-image models make strong visual representation learners[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2023: 2098.
[43]	LI B H, XU X, WANG X H, et al. Semantic-guided generative image augmentation method with diffusion models for image classification[C]// The 38th AAAI Conference on Artificial Intelligence. Washington: AAAI Press, 2024: 3018-3027.
[44]	SINGH K, NAVARATNAM T, HOLMER J, et al. Is synthetic data all we need? Benchmarking the robustness of models trained with synthetic images[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 2505-2515.
[45]	RAHAT F, HOSSAIN M S, AHMED M R, et al. Data augmentation for image classification using generative AI[C]// 2025 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2025: 4173-4182.
[46]	JUNG K, SEO Y, CHO S, et al. DALDA: data augmentation leveraging diffusion model and LLM with adaptive guidance scaling[C]// European Conference on Computer Vision. Cham: Springer, 2025: 182-200.
[47]	BENIGMIM Y, ROY S, ESSID S, et al. One-shot unsupervised domain adaptation with personalized diffusion models[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 698-708.
[48]	BANSAL H, GROVER A. Leaving reality to imagination: Robust classification via generated datasets[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2302.02503.
[49]	SARIYILDIZ M B, ALAHARI K, LARLUS D, et al. Fake it till you make it: learning transferable representations from synthetic ImageNet clones[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 8011-8021.
[50]	AZIZI S, KORNBLITH S, SAHARIA C, et al. Synthetic data from diffusion models improves ImageNet classification[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2304.08466.
[51]	SAMUEL D, BEN-ARI R, RAVIV S, et al. Generating images of rare concepts using pre-trained diffusion models[C]// The 38th AAAI Conference on Artificial Intelligence. Washington: AAAI Press, 2024: 4695-4703.
[52]	ZHU H W, YANG L, YONG J H, et al. Distribution-aware data expansion with diffusion models[C]// The 38th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 3264.
[53]	ISLAM K, ZAHEER M Z, MAHMOOD A, et al. Diffusemix: label-preserving data augmentation with diffusion models[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 27611-27620.
[54]	WANG Z C, WEI L H, WANG T, et al. Enhance image classification via inter-class image mixup with diffusion model[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 17223-17233.
[55]	SASTRY C S, DUMPALA S H, OORE S. DiffAug: a diffuse-and-denoise augmentation for training robust classifiers[C]// The 38th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 654.
[56]	ZHOU Y C, SAHAK H, BA J. Using synthetic data for data augmentation to improve classification accuracy[EB/OL]. [2025-05-09]. https://openreview.net/pdf?id=42xAKgIb2P.
[57]	FU Y X, CHEN C Q, QIAO Y, et al. DreamDA: generative data augmentation with diffusion models[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2403.12803.
[58]	DING L, ZHANG J, CLUNE J, et al. Quality diversity through human feedback: towards open-ended diversity-driven optimization[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/icml/icml2024.html#0010ZCSL24.
[59]	HÖFERLIN B, NETZEL R, HÖFERLIN M, et al. Inter-active learning of ad-hoc classifiers for video visual analytics[C]// 2012 IEEE Conference on Visual Analytics Science and Technology. New York: IEEE Press, 2012: 23-32.
[60]	CHEN C J, WANG Z W, WU J, et al. Interactive graph construction for graph-based semi-supervised learning[J]. IEEE Transactions on Visualization and Computer Graphics, 2021, 27(9): 3701-3716.
[61]	YANG W K, YE X, ZHANG X X, et al. Diagnosing ensemble few-shot classifiers[J]. IEEE Transactions on Visualization and Computer Graphics, 2022, 28(9): 3292-3306.
[62]	CHEN C J, WU J, WANG X H, et al. Towards better caption supervision for object detection[J]. IEEE Transactions on Visualization and Computer Graphics, 2022, 28(4): 1941-1954.
[63]	CHEN C J, CHEN J S, YANG W K, et al. Enhancing single-frame supervision for better temporal action localization[J]. IEEE Transactions on Visualization and Computer Graphics, 2024, 30(6): 2903-2915.
[64]	HOQUE M N, HE W B, SHEKAR A K, et al. Visual concept programming: a visual analytics approach to injecting human intelligence at scale[J]. IEEE Transactions on Visualization and Computer Graphics, 2023, 29(1): 74-83.
[65]	HE J B, WANG X B, WONG K K, et al. VideoPro: a visual analytics approach for interactive video programming[J]. IEEE Transactions on Visualization and Computer Graphics, 2024, 30(1): 87-97.
[66]	LI S S, LIU G Z, WEI T X, et al. EvoVis: a visual analytics method to understand the labeling iterations in data programming[J]. IEEE Transactions on Visualization and Computer Graphics, 2025, 31(3): 1802-1817.
[67]	FENG Y C J, WANG X B, WONG K K, et al. PromptMagician: interactive prompt engineering for text-to-image creation[J]. IEEE Transactions on Visualization and Computer Graphics, 2024, 30(1): 295-305.
[68]	WANG Y L, SHEN S Y, LIM B Y. RePrompt: automatic prompt editing to refine AI-generative art towards precise expressions[C]// The 2023 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2023: 22.
[69]	WANG Z J, HUANG Y H, SONG D, et al. PromptCharm: text-to-image generation through multi-modal prompting and refinement[C]// The 2024 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2024: 185.
[70]	BRADE S, WANG B, SOUSA M, et al. Promptify: text-to-image generation through interactive prompt exploration with large language models[C]// The 36th Annual ACM Symposium on User Interface Software and Technology. New York: ACM, 2023: 96.
[71]	CHUNG J J Y, ADAR E. PromptPaint: steering text-to-image generation through paint medium-like interactions[C]// The 36th Annual ACM Symposium on User Interface Software and Technology. New York: ACM, 2023: 6.
[72]	GUO Y H, SHAO H N, LIU C, et al. PrompTHis: visualizing the process and influence of prompt editing during text-to-image creation[J]. IEEE Transactions on Visualization and Computer Graphics, 2025, 31(9): 4547-4559.
[73]	GOU L, ZOU L C, LI N X, et al. VATLD: a visual analytics system to assess, understand and improve traffic light detection[J]. IEEE Transactions on Visualization and Computer Graphics, 2021, 27(2): 261-271.
[74]	ENDERT A, HAN C, MAITI D, et al. Observation-level interaction with statistical models for visual analytics[C]// 2011 IEEE Conference on Visual Analytics Science and Technology. New York: IEEE Press, 2011: 121-130.
[75]	TOUVRON H, LAVRIL T, IZACARD G, et al. LLaMA: open and efficient foundation language models[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2302.13971.
[76]	RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[EB/OL]. [2025-05-09]. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
[77]	RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[EB/OL]. [2025-05-09]. https://gwern.net/doc/ai/nn/transformer/gpt/2/2019-radford.pdf.
[78]	BROWN T B, MANN B, RYDER N, et al. Language models are few-shot learners[C]// The 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 159.
[79]	RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[J]. Journal of Machine Learning Research, 2020, 21(140): 1-67.
[80]	WANG R, LIU T, HSIEH C J, et al. On discrete prompt optimization for diffusion models[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/icml/icml2024.html#Wang0HG24.
[81]	MAHAJAN S, RAHMAN T, YI K M, et al. Prompting hard or hardly prompting: prompt inversion for text-to-image diffusion models[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2024: 6808-6817.
[82]	HAO Y R, CHI Z W, DONG L, et al. Optimizing prompts for text-to-image generation[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2023: 2923.
[83]	LI J N, LI D X, XIONG C M, et al. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/icml/icml2022.html#0001LXH22.
[84]	GAL R, ALALUF Y, ATZMON Y, et al. An image is worth one word: personalizing text-to-image generation using textual inversion[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2023.html#GalAAPBCC23.
[85]	HERTZ A, MOKADY R, TENENBAUM J, et al. Prompt-to-prompt image editing with cross-attention control[EB/OL]. [2025-05-09]. https://dblp.uni-trier.de/db/conf/iclr/iclr2023.html#HertzMTAPC23.
[86]	RUIZ N, LI Y Z, JAMPANI V, et al. DreamBooth: fine tuning text-to-image diffusion models for subject-driven generation[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 22500-22510.
[87]	LIU H T, LI C Y, WU Q Y, et al. Visual instruction tuning[C]// The 37th International Conference on Neural Information Processing Systems. New York: ACM, 2023: 1516.
[88]	PENG Z L, WANG W H, DONG L, et al. Kosmos-2: grounding multimodal large language models to the world[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2306.14824.
[89]	WU C H, DE LA TORRE F. Unifying diffusion models’ latent space, with applications to CycleDiffusion and guidance[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2210.05559.
[90]	RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL]. [2025-05-09]. https://proceedings.mlr.press/v139/radford21a.
[91]	ASOKAN M, WU K B, ALBREIKI F. FineLIP: extending CLIP's reach via fine-grained alignment with longer text inputs[C]// The Computer Vision and Pattern Recognition Conference. New York: IEEE Press, 2025: 14495-14504.
[92]	YUAN J, CHEN C J, YANG W K, et al. A survey of visual analytics techniques for machine learning[J]. Computational Visual Media, 2021, 7(1): 3-36.
[93]	SETTLES B. Active learning literature survey[R]. Madison: University of Wisconsin-Madison, 2009.
[94]	ESSER P, KULAL S, BLATTMANN A, et al. Scaling rectified flow transformers for high-resolution image synthesis[EB/OL]. [2025-05-09]. https://icml.cc/virtual/2024/oral/35548.
[95]	BAKER J. Using style ambiguity loss to improve aesthetics of diffusion models[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2410.02055v1.
[96]	YAMAGUCHI S, FUKUDA T. On the limitation of diffusion models for synthesizing training datasets[EB/OL]. [2025-05-09]. https://nips.cc/virtual/2023/78391.
[97]	CHEN P G, LIU S, ZHAO H S, et al. GridMask data augmentation[EB/OL]. [2025-05-09]. https://arxiv.org/abs/2001.04086.
[98]	SANCHO J C, BARKER K J, KERBYSON D J, et al. Quantifying the potential benefit of overlapping communication and computation in large-scale scientific applications[C]// The 2006 ACM/IEEE Conference on Supercomputing. New York: IEEE Press, 2006: 17.
[99]	PARKHI O M, VEDALDI A, ZISSERMAN A, et al. Cats and dogs[C]// The IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2012: 3498-3505.
[100]	NILSBACK M E, ZISSERMAN A. Automated flower classification over a large number of classes[C]// The 6th Indian Conference on Computer Vision, Graphics & Image Processing. New York: IEEE Press, 2008: 722-729.
[101]	KRIZHEVSKY A. Learning multiple layers of features from tiny images[R]. Toronto: University of Toronto, 2009.
[102]	BARRATT S, SHARMA R. A note on the inception score[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1801.01973.
[103]	HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local nash equilibrium[C]// The 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6629-6640.
[104]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// The IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[105]	XIE S N, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]// The IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5987-5995.
[106]	ZAGORUYKO S, KOMODAKIS N. Wide residual networks[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1605.07146.
[107]	SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]// The IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 4510-4520.
[108]	VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(86): 2579-2605.
[109]	TENENBAUM J B, DE SILVA V, LANGFORD J C. A global geometric framework for nonlinear dimensionality reduction[J]. Science, 2000, 290(5500): 2319-2323.
[110]	MCINNES L, HEALY J, MELVILLE J. UMAP: uniform manifold approximation and projection for dimension reduction[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1802.03426.
[111]	SHLENS J. A tutorial on principal component analysis[EB/OL]. [2025-05-09]. https://arxiv.org/abs/1404.1100.
[112]	LYU F, CHEN C J, ZHANG J P, et al. Visualization for supercomputer system: a survey[J]. Journal of Computer-Aided Design & Computer Graphics, 2024, 36(3): 321-335 (in Chinese).
[113]	XIE S R, XIAO Z S, KINGMA D P, et al. EM distillation for one-step diffusion models[C]// The 38th International Conference on Neural Information Processing Systems. New York: ACM, 2024: 1432.
[114]	DON-YEHIYA S, CHOSHEN L, ABEND O. Human learning by model feedback: the dynamics of iterative prompting with midjourney[EB/OL]. [2025-05-09]. https://aclanthology.org/2023.emnlp-main.253/.

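Generative model	Suitable tasks	Computational efficiency	Accuracy
VAE	Simple data augmentation in low-resource scenarios	High (a single encode-decode pass)	Low; generated images are blurry and lack clear detail
GAN	Natural image augmentation, style transfer	Fairly high, but requires adversarial training and converges unstably	High, but prone to mode collapse
Diffusion model	Augmentation of small-scale datasets, medical data augmentation	Low; requires multiple sampling steps	Highest; achieves both semantic consistency and diversity of generated images
VAR	Real-time image generation	High; single-step sampling	Fairly high; close to the accuracy of diffusion models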

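Method	Characteristics	References
Generative image data augmentation based on prompt optimization	Uses large language models to rewrite prompts in diverse ways, thereby generating diverse images	[40-49]
Generative image data augmentation based on latent-space perturbation	Perturbs the latent space of the generative model while adding constraints (e.g., semantic consistency, distribution consistency) to keep generation controllable, thereby producing images that are diverse and whose classes are known	[50-57]
Generative image data augmentation based on human-computer interaction	Introduces human feedback into the image generation process so that the generated images better match human visual requirements	[58-74]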
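
The first two categories in the table above can be illustrated with a toy sketch. It is not the method of any surveyed paper: `diversify_prompts` is a hypothetical stand-in for LLM-based prompt rewriting that uses fixed templates, and `perturb_latent` is a hypothetical helper that adds Gaussian noise to a latent vector and rescales the noise so the shift stays within an assumed radius `max_norm`, a crude proxy for the consistency constraints mentioned in the table.

```python
import math
import random


def diversify_prompts(class_name, contexts, styles):
    """Build varied prompts for one class label (stand-in for LLM rewriting)."""
    return [
        f"a {style} of a {class_name} {context}"
        for style in styles
        for context in contexts
    ]


def perturb_latent(latent, sigma, max_norm, rng):
    """Add Gaussian noise to a latent vector, keeping the shift within max_norm."""
    noise = [rng.gauss(0.0, sigma) for _ in latent]
    norm = math.sqrt(sum(n * n for n in noise))
    # Rescale the noise if it would move the latent too far from the original.
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [z + scale * n for z, n in zip(latent, noise)]


prompts = diversify_prompts(
    "golden retriever",
    contexts=["on a beach", "in the snow"],
    styles=["photo", "sketch"],
)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
latent = [0.0] * 8      # placeholder latent; a real one would come from the model
perturbed = perturb_latent(latent, sigma=1.0, max_norm=0.5, rng=rng)
shift = math.sqrt(sum((p - z) ** 2 for p, z in zip(perturbed, latent)))
print(len(prompts), shift <= 0.5 + 1e-9)
```

In a real pipeline the prompts or perturbed latents would be passed to a generative model; the label of each generated image is known because it is inherited from the class name or the source latent.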

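Method	Brief description	References	Advantages and disadvantages
Distribution-alignment filtering	Filters images during generation to align the overall distribution of the generated data with that of the original data	[13,42,50,52,56]	Advantage: avoids distribution shift. Disadvantage: limits the diversity of generated data and struggles with marginal (edge) distributions
Semantic and class selection filtering	Filters images during or after generation so that the semantics of generated images stay consistent with the original images	[10,40,41,43-49,51-55,57]	Advantage: ensures semantic consistency. Disadvantage: cannot handle linguistic ambiguity and is constrained by the token limit of the model used
Selection based on human-computer interaction	Selects and edits images after generation, refining them over multiple rounds through a visual human-computer interaction system to produce images that match human aesthetics	[58-74]	Advantage: highly controllable results that better meet human visual needs. Disadvantage: time and labor costs are too high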
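
As a rough illustration of the semantic and class filtering row above, the sketch below keeps a generated sample only if its embedding is close to a prototype of its assigned class. Everything here is an assumption for illustration: the 2-D embeddings are toy values, `semantic_filter` and the `threshold` of 0.8 are hypothetical, and in practice the embeddings would come from an image or image-text encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def semantic_filter(samples, prototypes, threshold):
    """Keep (embedding, label) pairs whose embedding matches the label's prototype."""
    kept = []
    for embedding, label in samples:
        if cosine(embedding, prototypes[label]) >= threshold:
            kept.append((embedding, label))
    return kept


# Toy class prototypes and generated samples (hypothetical values).
prototypes = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
generated = [
    ([0.9, 0.1], "cat"),  # consistent with its label
    ([0.1, 0.9], "cat"),  # resembles the other class: label noise
    ([0.2, 0.8], "dog"),  # consistent with its label
]
clean = semantic_filter(generated, prototypes, threshold=0.8)
print([label for _, label in clean])  # ['cat', 'dog']
```

A stricter threshold removes more noisy samples but also discards more unusual, valid ones, which mirrors the consistency versus diversity trade-off noted in the table.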

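Method	Semantic consistency	Diversity	Computational cost	Typical evaluation metrics
Generative image data augmentation based on prompt optimization	Medium	Fairly high; controlled through prompts	Low	Top-1 accuracy
Generative image data augmentation based on latent-space perturbation	Medium	High	Low	Top-1 accuracy, FID, KL divergence
Generative image data augmentation based on human-computer interaction	High	High	Highest; requires multiple rounds of feedback optimization	Accuracy, human evaluation metrics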
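
Two of the metrics named in the table are simple enough to sketch directly. The code below is a minimal illustration with made-up inputs: `top1_accuracy` compares predicted and true labels, and `kl_divergence` computes KL(P || Q) for two discrete distributions, for example class or feature histograms of real and generated data. FID is omitted because it requires feature statistics from a pretrained network and a matrix square root.

```python
import math


def top1_accuracy(predictions, labels):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) for two discrete distributions given as probability lists."""
    # Terms with p_i == 0 contribute nothing; q_i is floored to avoid log(1/0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical predictions of a downstream classifier on four test images.
acc = top1_accuracy(["cat", "dog", "dog", "cat"], ["cat", "dog", "cat", "cat"])

# Hypothetical two-bin histograms for real and synthetic data.
real_hist = [0.5, 0.5]
synthetic_hist = [0.9, 0.1]

print(acc)                                           # 0.75
print(kl_divergence(real_hist, real_hist))           # 0.0
print(kl_divergence(real_hist, synthetic_hist) > 0)  # True
```

A KL divergence of zero means the two histograms match exactly; larger values indicate that the generated data has drifted further from the real distribution.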
