Journal of Graphics ›› 2023, Vol. 44 ›› Issue (2): 260-270. DOI: 10.11996/JG.j.2095-302X.2023020260
ZENG Wu1, ZHU Heng-liang1, XING Shu-li1, LIN Jiang-hong1, MAO Guo-jun1,2
Received: 2022-06-02
Accepted: 2022-08-21
Online: 2023-04-30
Published: 2023-05-01
Contact: MAO Guo-jun (1966-), professor, Ph.D. His main research interests cover data mining, big data and distributed computing.
About the author: ZENG Wu (1997-), master student. His main research interests cover image data augmentation and few-shot learning. E-mail: 2201905122@smail.fjut.edu.cn
Supported by:
Abstract: Most data augmentation methods choose the cropped region largely at random, and most over-emphasize the salient feature regions of an image while neglecting to reinforce learning on its less discriminative regions. To address these issues, two methods, SaliencyOut and SaliencyCutMix, are proposed to strengthen the learning of features in less discriminative regions. Specifically, SaliencyOut first employs a saliency detection technique to generate a saliency map of the original image, then locates a salient feature region in the map and removes the pixels within that region. SaliencyCutMix instead removes the cropped region of the original image and replaces it with the patch occupying the same region in a patch image. By occluding or replacing part of the salient feature regions, the model is guided to learn other features of the target object. Furthermore, to address the problem that too much of the salient region may be lost when the cropped region is large, an adaptive scaling factor is introduced into the selection of the cropping boundary, dynamically adjusting the boundary according to its initial size. Experiments on four datasets demonstrate that the proposed methods can significantly improve the classification performance and anti-interference ability of the model, outperforming most state-of-the-art methods. In particular, on Mini-ImageNet with the ResNet-34 network, SaliencyCutMix improves Top-1 accuracy by 1.18% over CutMix.
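The pipeline described in the abstract can be summarized in a short sketch. The Python code below is a minimal illustration, not the authors' released implementation: it assumes OpenCV's spectral-residual saliency detector (from opencv-contrib), uses a hypothetical CutMix-style area ratio `lam`, and omits the paper's adaptive scaling factor ρ and the label-mixing details.

```python
import cv2
import numpy as np

def salient_bbox(img, lam):
    """Locate a crop box centred on the most salient pixel.

    The box covers roughly a (1 - lam) fraction of the image area
    (CutMix-style ratio, our assumption). Saliency comes from OpenCV's
    spectral-residual detector in opencv-contrib.
    """
    sal = cv2.saliency.StaticSaliencySpectralResidual_create()
    _, sal_map = sal.computeSaliency(img)
    y, x = np.unravel_index(np.argmax(sal_map), sal_map.shape)
    h, w = img.shape[:2]
    cut_rat = np.sqrt(1.0 - lam)            # side-length ratio of the crop
    cut_w, cut_h = int(w * cut_rat), int(h * cut_rat)
    x1, y1 = np.clip(x - cut_w // 2, 0, w), np.clip(y - cut_h // 2, 0, h)
    x2, y2 = np.clip(x + cut_w // 2, 0, w), np.clip(y + cut_h // 2, 0, h)
    return x1, y1, x2, y2

def saliency_out(img, lam=0.7):
    """SaliencyOut sketch: zero out the pixels inside the salient box."""
    x1, y1, x2, y2 = salient_bbox(img, lam)
    out = img.copy()
    out[y1:y2, x1:x2] = 0
    return out

def saliency_cutmix(img, patch, lam=0.7):
    """SaliencyCutMix sketch: replace the salient box of img with the
    same region of a patch image (patch must have the same shape)."""
    x1, y1, x2, y2 = salient_bbox(img, lam)
    out = img.copy()
    out[y1:y2, x1:x2] = patch[y1:y2, x1:x2]
    return out
```

Centring the crop on the saliency peak is what distinguishes both methods from Cutout and CutMix, whose crop positions are sampled uniformly at random.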
CLC number:
ZENG Wu, ZHU Heng-liang, XING Shu-li, LIN Jiang-hong, MAO Guo-jun. Saliency detection-guided for image data augmentation[J]. Journal of Graphics, 2023, 44(2): 260-270.
Fig. 1 New samples generated by different data augmentation methods ((a) Original image; (b) Patch image; (c) Cutout; (d) SaliencyOut; (e) CutMix; (f) SaliencyCutMix)
Table 1 Main statistics of the datasets
| Dataset | Classes | Training samples | Test samples |
|---|---|---|---|
| CIFAR-10 | 10 | 50000 | 10000 |
| CIFAR-100 | 100 | 50000 | 10000 |
| Mini-ImageNet | 100 | 50000 | 10000 |
| ImageNet | 1000 | 1300000 | 50000 |
Table 2 Comparison of experimental results on CIFAR-10 and CIFAR-100 (%)
| Model + method | CIFAR-10 Top-1 | CIFAR-100 Top-1 |
|---|---|---|
| ResNet34 | 93.18 | 71.85 |
| ResNet34+Cutout | 93.65 | 72.15 |
| ResNet34+SaliencyOut (ours) | 93.86 | 72.87 |
| ResNet34+Mixup | 93.88 | 73.13 |
| ResNet34+StyleMix | 93.34 | 71.91 |
| ResNet34+CutMix | 94.21 | 73.81 |
| ResNet34+StyleCutMix | 94.25 | 73.73 |
| ResNet34+SaliencyMix | 94.33 | 74.21 |
| ResNet34+SaliencyCutMix (ours) | 94.63 | 74.70 |
| ResNet110 | 94.09 | 75.94 |
| ResNet110+Cutout | 94.39 | 76.43 |
| ResNet110+SaliencyOut (ours) | 94.76 | 77.87 |
| ResNet110+Mixup | 94.67 | 77.58 |
| ResNet110+StyleMix | 94.45 | 76.66 |
| ResNet110+CutMix | 95.34 | 78.29 |
| ResNet110+StyleCutMix | 95.39 | 78.02 |
| ResNet110+SaliencyMix | 95.23 | 78.46 |
| ResNet110+SaliencyCutMix (ours) | 95.86 | 78.96 |
| PyramidNet110 | 95.68 | 80.24 |
| PyramidNet110+Cutout | 96.13 | 80.58 |
| PyramidNet110+SaliencyOut (ours) | 96.46 | 81.21 |
| PyramidNet110+Mixup | 96.11 | 81.34 |
| PyramidNet110+StyleMix | 95.50 | 80.43 |
| PyramidNet110+CutMix | 96.54 | 81.84 |
| PyramidNet110+StyleCutMix | 96.41 | 81.99 |
| PyramidNet110+SaliencyMix | 96.51 | 82.15 |
| PyramidNet110+SaliencyCutMix (ours) | 96.69 | 82.27 |
Table 3 Comparison of experimental results on Mini-ImageNet (%)
| Model + method | Top-1 accuracy |
|---|---|
| ResNet34 | 79.20 |
| ResNet34 + Cutout | 78.79 |
| ResNet34 + SaliencyOut (ours) | 79.34 |
| ResNet34 + Mixup | 79.70 |
| ResNet34 + StyleMix | 79.01 |
| ResNet34 + CutMix | 80.38 |
| ResNet34 + StyleCutMix | 80.10 |
| ResNet34 + SaliencyMix | 80.72 |
| ResNet34 + SaliencyCutMix (ours) | 81.56 |
Table 4 Comparison of experimental results on ImageNet (%)
| Model + method | Top-1 accuracy |
|---|---|
| ResNet50 | 74.92 |
| ResNet50 + Cutout | 75.50 |
| ResNet50 + SaliencyOut (ours) | 75.67 |
| ResNet50 + Mixup | 75.79 |
| ResNet50 + CutMix | 76.64 |
| ResNet50 + StyleCutMix | 76.30 |
| ResNet50 + SaliencyMix | 76.77 |
| ResNet50 + SaliencyCutMix (ours) | 76.89 |
Fig. 5 Convergence performance of different methods ((a) SaliencyOut compared with Cutout; (b) SaliencyCutMix compared with CutMix)
Table 5 Experimental results under FGSM attacks (%)
| Method | FGSM(1) Top-1 | FGSM(2) Top-1 | FGSM(4) Top-1 |
|---|---|---|---|
| Baseline | 24.33 | 15.12 | 9.45 |
| Cutout | 23.46 | 13.57 | 9.98 |
| SaliencyOut (ours) | 24.57 | 16.28 | 10.98 |
| Mixup | 24.85 | 16.01 | 11.93 |
| CutMix | 25.73 | 16.01 | 10.48 |
| StyleCutMix | 25.96 | 17.22 | 11.62 |
| SaliencyMix | 26.30 | 16.50 | 10.48 |
| SaliencyCutMix (ours) | 26.52 | 17.41 | 13.30 |
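For context, FGSM perturbs each input one step along the sign of the loss gradient. Below is a minimal PyTorch sketch; reading FGSM(k) as ε = k/255 on inputs scaled to [0, 1] is our assumption, as the table does not state the scaling.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, eps):
    """One-step FGSM: move inputs along the sign of the loss gradient.

    eps is the attack strength; FGSM(k) in Table 5 is read here as
    eps = k / 255 (assumption). Inputs are expected in [0, 1].
    """
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adv = images + eps * images.grad.sign()   # single gradient-sign step
    return adv.clamp(0.0, 1.0).detach()

# Usage sketch: adv = fgsm_attack(model, x, y, eps=2 / 255)  # FGSM(2)
```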
Table 6 Average computation time per training epoch for different methods (min/epoch)
| Method | Baseline | Cutout | SaliencyOut (ours) | Mixup | CutMix | StyleCutMix | SaliencyMix | SaliencyCutMix (ours) |
|---|---|---|---|---|---|---|---|---|
| Time (min/epoch) | 1.40 | 1.41 | 1.45 | 1.44 | 1.41 | 4.51 | 1.45 | 1.45 |
Table 7 Experiments on saliency region cropping (%)
| Method | Top-1 accuracy |
|---|---|
| SaliencyOut (Min) | 74.21 |
| SaliencyOut (ours) | 74.96 |
| SaliencyCutMix (Min) | 75.41 |
| SaliencyCutMix (ours) | 76.61 |
Table 8 Comparison of experimental results on CIFAR-10 and CIFAR-100 (%)
| Model + method | CIFAR-10 Top-1 | CIFAR-100 Top-1 |
|---|---|---|
| ResNet110+SaliencyOut (without ρ) | 94.72 | 77.82 |
| ResNet110+SaliencyOut (with ρ) | 94.76 | 77.87 |
| ResNet110+SaliencyCutMix (without ρ) | 95.79 | 78.88 |
| ResNet110+SaliencyCutMix (with ρ) | 95.86 | 78.96 |
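The exact definition of ρ is given in the paper's method section, which this page does not reproduce. Purely as an illustration of the idea, a size-dependent factor might shrink an oversized crop box as sketched below; the function and its `max_rat` parameter are hypothetical.

```python
def adaptive_cut_ratio(cut_rat, max_rat=0.5):
    """Hypothetical adaptive scaling factor rho: when the initial
    crop-side ratio cut_rat is large, scale it down so the crop cannot
    swallow the whole salient region. Illustrative only; the paper
    defines the actual rule for rho."""
    rho = min(1.0, max_rat / cut_rat) if cut_rat > 0 else 1.0
    return cut_rat * rho
```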
Table 9 Experimental comparison of multiple saliency detection methods (%)
| Method | Top-1 accuracy |
|---|---|
| SPRE | 75.29 |
| MONSO | 76.61 |
Fig. 6 Class activation maps of new samples generated by various methods ((a) Original image; (b) Cutout; (c) SaliencyOut; (d) CutMix; (e) SaliencyCutMix)
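The class activation maps in Fig. 6 are of the Grad-CAM kind. A compact PyTorch sketch is shown below; the hook-based implementation and the `target_layer` argument (typically the last convolutional block) are our assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, class_idx=None):
    """Grad-CAM sketch: weight the target layer's activations by the
    spatially averaged gradients of the class score, then ReLU and
    upsample to the input resolution."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(v=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(v=go[0]))
    logits = model(x)
    if class_idx is None:
        class_idx = logits.argmax(dim=1)      # explain the predicted class
    score = logits.gather(1, class_idx.view(-1, 1)).sum()
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    w = grads['v'].mean(dim=(2, 3), keepdim=True)          # GAP of gradients
    cam = F.relu((w * acts['v']).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[2:], mode='bilinear',
                        align_corners=False)
    return cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-8)  # (N,1,H,W)
```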
|||||