Journal of Graphics ›› 2023, Vol. 44 ›› Issue (2): 260-270.DOI: 10.11996/JG.j.2095-302X.2023020260
ZENG Wu1, ZHU Heng-liang1, XING Shu-li1, LIN Jiang-hong1, MAO Guo-jun1,2
Received: 2022-06-02
Accepted: 2022-08-21
Online: 2023-04-30
Published: 2023-05-01
Contact: MAO Guo-jun (1966-), professor, Ph.D. His main research interests cover data mining, big data and distributed computing.
About author: ZENG Wu (1997-), master student. His main research interests cover image data augmentation and few-shot learning. E-mail: 2201905122@smail.fjut.edu.cn
ZENG Wu, ZHU Heng-liang, XING Shu-li, LIN Jiang-hong, MAO Guo-jun. Saliency detection-guided for image data augmentation[J]. Journal of Graphics, 2023, 44(2): 260-270.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023020260
Fig. 1 Schematic diagram of new samples generated by different data augmentation methods ((a) Original image; (b) Patch image; (c) Cutout; (d) SaliencyOut; (e) CutMix; (f) SaliencyCutMix)
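Fig. 1 contrasts Cutout/CutMix with their saliency-guided counterparts: instead of cutting a random patch, the patch is centred on the most salient region of the source image, and labels are mixed by the pasted-area ratio. The sketch below illustrates the general idea only; it is not the paper's implementation — the saliency map is a simple gradient-magnitude stand-in for a real saliency detector, and all function names are ours.

```python
import numpy as np

def saliency_proxy(img):
    # Stand-in saliency map: gradient magnitude of the grayscale image.
    # The paper uses a dedicated saliency detector; this is only illustrative.
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    return np.hypot(gx, gy)

def saliency_cutmix(src, dst, y_src, y_dst, lam):
    """Cut a patch centred on the most salient pixel of `src`, paste it into
    `dst` at the same location, and mix labels by the pasted-area ratio."""
    h, w = src.shape[:2]
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    sal = saliency_proxy(src)
    cy, cx = np.unravel_index(sal.argmax(), sal.shape)  # saliency peak
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
    out = dst.copy()
    out[y1:y2, x1:x2] = src[y1:y2, x1:x2]
    area = (y2 - y1) * (x2 - x1) / (h * w)  # actual pasted fraction
    y_mix = area * y_src + (1.0 - area) * y_dst
    return out, y_mix
```

SaliencyOut follows the same patch-selection step but erases the salient patch instead of pasting it into another image.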
| Dataset | Classes | Training samples | Test samples |
|---|---|---|---|
| CIFAR-10 | 10 | 50000 | 10000 |
| CIFAR-100 | 100 | 50000 | 10000 |
| Mini-ImageNet | 100 | 50000 | 10000 |
| ImageNet | 1000 | 1300000 | 50000 |

Table 1 Main statistics of the datasets
| Model + method | CIFAR-10 Top-1 | CIFAR-100 Top-1 |
|---|---|---|
| ResNet34 | 93.18 | 71.85 |
| ResNet34+Cutout | 93.65 | 72.15 |
| ResNet34+SaliencyOut (ours) | 93.86 | 72.87 |
| ResNet34+Mixup | 93.88 | 73.13 |
| ResNet34+StyleMix | 93.34 | 71.91 |
| ResNet34+CutMix | 94.21 | 73.81 |
| ResNet34+StyleCutMix | 94.25 | 73.73 |
| ResNet34+SaliencyMix | 94.33 | 74.21 |
| ResNet34+SaliencyCutMix (ours) | 94.63 | 74.70 |
| ResNet110 | 94.09 | 75.94 |
| ResNet110+Cutout | 94.39 | 76.43 |
| ResNet110+SaliencyOut (ours) | 94.76 | 77.87 |
| ResNet110+Mixup | 94.67 | 77.58 |
| ResNet110+StyleMix | 94.45 | 76.66 |
| ResNet110+CutMix | 95.34 | 78.29 |
| ResNet110+StyleCutMix | 95.39 | 78.02 |
| ResNet110+SaliencyMix | 95.23 | 78.46 |
| ResNet110+SaliencyCutMix (ours) | 95.86 | 78.96 |
| PyramidNet110 | 95.68 | 80.24 |
| PyramidNet110+Cutout | 96.13 | 80.58 |
| PyramidNet110+SaliencyOut (ours) | 96.46 | 81.21 |
| PyramidNet110+Mixup | 96.11 | 81.34 |
| PyramidNet110+StyleMix | 95.50 | 80.43 |
| PyramidNet110+CutMix | 96.54 | 81.84 |
| PyramidNet110+StyleCutMix | 96.41 | 81.99 |
| PyramidNet110+SaliencyMix | 96.51 | 82.15 |
| PyramidNet110+SaliencyCutMix (ours) | 96.69 | 82.27 |

Table 2 Comparison of experimental results on CIFAR-10 and CIFAR-100 (%)
| Model + method | Top-1 accuracy |
|---|---|
| ResNet34 | 79.20 |
| ResNet34+Cutout | 78.79 |
| ResNet34+SaliencyOut (ours) | 79.34 |
| ResNet34+Mixup | 79.70 |
| ResNet34+StyleMix | 79.01 |
| ResNet34+CutMix | 80.38 |
| ResNet34+StyleCutMix | 80.10 |
| ResNet34+SaliencyMix | 80.72 |
| ResNet34+SaliencyCutMix (ours) | 81.56 |

Table 3 Comparison of experimental results on Mini-ImageNet (%)
| Model + method | Top-1 accuracy |
|---|---|
| ResNet50 | 74.92 |
| ResNet50+Cutout | 75.50 |
| ResNet50+SaliencyOut (ours) | 75.67 |
| ResNet50+Mixup | 75.79 |
| ResNet50+CutMix | 76.64 |
| ResNet50+StyleCutMix | 76.30 |
| ResNet50+SaliencyMix | 76.77 |
| ResNet50+SaliencyCutMix (ours) | 76.89 |

Table 4 Comparison of experimental results on ImageNet (%)
| Method | FGSM(1) Top-1 | FGSM(2) Top-1 | FGSM(4) Top-1 |
|---|---|---|---|
| Baseline | 24.33 | 15.12 | 9.45 |
| Cutout | 23.46 | 13.57 | 9.98 |
| SaliencyOut (ours) | 24.57 | 16.28 | 10.98 |
| Mixup | 24.85 | 16.01 | 11.93 |
| CutMix | 25.73 | 16.01 | 10.48 |
| StyleCutMix | 25.96 | 17.22 | 11.62 |
| SaliencyMix | 26.30 | 16.50 | 10.48 |
| SaliencyCutMix (ours) | 26.52 | 17.41 | 13.30 |

Table 5 Experimental results under FGSM attacks (%)
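Table 5 evaluates robustness under FGSM (fast gradient sign method) attacks at the three perturbation budgets in parentheses. As a reminder of the attack itself, the sketch below applies one FGSM step to a toy logistic-regression "model" where the input gradient has a closed form; the model, function names, and parameters are ours for illustration, not the paper's setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """One FGSM step against logistic regression: perturb x by eps along the
    sign of the input gradient of the cross-entropy loss, then clip to [0, 1]."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w            # d(loss)/dx for binary cross-entropy
    x_adv = x + eps * np.sign(grad_x)
    return np.clip(x_adv, 0.0, 1.0)
```

For a deep network the gradient is obtained by backpropagation instead of the closed form, but the sign-step-and-clip structure is identical.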
| Method | Baseline | Cutout | SaliencyOut (ours) | Mixup | CutMix | StyleCutMix | SaliencyMix | SaliencyCutMix (ours) |
|---|---|---|---|---|---|---|---|---|
| Time | 1.40 | 1.41 | 1.45 | 1.44 | 1.41 | 4.51 | 1.45 | 1.45 |

Table 6 Average computation time per epoch for different methods (min/epoch)
| Method | Top-1 accuracy |
|---|---|
| SaliencyOut (Min) | 74.21 |
| SaliencyOut (ours) | 74.96 |
| SaliencyCutMix (Min) | 75.41 |
| SaliencyCutMix (ours) | 76.61 |

Table 7 Experiments on saliency region cropping (%)
| Model + method | CIFAR-10 Top-1 | CIFAR-100 Top-1 |
|---|---|---|
| ResNet110+SaliencyOut (without ρ) | 94.72 | 77.82 |
| ResNet110+SaliencyOut (with ρ) | 94.76 | 77.87 |
| ResNet110+SaliencyCutMix (without ρ) | 95.79 | 78.88 |
| ResNet110+SaliencyCutMix (with ρ) | 95.86 | 78.96 |

Table 8 Comparison of experimental results on CIFAR-10 and CIFAR-100 with and without ρ (%)
| Method | Top-1 accuracy |
|---|---|
| SPRE | 75.29 |
| MONSO | 76.61 |

Table 9 Experimental comparison of different saliency detection methods (%)
Fig. 6 Class activation maps of new samples generated by various methods ((a) Original image; (b) Cutout; (c) SaliencyOut; (d) CutMix; (e) SaliencyCutMix)
[1] | 常东良, 尹军辉, 谢吉洋, 等. 面向图像分类的基于注意力引导的Dropout[J]. 图学学报, 2021, 42(1): 32-36. |
CHANG D L, YIN J H, XIE J Y, et al. Attention-guided Dropout for image classification[J]. Journal of Graphics, 2021, 42(1): 32-36. (in Chinese) | |
[2] | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]// The 25th International Conference on Neural Information Processing Systems-Volume 1. New York:ACM, 2012: 1097-1105. |
[3] | DEVRIES T, TAYLOR G W. Improved regularization of convolutional neural networks with cutout[EB/OL]. [2022-03- 05]. https://arxiv.org/abs/1708.04552. |
[4] | YUN S, HAN D, CHUN S, et al. CutMix: regularization strategy to train strong classifiers with localizable features[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 6022-6031. |
[5] | GONG C Y, WANG D L, LI M, et al. KeepAugment: a simple information-preserving data augmentation approach[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 1055-1064. |
[6] | DABOUEI A, SOLEYMANI S, TAHERKHANI F, et al. SuperMix: supervising the mixing data augmentation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 13789-13798. |
[7] | UDDIN A F M S, MONIRA M S, SHIN W, et al. SaliencyMix: a saliency guided data augmentation strategy for better regularization[EB/OL]. [2022-03-05]. https://arxiv.org/abs/2006.01791. |
[8] | ZHONG Z, ZHENG L, KANG G L, et al. Random erasing data augmentation[C]// The AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2017: 13001-13008. |
[9] | ZHANG H Y, CISSE M, DAUPHIN Y N, et al. Mixup: beyond empirical risk minimization[EB/OL]. [2022-03-05]. https://arxiv.org/abs/1710.09412. |
[10] | HENDRYCKS D, MU N, CUBUK E D, et al. AugMix: a simple data processing method to improve robustness and uncertainty[EB/OL]. [2022-03-02]. https://arxiv.org/abs/1912.02781. |
[11] | HONG M, CHOI J, KIM G. StyleMix: separating content and style for enhanced data augmentation[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 14857-14865. |
[12] | MONTABONE S, SOTO A. Human detection using a mobile platform and novel features derived from a visual saliency mechanism[J]. Image and Vision Computing, 2010, 28(3): 391-402. |
[13] | CHENG M M, MITRA N J, HUANG X L, et al. Global contrast based salient region detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 569-582. |
[14] | LI C Y, YUAN Y C, CAI W D, et al. Robust saliency detection via regularized random walks ranking[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 2710-2717. |
[15] | DENG Z J, HU X W, ZHU L, et al. R³Net: recurrent residual refinement network for saliency detection[C]// The 27th International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2018: 684-690. |
[16] | ZHANG L, DAI J, LU H C, et al. A Bi-directional message passing model for salient object detection[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 1741-1750. |
[17] | KRIZHEVSKY A. Learning multiple layers of features from tiny images[D]. Toronto: University of Toronto, 2009. |
[18] | VINYALS O, BLUNDELL C, LILLICRAP T, et al. Matching networks for one shot learning[EB/OL]. [2022-03-02]. https://arxiv.org/abs/1606.04080. |
[19] | RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. |
[20] | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778. |
[21] | HAN D, KIM J, KIM J. Deep pyramidal residual networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 6307-6315. |
[22] | GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[EB/OL]. [2022-04-06]. https://arxiv.53yu.com/pdf/1412.6572.pdf. |
[23] | HOU X D, ZHANG L Q. Saliency detection: a spectral residual approach[C]// 2007 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2007: 1-8. |
[24] | SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 618-626. |