Journal of Graphics ›› 2021, Vol. 42 ›› Issue (1): 32-36.DOI: 10.11996/JG.j.2095-302X.2021010032

• Image Processing and Computer Vision •

Attention-guided Dropout for image classification

  

  (1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. South-to-North Water Diversion Middle Route Information Technology Co., Ltd., Beijing 100176, China)
  • Online:2021-02-28 Published:2021-01-29
  • Supported by:
    National Key Research and Development Program of China (2019YFF0303300, 2019YFF0303302); National Natural Science Foundation of China (61773071, 61922015, U19B2036); Beijing Academy of Artificial Intelligence (BAAI2020ZJ0204); Beijing Nova Program Interdisciplinary Cooperation Project (Z191100001119140); Scholarship from China Scholarship Council (202006470036); BUPT Excellent Ph.D. Students Foundation (CX2020105, CX2019109) 

Abstract: When a large-scale neural network is trained on a small training set, it typically overfits, i.e., the model performs poorly on held-out test data. Various Dropout techniques have been proposed to alleviate this problem. However, these methods cannot directly encourage the model to learn the less discriminative parts of its input, which is also important for reducing overfitting. To address this problem, we propose attention-guided Dropout (AD), which utilizes the self-attention mechanism to alleviate the co-adaptation of feature detectors more effectively. AD comprises two distinctive components: an importance measurement mechanism for feature maps and a Dropout with a learnable probability. The importance measurement mechanism calculates the importance of each feature map as a whole using a Squeeze-and-Excitation block. The Dropout with a learnable probability forces the "bad" neurons to learn a better representation by dropping the "good" neurons, thereby diminishing co-adaptation and encouraging the model to learn the less discriminative parts. Experimental results show that the proposed method can be easily applied to various convolutional neural network (CNN) architectures and yields better performance.
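The two components described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the SE-block weights (`w1`, `w2`) and the drop probability `drop_prob` are assumed names, and the paper's *learnable* probability is simplified here to a fixed hyperparameter. High-importance channels are dropped preferentially, which is the core idea of forcing less discriminative features to improve.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_guided_dropout(x, w1, w2, drop_prob=0.3, training=True):
    """Sketch of attention-guided Dropout on feature maps x of shape (N, C, H, W).

    w1, w2: weights of an SE-style bottleneck MLP (C -> C//r -> C); assumed
    to be trained elsewhere. drop_prob stands in for the paper's learnable
    drop probability (an assumption made for this illustration).
    """
    if not training:
        return x  # no dropping at inference time
    # Squeeze: global average pooling gives one descriptor per channel.
    z = x.mean(axis=(2, 3))                       # (N, C)
    # Excitation: bottleneck MLP + sigmoid yields per-channel importance.
    s = sigmoid(np.maximum(z @ w1, 0.0) @ w2)     # (N, C), values in (0, 1)
    # Drop the "good" channels: the more important a channel, the more
    # likely it is zeroed, so "bad" channels must learn a better representation.
    p_drop = drop_prob * s                        # per-channel drop rate
    mask = rng.random(s.shape) >= p_drop          # True = keep the channel
    # Inverted-dropout rescaling keeps the expected activation unchanged.
    return x * mask[:, :, None, None] / (1.0 - p_drop)[:, :, None, None]

# Toy usage: batch of 2, 4 channels, 5x5 maps, SE reduction ratio 2.
x = rng.normal(size=(2, 4, 5, 5))
w1 = rng.normal(size=(4, 2)) * 0.1
w2 = rng.normal(size=(2, 4)) * 0.1
out = attention_guided_dropout(x, w1, w2)
```

In the full method, the dropped channels change every step as the importance scores evolve during training, so no single channel can dominate the prediction.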

Key words: deep neural network, overfitting, Dropout, self-attention mechanism, image classification
