Journal of Graphics ›› 2024, Vol. 45 ›› Issue (3): 495-504. DOI: 10.11996/JG.j.2095-302X.2024030495


Self-supervised active label cleaning

LIN Xiao1,2,3, ZHANG Qiuyang1, ZHENG Xiaomei1,2, YANG Qizhe1

    1. The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China
    2. Shanghai Engineering Research Center of Intelligent Education and Big Data, Shanghai Normal University, Shanghai 200234, China
    3. The Research Base of Online Education for Shanghai Middle and Primary Schools, Shanghai 200234, China
  • Received: 2023-07-21 Accepted: 2023-11-22 Online: 2024-06-30 Published: 2024-06-11
  • Contact: YANG Qizhe (1994-), lecturer, Ph.D. His main research interest covers artificial intelligence. E-mail: qzyang@shnu.edu.cn
  • About author:

    LIN Xiao (1978-), professor, Ph.D. Her main research interest covers image processing. E-mail: lin6008@shnu.edu.cn

  • Supported by:
    Shanghai Municipal Special Project for Promoting High-Quality Development of Industries(2211106)

Abstract:

Active label cleaning applies active learning to label noise processing in order to reduce the cost of manual annotation. However, existing active label cleaning methods still incur a high cost of extra manual annotation, largely because a high proportion of the selected suspicious samples are in fact correctly labeled. To address this problem, a self-supervised active label cleaning method based on core-set sampling was proposed. First, self-supervised tasks were employed for representation learning on all samples, mapping them into a feature space. Suspicious samples were then identified with a greedy K-Center set-covering method, and label noise samples were selected for re-labeling based on uncertainty. By considering both the representativeness and the uncertainty of samples, the method effectively lowered the proportion of correctly labeled samples among the suspicious ones. Experimental results on public datasets with varying proportions of label noise demonstrated that the proposed method significantly reduced the cost of extra manual annotation in each iteration, while also mitigating the cold-start problem to some extent. In addition, the effectiveness of the self-supervised core-set sampling module and of the uncertainty prediction module was validated through ablation experiments.
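
The two sampling steps described above (greedy K-Center core-set selection over self-supervised features, followed by uncertainty-based ranking of the selected candidates) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it assumes the self-supervised features and classifier softmax outputs are already available as NumPy arrays, and the function and parameter names (greedy_k_center, entropy_uncertainty, select_for_relabeling, k_centers, budget) are assumptions introduced here for illustration.

# Sketch of core-set + uncertainty sampling, assuming precomputed
# self-supervised features and classifier probabilities.
import numpy as np

def greedy_k_center(features: np.ndarray, k: int, seed: int = 0) -> list:
    """Farthest-first traversal: pick k center indices so that the maximum
    distance from any sample to its nearest center is greedily minimized."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    centers = [int(rng.integers(n))]               # start from a random sample
    dist = np.linalg.norm(features - features[centers[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())                   # farthest point from current centers
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return centers

def entropy_uncertainty(probs: np.ndarray) -> np.ndarray:
    """Predictive entropy per sample; higher values mean more uncertain."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def select_for_relabeling(features, probs, k_centers: int, budget: int) -> np.ndarray:
    """Combine representativeness and uncertainty: first pick representative
    candidates with K-Center, then keep the most uncertain ones for re-labeling."""
    candidates = np.array(greedy_k_center(features, k_centers))
    scores = entropy_uncertainty(probs[candidates])
    order = scores.argsort()[::-1]                 # most uncertain first
    return candidates[order[:budget]]

# Example usage with random data standing in for self-supervised features
# and classifier softmax outputs.
if __name__ == "__main__":
    feats = np.random.randn(1000, 128)
    logits = np.random.randn(1000, 10)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print(select_for_relabeling(feats, probs, k_centers=100, budget=20))

The farthest-first traversal used here is the standard greedy 2-approximation for the K-Center objective; the paper's exact sampling schedule and re-labeling procedure may differ.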

Key words: active learning, self-supervised learning, label noise, label cleaning, cost of extra manual annotation
