
Journal of Graphics, 2023, Vol. 44, Issue (3): 521-530. DOI: 10.11996/JG.j.2095-302X.2023030521


Bi-directionally aligned VAE based on double semantics for generalized zero-shot learning

SHI Cai-juan1,2, SHI Ze1,2, YAN Jin-wei1,2, BI Yang-yang1,2

  1. College of Artificial Intelligence, North China University of Science and Technology, Tangshan Hebei 063210, China
  2. Hebei Key Laboratory of Industrial Intelligent Perception, Tangshan Hebei 063210, China
  • Received: 2022-09-26 Accepted: 2022-12-06 Online: 2023-06-30 Published: 2023-06-30
  • About author:

    SHI Cai-juan (1977-), professor, Ph.D. Her main research interests include computer vision, image processing and deep learning. E-mail: scj-blue@163.com

  • Supported by:
    Distinguished Youth Foundation of North China University of Science and Technology (JQ201715); Talent Foundation of Tangshan (A202110011)

Abstract:

Generalized zero-shot learning (GZSL) aims to recognize both seen and unseen classes by exploiting the relationship between visual features and semantic information. Most existing GZSL methods rely on generative models to synthesize pseudo visual features for unseen classes. However, these models commonly employ a unidirectional VAE and a single type of semantic prototype, which limits the semantic information available for unseen classes. To address this issue, a bi-directionally aligned VAE model based on double semantics (BAVAE-DS) was proposed for GZSL. First, two types of semantic prototypes, i.e., user-defined attributes and word vectors, were adopted, and the bi-directionally aligned VAE was used to stably generate one type of pseudo visual feature from each, providing richer semantic information for representing unseen classes. Next, a feature fusion model was designed to fuse the two types of pseudo visual features and remove redundancy, thereby enhancing the pseudo visual features. Finally, classification regularization was employed to enhance the independence of classes in the classification module. Extensive experiments on three benchmark datasets, with comparisons against other methods, demonstrated the effectiveness of the proposed model.

Key words: generalized zero-shot learning, generative model, double semantic prototypes, bi-directionally aligned VAE, feature fusion and enhancement
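
The data flow summarized in the abstract (two semantic prototypes per class, one generator per prototype type, then fusion of the two pseudo visual features) can be illustrated with a minimal sketch. This is not the authors' implementation. The decoders below are untrained random linear maps standing in for trained VAE decoders, the fusion step (concatenate, then project) is an assumption, and all dimensions, names and data are hypothetical placeholders. The bi-directional alignment and the classification regularization are not modeled, because the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; none of these values come from the paper.
ATTR_DIM, WORD_DIM, LATENT_DIM, VIS_DIM, FUSED_DIM = 85, 300, 16, 2048, 512


def make_decoder(sem_dim, out_dim):
    """Stand-in for a trained conditional VAE decoder: random linear map + ReLU."""
    w = rng.normal(scale=0.05, size=(LATENT_DIM + sem_dim, out_dim))

    def decode(prototype, n_samples):
        # Sample latent noise z ~ N(0, I) and condition it on the class prototype.
        z = rng.normal(size=(n_samples, LATENT_DIM))
        cond = np.tile(prototype, (n_samples, 1))
        return np.maximum(np.concatenate([z, cond], axis=1) @ w, 0.0)

    return decode


def fuse(feat_a, feat_b, w_fuse):
    """Assumed fusion: concatenate both feature types, then project to a
    lower dimension so that redundant directions can be discarded."""
    return np.maximum(np.concatenate([feat_a, feat_b], axis=1) @ w_fuse, 0.0)


decode_from_attr = make_decoder(ATTR_DIM, VIS_DIM)
decode_from_word = make_decoder(WORD_DIM, VIS_DIM)
w_fuse = rng.normal(scale=0.02, size=(2 * VIS_DIM, FUSED_DIM))

# One unseen class described by two semantic prototypes (random placeholders).
attr_proto = rng.random(ATTR_DIM)        # user-defined attribute vector
word_proto = rng.normal(size=WORD_DIM)   # word-vector embedding of the class name

# Generate pseudo visual features from each prototype type, then fuse them.
pseudo_attr = decode_from_attr(attr_proto, n_samples=4)
pseudo_word = decode_from_word(word_proto, n_samples=4)
fused = fuse(pseudo_attr, pseudo_word, w_fuse)

print(pseudo_attr.shape, pseudo_word.shape, fused.shape)
```

In a full GZSL pipeline, fused features of this kind for the unseen classes would be combined with real features of the seen classes to train the final classifier.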
