Journal of Graphics, 2023, Vol. 44, Issue (1): 104-111. DOI: 10.11996/JG.j.2095-302X.2023010104

• Image Processing and Computer Vision •

An imitation U-shaped network for video object segmentation

HUANG Zhi-yong, HAN Sha-sha, CHEN Zhi-jun, YAO Yu, XIONG Biao, MA Kai

  1. College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443000, China
  • Received: 2022-06-17 Revised: 2022-07-07 Online: 2023-10-31 Published: 2023-02-16
  • About author: HUANG Zhi-yong (1979-), associate professor, Ph.D. His main research interests cover computer vision and computer graphics. E-mail: hzy@hzy.org.cn
  • Supported by:
    National Natural Science Foundation of China (61871258)

Abstract:

Among semi-supervised video object segmentation methods, one-shot video object segmentation (OSVOS) uses the annotated object mask of the first frame as guidance to separate the foreground objects from the video in subsequent frames. Despite its impressive segmentation results, this method is not suited to cases where the appearance of the foreground objects changes significantly or where the foreground objects and the background look similar. To solve these problems, an imitation U-shaped network for video object segmentation was proposed. An attention mechanism was added between the encoder and the decoder of this network to establish associations between feature maps and thereby generate global semantic information. The loss function was also optimized to further alleviate class imbalance and improve the robustness of the model. In addition, multi-scale prediction was combined with a fully connected conditional random field (FC/Dense CRF) to smooth the edges of the segmentation results. Extensive experiments on the challenging DAVIS 2016 dataset show that the proposed method achieves segmentation results that are more competitive than those of state-of-the-art methods.
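The abstract does not state the exact form of the optimized loss. As a reference point only, the sketch below implements the class-balanced binary cross-entropy used by the original OSVOS (inherited from HED), which is the kind of class-imbalance weighting the abstract refers to: L = -β Σ_{j∈Y+} log P(y_j = 1) - (1 - β) Σ_{j∈Y-} log P(y_j = 0), with β = |Y-|/|Y|. It is not the paper's own loss; the function name `class_balanced_bce` and the toy ten-pixel mask are hypothetical, and masks are assumed to be flattened lists of 0/1 labels.

```python
import math
from typing import Sequence


def class_balanced_bce(probs: Sequence[float], labels: Sequence[int],
                       eps: float = 1e-7) -> float:
    """Class-balanced binary cross-entropy (OSVOS/HED-style baseline, not the paper's loss).

    probs:  predicted foreground probabilities, one per pixel of a flattened mask
    labels: ground-truth values per pixel, 1 = foreground, 0 = background
    """
    if not labels or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equally long")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("labels must be 0 or 1")

    n = len(labels)
    n_pos = sum(labels)
    beta = (n - n_pos) / n  # beta = |Y-| / |Y|: rare foreground gets the large weight

    pos_term = 0.0  # summed -log P(y=1) over foreground pixels
    neg_term = 0.0  # summed -log P(y=0) over background pixels
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            pos_term -= math.log(p)
        else:
            neg_term -= math.log(1.0 - p)

    # Summed (not averaged), following the HED/OSVOS formulation.
    return beta * pos_term + (1.0 - beta) * neg_term


if __name__ == "__main__":
    mask = [1] + [0] * 9  # one foreground pixel out of ten
    miss_fg = class_balanced_bce([0.1] * 10, mask)  # foreground pixel wrong, background right
    miss_bg = class_balanced_bce([0.9] * 10, mask)  # foreground pixel right, background wrong
    print(miss_fg, miss_bg)
```

With one foreground pixel out of ten, β = 0.9, so misclassifying that single foreground pixel costs about as much as misclassifying all nine background pixels; an unweighted sum would instead let the background pixels dominate the gradient.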

Key words: semi-supervised video object segmentation, attention mechanism, loss function, multi-scale feature