
Journal of Graphics (图学学报), 2023, Vol. 44, Issue (1): 104-111. DOI: 10.11996/JG.j.2095-302X.2023010104

• Image Processing and Computer Vision •

An imitation U-shaped network for video object segmentation

HUANG Zhi-yong, HAN Sha-sha, CHEN Zhi-jun, YAO Yu, XIONG Biao, MA Kai

  1. College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443000, China
  • Received: 2022-06-17 Revised: 2022-07-07 Online: 2023-10-31 Published: 2023-02-16
  • About author: HUANG Zhi-yong (1979-), associate professor, Ph.D. His main research interests cover computer vision and computer graphics. E-mail: hzy@hzy.org.cn
  • Supported by:
    National Natural Science Foundation of China (61871258)

Abstract:

In semi-supervised video object segmentation, the one-shot video object segmentation (OSVOS) method uses the annotated object mask of the first frame as guidance to separate the foreground object from the video in subsequent frames. Although OSVOS achieves impressive segmentation results, it is not well suited to cases where the appearance of the foreground object changes significantly or where the foreground object and the background look alike. To address these problems, an imitation U-shaped network structure for video object segmentation is proposed. An attention mechanism is added between the encoder and the decoder of this network to establish associations between feature maps and thereby generate global semantic information. The loss function is also optimized to further mitigate the imbalance between categories and improve the robustness of the model. In addition, multi-scale prediction is combined with a fully connected conditional random field (FC/Dense CRF) to improve the smoothness of the edges of the segmentation results. Extensive experiments on the challenging DAVIS 2016 dataset show that the proposed method achieves segmentation results that are competitive with those of other state-of-the-art methods.

Key words: semi-supervised video object segmentation, attention mechanism, loss function, multi-scale feature
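
The abstract does not state which loss the authors optimized, so the following is only an illustrative sketch of one common way to counter foreground/background imbalance in segmentation: a class-balanced binary cross-entropy, in which each class is weighted by the frequency of the other class. The function name `class_balanced_bce`, the flat-list pixel representation, and the `eps` clamp are assumptions made for this example and are not taken from the paper.

```python
import math


def class_balanced_bce(probs, labels, eps=1e-7):
    """Class-balanced binary cross-entropy over a flat list of pixels.

    probs  -- predicted foreground probabilities in [0, 1]
    labels -- ground-truth labels, 1 for foreground and 0 for background

    Hypothetical helper for illustration; not the loss used in the paper.
    """
    if len(probs) != len(labels) or not labels:
        raise ValueError("probs and labels must be non-empty and equal length")

    n = len(labels)
    n_fg = sum(labels)

    # Weight each class by the frequency of the *other* class, so that the
    # rare class (usually the foreground) is not drowned out by the common one.
    w_fg = (n - n_fg) / n
    w_bg = n_fg / n

    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total -= w_fg * math.log(p)
        else:
            total -= w_bg * math.log(1.0 - p)
    return total / n


if __name__ == "__main__":
    # One foreground pixel among four: both classes contribute equally.
    print(class_balanced_bce([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 0]))
```

With one foreground pixel out of four, the foreground term is weighted by 0.75 and each background term by 0.25, so the two classes contribute the same total amount to the loss even though background pixels are three times as frequent. Note that this simple weighting degenerates to zero when a frame contains only one class.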
