
Journal of Graphics, 2022, Vol. 43, Issue (2): 247-253. DOI: 10.11996/JG.j.2095-302X.2022020247

• Image Processing and Computer Vision •

Fully automatic portrait matting algorithm based on deep learning

  

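  1. School of Science, Zhejiang University of Science and Technology, Hangzhou, Zhejiang 310000
  • Publication date: 2022-04-30  Release date: 2022-05-07
  • Funding:
    Natural Science Foundation of Zhejiang Province (Ly20A010005)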

Fully automatic matting algorithm for portraits based on deep learning

  1. School of Science, Zhejiang University of Science and Technology, Hangzhou, Zhejiang 310000, China
  • Online: 2022-04-30  Published: 2022-05-07
  • Supported by:
    Natural Science Foundation of Zhejiang Province (Ly20A010005)

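Abstract: To address the low completeness of the extracted person, insufficiently refined edges, and
similar difficulties in matting tasks, a fully automatic portrait matting algorithm based on deep
learning is proposed. The algorithm learns with a three-branch network: the semantic segmentation
branch (SSB) learns the semantic information of the α matte, the detail branch (DB) learns the detail
information of the α matte, and the combination branch (COM) aggregates the results learned by the
two branches. First, the encoder adopts the lightweight convolutional neural network (CNN)
MobileNetV2 to speed up feature extraction. Second, an attention mechanism is added to the SSB to
weight the importance of the image feature channels, and an atrous spatial pyramid pooling (ASPP)
module is added to the DB to fuse, at multiple scales, the features extracted under different
receptive fields. Then, the two decoder branches decode by fusing, through skip connections, the
features extracted by the encoder at different stages. Finally, the features learned by the two
branches are fused to obtain the α matte of the image. Experimental results show that, on public
datasets, the algorithm outperforms the compared deep-learning-based semi-automatic and fully
automatic matting algorithms, and that its results on real-time streaming video matting are better
than those of MODNet.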
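
The abstract states only that an attention mechanism in the SSB weights the importance of feature channels; it does not say which module is used. The following is a minimal sketch of one common way to do this, a squeeze-and-excitation style gate, written in plain Python so that it runs without a deep learning framework. The function name `channel_attention` and the parameters `weights` and `biases` are hypothetical and stand in for learned values; this is an illustration under those assumptions, not the paper's implementation.

```python
import math

def channel_attention(features, weights, biases):
    """Re-weight feature channels with a sigmoid gate (assumed SE-style design).

    features: list of C channels, each an H x W nested list of floats.
    weights:  C x C matrix of hypothetical learned weights.
    biases:   list of C hypothetical learned biases.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in features]
    # Excite: one linear layer followed by a sigmoid gate in (0, 1).
    gates = []
    for w_row, b in zip(weights, biases):
        z = sum(w * p for w, p in zip(w_row, pooled)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))
    # Re-weight: scale every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch]
              for g, ch in zip(gates, features)]
    return gates, scaled

# Two 2x2 channels; all-zero parameters make every gate sigmoid(0) = 0.5.
features = [[[1.0, 1.0], [1.0, 1.0]],
            [[2.0, 2.0], [2.0, 2.0]]]
weights = [[0.0, 0.0], [0.0, 0.0]]
biases = [0.0, 0.0]
gates, scaled = channel_attention(features, weights, biases)
print(gates)      # [0.5, 0.5]
print(scaled[1])  # [[1.0, 1.0], [1.0, 1.0]]
```

With trained parameters the gates would differ per channel, so informative channels are amplified relative to the others before decoding.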

Keywords: fully automatic matting, lightweight convolutional neural network, attention mechanism, atrous spatial pyramid pooling, feature fusion

Abstract: To address incomplete extraction of the person, insufficiently refined edges, and
cumbersome procedures in matting tasks, a fully automatic matting algorithm for portraits based on
deep learning was proposed. The algorithm employed a three-branch network: the semantic
segmentation branch (SSB) learned the semantic information of the α matte, the detail branch (DB)
learned the detail information of the α matte, and the combination branch (COM) combined the
results of the two branches. First, the encoder used the lightweight convolutional neural network
MobileNetV2 to accelerate feature extraction. Second, an attention mechanism was added to the SSB
to weight the importance of the image feature channels, and an atrous spatial pyramid pooling (ASPP)
module was added to the DB to perform multi-scale fusion of the features extracted under different
receptive fields. Then, the two branches of the decoder fused the features extracted by the encoder
at different stages through skip connections and decoded them. Finally, the features learned by the
two branches were fused to obtain the α matte of the image. The experimental results show that, on
public datasets, the algorithm outperforms the compared semi-automatic and fully automatic matting
algorithms based on deep learning, and that its real-time streaming video matting results are
superior to those of MODNet.
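
To illustrate what multi-scale fusion over different receptive fields means, the sketch below applies the same small kernel at several dilation (atrous) rates and fuses the branch outputs. It is deliberately simplified and rests on stated assumptions: it is 1-D rather than 2-D, the rates `(1, 2, 4)` are chosen for illustration, and the branches are fused by averaging, whereas ASPP modules typically concatenate the branches and apply a 1x1 convolution. The names `atrous_conv1d` and `aspp_1d` are hypothetical and do not come from the paper.

```python
def atrous_conv1d(signal, kernel, rate):
    """Zero-padded 'same' 1-D cross-correlation with dilation `rate`."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # taps are spaced `rate` samples apart
            if 0 <= j < len(signal):   # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out

def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Run one kernel at several dilation rates and average the branches."""
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    # Simplified fusion: element-wise mean instead of concatenation + 1x1 conv.
    fused = [sum(vals) / len(branches) for vals in zip(*branches)]
    return branches, fused

# A unit impulse shows how far each branch "sees": larger rates spread wider.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
kernel = [1.0, 1.0, 1.0]
branches, fused = aspp_1d(signal, kernel)
print(branches[0])  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
print(branches[1])  # [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
print(branches[2])  # [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
```

Each branch uses the same three weights, yet the rate-4 branch responds to context four samples away, which is how dilation enlarges the receptive field without adding parameters.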

Key words: fully automatic matting, lightweight convolutional neural network, attention mechanism, atrous spatial pyramid pooling, feature fusion

CLC number: