
Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 683-695. DOI: 10.11996/JG.j.2095-302X.2024040683

• Image Processing and Computer Vision •

Automatic portrait matting model based on semantic guidance

CHENG Yan1,4, YAN Zhihang2,4, LAI Jianming2,4, WANG Guixi2,4, ZHONG Linhui3,4

  1. School of Software, Jiangxi Normal University, Nanchang Jiangxi 330022, China
    2. School of Digital Industry, Jiangxi Normal University, Shangrao Jiangxi 334000, China
    3. School of Computer Information and Engineering, Jiangxi Normal University, Nanchang Jiangxi 330022, China
    4. Key Laboratory of Intelligent Information Processing and Emotional Computing of Jiangxi Province, Nanchang Jiangxi 330022, China
  • Received: 2024-02-27  Accepted: 2024-05-10  Published: 2024-08-31  Online: 2024-09-03
  • First author: CHENG Yan (1976-), professor, Ph.D. Her main research interests include artificial intelligence and image processing. E-mail: chyan88888@jxnu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62167006); National Natural Science Foundation of China (61967011); Provincial Key Laboratory Project of the Jiangxi Science and Technology Innovation Base Plan (2024SSY03131); Natural Science Foundation of Jiangxi Province (20212BAB202017); Jiangxi Province 03 Special Project and 5G Program (20212ABC03A22); Leading Talent Project of the Jiangxi Province Training Plan for Academic and Technical Leaders in Major Disciplines (20213BCJL22047)


Abstract:

To address the semantic discrimination errors and blurred matting details of existing portrait matting methods, an automatic portrait matting model based on semantic guidance was proposed. First, the hybrid CNN-Transformer architecture EMO was introduced for feature encoding. Then, the semantic segmentation decoding branch applied a multi-scale hybrid attention module to the top-level encoded features, enhancing multi-scale representation and pixel-level discrimination. Next, a feature enhancement module was employed to fuse high-level features, facilitating the flow of high-level semantic information into the shallow layers of the network. Meanwhile, the aggregation guidance module in the detail matting decoding branch aggregated features coming from different branches and used the aggregated features to better guide the network in extracting shallow features, thereby improving the accuracy of edge detail matting. Experiments on three datasets demonstrated that the proposed method achieved the best performance among the compared methods while significantly reducing the number of parameters and the computational complexity, confirming its competitiveness.
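The abstract describes a shared encoder feeding two decoding branches (semantic segmentation and detail matting) whose outputs are combined into the final alpha matte. As a rough illustration only, the following PyTorch sketch shows one way such a two-branch design can be wired together; the EMO encoder, multi-scale hybrid attention, feature enhancement, and aggregation guidance modules are replaced here by plain convolutional stand-ins, and all module names, channel widths, and the fusion rule are assumptions made for this example rather than the authors' implementation.

# Minimal, illustrative sketch (not the authors' code) of a shared-encoder,
# two-branch portrait matting model: a semantic branch predicts a coarse
# foreground/background/unknown map, a detail branch refines the alpha near
# the boundary, and the two are fused into the final matte.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in for the EMO CNN-Transformer encoder: returns multi-scale features."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.stage1(x)   # 1/2 resolution, shallow detail features
        f2 = self.stage2(f1)  # 1/4 resolution
        f3 = self.stage3(f2)  # 1/8 resolution, high-level semantic features
        return f1, f2, f3

class TwoBranchMatting(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = TinyEncoder()
        # Semantic branch: coarse trimap-like logits (background / unknown / foreground).
        self.semantic_head = nn.Conv2d(64, 3, 1)
        # Detail branch: shallow features concatenated with upsampled high-level
        # features, standing in for high-level guidance of the detail decoder.
        self.detail_head = nn.Sequential(
            nn.Conv2d(16 + 64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1))

    def forward(self, image):
        f1, f2, f3 = self.encoder(image)
        semantic = self.semantic_head(f3)
        guide = F.interpolate(f3, size=f1.shape[2:], mode="bilinear", align_corners=False)
        detail = self.detail_head(torch.cat([f1, guide], dim=1))
        # Upsample both predictions to the input resolution before fusing.
        semantic = F.interpolate(semantic, size=image.shape[2:], mode="bilinear", align_corners=False)
        detail = F.interpolate(detail, size=image.shape[2:], mode="bilinear", align_corners=False)
        prob = semantic.softmax(dim=1)  # channels: [background, unknown, foreground]
        # Foreground probability plus the detail alpha inside the uncertain band.
        alpha = prob[:, 2:3] + prob[:, 1:2] * torch.sigmoid(detail)
        return alpha.clamp(0, 1)

if __name__ == "__main__":
    model = TwoBranchMatting()
    alpha = model(torch.randn(1, 3, 256, 256))
    print(alpha.shape)  # torch.Size([1, 1, 256, 256])

In a design of this kind, the semantic branch settles what is foreground and background, while the detail branch only has to resolve the narrow uncertain band around the portrait boundary, which matches the division of labor between the two decoding branches described in the abstract.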

Key words: portrait matting, semantic guidance, multi-scale, feature enhancement, aggregation guidance

CLC number: