Journal of Graphics (图学学报) ›› 2023, Vol. 44 ›› Issue (3): 531-539. DOI: 10.11996/JG.j.2095-302X.2023030531

• Image Processing and Computer Vision •

Semantic segmentation network fusing spatial criss-cross attention and channel attention

WU Wen-huan (吴文欢), ZHANG Hao-kun (张淏坤)

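  1. School of Electrical and Information Engineering, Hubei University of Automotive Technology (湖北汽车工业学院), Shiyan, Hubei 442002, China
  • Received: 2022-10-05 Accepted: 2023-02-22 Published: 2023-06-30 Online: 2023-06-30
  • About the author:

    WU Wen-huan (1985-), male, associate professor, Ph.D. His main research interests include computer vision and image processing. E-mail: wuwenhuan5@163.com

  • Supported by:
    Natural Science Foundation of Hubei Province (2022CFB538); Doctoral Research Startup Fund of Hubei University of Automotive Technology (BK202004)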

Semantic segmentation with fusion of spatial criss-cross and channel multi-head attention

WU Wen-huan, ZHANG Hao-kun

  1. School of Electrical and Information Engineering, Hubei University of Automotive Technology, Shiyan, Hubei 442002, China
  • Received: 2022-10-05 Accepted: 2023-02-22 Online: 2023-06-30 Published: 2023-06-30
  • About author:

    WU Wen-huan (1985-), associate professor, Ph.D. His main research interests include computer vision and image processing. E-mail: wuwenhuan5@163.com

  • Supported by:
    Natural Science Fund Project of Hubei Province (2022CFB538); Ph.D. Research Startup Fund Project of Hubei University of Automotive Technology (BK202004)

Abstract:

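Existing semantic segmentation methods cannot effectively build contextual semantic associations, and the semantic features they extract lack representational power. To address these problems, a new semantic segmentation network that fuses spatial criss-cross attention with channel attention is proposed. First, a spatial criss-cross attention module (SCCAM) aggregates the context of each target pixel along the horizontal and vertical directions, efficiently establishing non-local semantic dependencies between pixels. Second, a multi-head attention mechanism is introduced into the channel attention module (CAM) to mine semantically more salient channel features over multiple channel subspaces. On this basis, the attention features from the spatial and channel dimensions are fused to further strengthen the semantic representation of the features and improve segmentation accuracy. Experiments on the Cityscapes, PASCAL VOC2012, and CamVid datasets show that the network achieves higher segmentation accuracy than other advanced semantic segmentation methods.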
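
The criss-cross aggregation described above can be illustrated with a minimal sketch in plain Python. It is not the authors' SCCAM: the function names (`criss_cross_path`, `criss_cross_attention`) are hypothetical, the query, key, and value grids are passed in directly instead of being produced by learned projections, and plain dot-product affinities with a softmax are assumed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def criss_cross_path(i, j, height, width):
    """Positions sharing a row or column with pixel (i, j); (i, j) appears once."""
    row = [(i, x) for x in range(width)]
    column = [(y, j) for y in range(height) if y != i]
    return row + column


def criss_cross_attention(query, key, value):
    """One criss-cross attention pass (illustrative assumption, not the paper's code).

    query, key, value: height x width grids whose cells are feature vectors
    (lists of floats). A real network would derive them from learned
    projections of the feature map.
    """
    height, width = len(query), len(query[0])
    channels = len(value[0][0])
    out = [[None] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            path = criss_cross_path(i, j, height, width)
            # Affinity between the target pixel and each pixel on its cross.
            scores = [
                sum(a * b for a, b in zip(query[i][j], key[y][x]))
                for y, x in path
            ]
            weights = softmax(scores)
            # Weighted sum of the value vectors along the cross.
            out[i][j] = [
                sum(w * value[y][x][c] for w, (y, x) in zip(weights, path))
                for c in range(channels)
            ]
    return out
```

In this sketch each pixel attends to height + width - 1 positions instead of all height × width positions, which is where the efficiency of a criss-cross scheme comes from. A single pass only reaches pixels in the same row or column; whether SCCAM repeats the pass to cover the whole image is not stated in the abstract.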

Keywords: semantic segmentation, neural network, attention mechanism, spatial attention, channel attention

Abstract:

Existing semantic segmentation methods build contextual semantic associations ineffectively and extract semantic features with insufficient representational power. To address these shortcomings, a novel semantic segmentation network that combines spatial criss-cross attention and channel attention was proposed. Firstly, the spatial criss-cross attention module (SCCAM) was adopted to aggregate the context information of each target pixel in the horizontal and vertical directions, enabling efficient construction of non-local semantic dependencies between pixels. Secondly, a multi-head attention mechanism was introduced into the channel attention module (CAM) to mine channel features with more salient semantics over multiple channel subspaces. Finally, the attention features from the spatial and channel dimensions were fused to strengthen the semantic representation capability and thereby improve segmentation accuracy. Experimental results on the Cityscapes, PASCAL VOC2012, and CamVid datasets demonstrated that the proposed network achieved higher segmentation accuracy than other state-of-the-art semantic segmentation methods.
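
One possible reading of "multi-head attention over multiple channel subspaces" is sketched below in plain Python. The abstract does not say how the subspaces are formed, so this sketch assumes that the channels are split into equal groups, that each group runs an unscaled dot-product channel attention with a residual connection, and that the group outputs are concatenated. The function names are hypothetical and this is not the authors' CAM.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(features):
    """Channel attention on a C x N matrix (C channels, N flattened pixels).

    Each channel is re-expressed as a softmax-weighted mix of all channels,
    then added back to the input (residual connection).
    """
    out = []
    for row in features:
        # Affinity of this channel with every channel (itself included).
        scores = [sum(a * b for a, b in zip(row, other)) for other in features]
        weights = softmax(scores)
        mixed = [
            sum(w * other[n] for w, other in zip(weights, features))
            for n in range(len(row))
        ]
        out.append([m + r for m, r in zip(mixed, row)])
    return out


def multi_head_channel_attention(features, heads):
    """Split channels into `heads` equal groups (assumed), attend per group, concatenate."""
    channels = len(features)
    if heads <= 0 or channels % heads != 0:
        raise ValueError("number of channels must be divisible by heads")
    size = channels // heads
    out = []
    for h in range(heads):
        out.extend(channel_attention(features[h * size:(h + 1) * size]))
    return out
```

With `heads=1` the sketch reduces to ordinary channel attention over all channels. Larger values restrict each affinity matrix to a smaller channel group, which is one way to let different groups emphasize different semantics.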

Key words: semantic segmentation, neural networks, attention mechanism, spatial attention, channel attention

CLC number: