
Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 745-759. DOI: 10.11996/JG.j.2095-302X.2024040745

• Image Processing and Computer Vision •

A deep architecture for reciprocal object detection and instance segmentation

GONG Yongchao1,2, SHEN Xukun1,2,3

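  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  2. State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China
  3. School of New Media Art and Design, Beihang University, Beijing 100191, China
  • Received: 2023-12-18 Accepted: 2024-05-03 Published: 2024-08-31 Online: 2024-09-03
  • Corresponding author: SHEN Xukun (1965-), male, professor, Ph.D. His main research interests cover computer graphics and virtual reality, etc. E-mail: xkshen@buaa.edu.cn
  • First author: GONG Yongchao (1988-), male, postdoctoral researcher, Ph.D. His main research interests cover computer vision and deep learning. E-mail: gyc_ustc@163.com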

A deep architecture for reciprocal object detection and instance segmentation

GONG Yongchao1,2, SHEN Xukun1,2,3

  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  2. State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China
  3. School of New Media Art and Design, Beihang University, Beijing 100191, China
  • Received: 2023-12-18 Accepted: 2024-05-03 Published: 2024-08-31 Online: 2024-09-03
  • Contact: SHEN Xukun (1965-), professor, Ph.D. His main research interests cover computer graphics and virtual reality, etc. E-mail: xkshen@buaa.edu.cn
  • First author: GONG Yongchao (1988-), Ph.D. His main research interests cover computer vision and deep learning. E-mail: gyc_ustc@163.com

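Abstract (translated from the Chinese):

Object detection and instance segmentation are two important and closely related tasks in computer vision, but the connection between them has not been fully explored in most existing work. To this end, RDSNet is proposed, a deep architecture for reciprocal object detection and instance segmentation. To enable joint optimization of the two tasks, a two-stream structure is designed to jointly learn object-level and pixel-level feature representations, which encode object-level and pixel-level information respectively. Three modules are introduced between the two streams to realize their interaction, so that object information assists instance segmentation and pixel information assists object detection. The correlation module provides a means of computing the similarity between object-level and pixel-level features, driving features that belong to the same object to be as consistent as possible and improving the accuracy of instance masks. The cropping module uses object information to introduce the notion of instances and translation variance into pixel-level perception, so that different instances are distinguished more accurately and background noise is reduced. To further improve how tightly bounding boxes fit their objects, a mask-based boundary refinement module is proposed to fuse masks and bounding boxes, exploiting the higher accuracy of masks to correct errors in the bounding boxes. Extensive experimental analyses and comparisons on the COCO dataset confirm the effectiveness and efficiency of RDSNet. In addition, by introducing a mask scoring strategy into the boundary refinement module, instance segmentation assists object detection in a new way, which further improves the performance of RDSNet.

Keywords: object detection, instance segmentation, reciprocal relation, feature representation, boundary refinement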
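
The interplay of the correlation and cropping modules described in the abstract can be illustrated with a minimal Python sketch. It is not the RDSNet implementation: the dot product as the similarity measure, the zero threshold, the toy feature values, and the names `correlate`, `crop_to_box` and `to_mask` are all assumptions made for illustration.

```python
# Illustrative sketch only; function names and the dot-product similarity
# are assumptions, not taken from the RDSNet paper or code.

def correlate(object_vec, pixel_feats):
    """Dot-product similarity between one object vector and every pixel vector."""
    return [[sum(o * p for o, p in zip(object_vec, pix)) for pix in row]
            for row in pixel_feats]

def crop_to_box(score_map, box, fill=float("-inf")):
    """Keep scores inside the inclusive box (x0, y0, x1, y1); suppress the rest."""
    x0, y0, x1, y1 = box
    return [[s if (x0 <= x <= x1 and y0 <= y <= y1) else fill
             for x, s in enumerate(row)]
            for y, row in enumerate(score_map)]

def to_mask(score_map):
    """Binary mask: score > 0 is equivalent to sigmoid(score) > 0.5."""
    return [[1 if s > 0 else 0 for s in row] for row in score_map]

# Toy 3x4 feature map: A and B are two instances with identical pixel features.
A = [2.0, 0.0]
B = [2.0, 0.0]
bg = [0.0, 1.0]
pixel_feats = [
    [bg, A, bg, B],
    [bg, A, bg, B],
    [bg, bg, bg, bg],
]
object_vec = [1.0, -1.0]   # hypothetical object-level feature for instance A
box_a = (0, 0, 2, 1)       # hypothetical detected box for instance A

scores = correlate(object_vec, pixel_feats)
uncropped = to_mask(scores)                   # B leaks into A's mask
mask = to_mask(crop_to_box(scores, box_a))    # box removes B and background
```

In this toy case the similarity alone cannot separate A from B because their pixel features are identical; only the box restriction does, which is one way to read the "translation variance" that the cropping module is said to introduce.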

Abstract:

Object detection and instance segmentation are two fundamental and closely related tasks in computer vision, yet the relationship between them has not been fully explored in most previous work. To address this, we presented the reciprocal object detection and instance segmentation network (RDSNet), a novel deep architecture. To let the two tasks benefit each other, we designed a two-stream structure that jointly learns feature representations at the object level (i.e., bounding boxes) and the pixel level (i.e., instance masks), encoding object- and pixel-level information respectively. Three new modules were introduced for the interaction between the two streams, allowing object-level information to assist instance segmentation and pixel-level information to assist object detection. Specifically, a correlation module measured the similarity between object- and pixel-level features, encouraging features that belong to the same object to be consistent and thereby improving the accuracy of instance masks. A cropping module introduced instance awareness and translation variance into pixel-level perception, so as to better distinguish different instances and reduce background noise. To further tighten the alignment between bounding boxes and their corresponding objects, a mask-based boundary refinement module (MBRM) was proposed to fuse bounding boxes with instance masks, using the more accurate masks to correct errors in the bounding boxes. Extensive experimental analyses and comparisons on the COCO dataset demonstrated the effectiveness and efficiency of RDSNet. In addition, integrating a mask scoring strategy into MBRM allowed object detection to benefit from instance segmentation in a new way and further improved the performance of RDSNet.

Key words: object detection, instance segmentation, reciprocal relation, feature representation, boundary refinement
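
The idea of correcting a bounding box with its instance mask can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the MBRM described in the paper: the linear blend, the weight `alpha`, and the names `mask_to_box` and `refine_box` are assumptions made for illustration.

```python
# Illustrative sketch only; the blending rule and all names are assumptions,
# not the mask-based boundary refinement module from the paper.

def mask_to_box(mask):
    """Tightest inclusive (x0, y0, x1, y1) box around nonzero pixels, or None."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))

def refine_box(det_box, mask, alpha=0.5):
    """Blend each detected coordinate with the mask-derived coordinate.

    alpha = 0 keeps the detector's box, alpha = 1 trusts the mask entirely.
    """
    mask_box = mask_to_box(mask)
    if mask_box is None:              # empty mask: nothing to refine with
        return tuple(float(c) for c in det_box)
    return tuple((1.0 - alpha) * d + alpha * m
                 for d, m in zip(det_box, mask_box))

# Toy 4x6 mask whose object occupies columns 2..4 and rows 1..2.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
det_box = (1, 1, 5, 3)                # hypothetical, slightly loose detection

tight = mask_to_box(mask)                        # (2, 1, 4, 2)
half = refine_box(det_box, mask, alpha=0.5)      # (1.5, 1.0, 4.5, 2.5)
full = refine_box(det_box, mask, alpha=1.0)      # (2.0, 1.0, 4.0, 2.0)
```

A fixed `alpha` is the simplest possible fusion rule; a weight that depends on a per-mask quality score would be one way to reflect the mask scoring strategy mentioned in the abstract, though how the paper combines the two is not stated here.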

CLC number: