Journal of Graphics, 2024, Vol. 45, Issue (4): 745-759. DOI: 10.11996/JG.j.2095-302X.2024040745

• Image Processing and Computer Vision •

A deep architecture for reciprocal object detection and instance segmentation

GONG Yongchao1,2, SHEN Xukun1,2,3

    1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
    2. State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China
    3. School of New Media Art and Design, Beihang University, Beijing 100191, China
  • Received: 2023-12-18 Accepted: 2024-05-03 Online: 2024-08-31 Published: 2024-09-03
  • Contact: SHEN Xukun
  • About the first author: GONG Yongchao (1988-), Ph.D. His main research interests cover computer vision and deep learning. E-mail: gyc_ustc@163.com

Abstract:

Object detection and instance segmentation are two fundamental and closely correlated tasks in computer vision, yet the relationship between them has not been fully explored in most previous work. We therefore present the reciprocal object detection and instance segmentation network (RDSNet), a novel deep architecture. To let the two tasks reciprocate, we design a two-stream structure that jointly learns feature representations at the object level (i.e., bounding boxes) and the pixel level (i.e., instance masks), with one stream encoding object-level information and the other encoding pixel-level information. Three new modules handle the interactions between the two streams, allowing object-level information to assist instance segmentation and pixel-level information to assist object detection. Specifically, a correlation module measures the similarity between object- and pixel-level features, which promotes consistency among features belonging to the same object and thereby improves the accuracy of instance masks. A cropping module introduces the awareness of instance and translation variance into pixel-level perception, so that different instances are better distinguished and background noise is reduced. To further refine the alignment between bounding boxes and their corresponding objects, a mask-based boundary refinement module (MBRM) fuses bounding boxes with instance masks, so that errors in the bounding boxes can be corrected with the help of the masks. Extensive experimental analyses and comparisons on the COCO dataset demonstrate the effectiveness and efficiency of RDSNet. In addition, we further improve the performance of RDSNet by integrating the mask scoring strategy into MBRM, which allows object detection to benefit from instance segmentation in a new way.
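
The interplay of the three modules can be illustrated with a toy sketch. It is not the RDSNet implementation: the function names, the dot-product similarity, the zero threshold, and the tiny hand-written feature map are all assumptions made for illustration, and `mask_to_box` is a deliberately crude stand-in (the tightest box around the mask), not the MBRM fusion described in the abstract.

```python
# Illustrative sketch only; names and simplifications are assumptions.

def correlate(obj_repr, pixel_feats):
    """Score each pixel by dot-product similarity to one object-level vector."""
    return [[sum(o * p for o, p in zip(obj_repr, px)) for px in row]
            for row in pixel_feats]


def crop_and_threshold(scores, box, threshold=0.0):
    """Keep pixels that score above the threshold and lie inside the box.

    box is (x0, y0, x1, y1) in inclusive pixel coordinates.
    """
    x0, y0, x1, y1 = box
    return [[(x0 <= x <= x1 and y0 <= y <= y1 and s > threshold)
             for x, s in enumerate(row)]
            for y, row in enumerate(scores)]


def mask_to_box(mask):
    """Tightest box around the mask; a crude stand-in for mask-based refinement."""
    xs = [x for row in mask for x, on in enumerate(row) if on]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# A 3x4 map of 2-dimensional pixel features (hypothetical values).
A, B = [2.0, 0.0], [0.0, 1.0]
pixel_feats = [
    [B, B, B, B],
    [B, A, A, B],
    [A, B, A, B],  # the A at x=0 mimics a similar-looking pixel of another instance
]
obj_repr = [1.0, -1.0]   # object-level representation for one detection
loose_box = (1, 0, 3, 2)  # detected box, slightly larger than the object

scores = correlate(obj_repr, pixel_feats)
mask = crop_and_threshold(scores, loose_box)
refined_box = mask_to_box(mask)
```

In this toy case the pixel at x=0 in the last row matches the object representation but falls outside the box, so cropping removes it from the mask, and the mask in turn tightens the loose box. That mirrors, in a very reduced form, the two directions of assistance the abstract describes.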

Key words: object detection, instance segmentation, reciprocal relation, feature representation, boundary refinement
