
Journal of Graphics ›› 2022, Vol. 43 ›› Issue (6): 1150-1158.DOI: 10.11996/JG.j.2095-302X.2022061150

• Image Processing and Computer Vision •

Multi-scale modality perception network for referring image segmentation

  

  1. School of Artificial Intelligence and Automation, Beijing University of Technology, Beijing 100124, China;  2. School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Online: 2022-12-30    Published: 2023-01-11
  • Supported by:
    The 7th National Postdoctoral Innovative Talent Support Program (BX20220025); The 70th Batch of National Post-Doctoral Research Grants (2021M700303) 

Abstract:

Referring image segmentation (RIS) is the task of parsing the instance referred to by a text description and segmenting that instance in the corresponding image; it is a popular research topic in computer vision and multimedia. Most current RIS methods fuse single-scale text/image modality information to perceive the location and semantics of the referred instance. However, single-scale modal information can hardly cover, at the same time, both the semantic and the structural context information required to locate instances of different sizes. This defect prevents the model from perceiving referred instances of arbitrary size, which degrades its segmentation of instances at different scales. To solve this problem, this paper proposes a multi-scale modality perception network for referring image segmentation (MMPN-RIS), built on two components: a multi-scale visual-language interaction perception module and a multi-scale mask prediction module. The former enhances the model's ability to perceive instances at different scales and promotes effective semantic alignment between modalities; the latter improves segmentation performance by fully capturing the semantic and structural information that instances of different scales require. Experimental results show that MMPN-RIS achieves state-of-the-art performance on the oIoU metric of three public datasets, RefCOCO, RefCOCO+, and RefCOCOg, and also performs well on referred instances of different scales.
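The abstract does not give the internal design of the two modules, but the general idea it describes (fusing a text embedding with FPN-style visual features at every scale, then aggregating per-scale predictions into one mask) can be illustrated with a minimal numpy sketch. All names, shapes, and the cosine-gating fusion used here are assumptions for illustration, not the paper's actual method:

```python
import numpy as np

def l2norm(x, axis=-1):
    # Normalize along an axis; the epsilon guards against division by zero.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def fuse_scale(feat, text, W):
    """Hypothetical single-scale vision-language fusion: project the text
    embedding (D,) into the visual channel space with W (C x D), then gate
    each pixel of the visual map (C, H, W) by its cosine similarity to the
    projected text."""
    t = W @ text                                                 # (C,)
    sim = np.einsum('chw,c->hw', l2norm(feat, axis=0), l2norm(t))
    return feat * sim[None]                                      # (C, H, W)

def multiscale_predict(feats, text, Ws):
    """Hypothetical multi-scale mask prediction: fuse every scale, collapse
    channels, upsample each map to the finest resolution (nearest neighbour
    via np.kron), and average them into one mask-logit map (H, W)."""
    H, Wd = feats[0].shape[1:]
    maps = []
    for f, W in zip(feats, Ws):
        g = fuse_scale(f, text, W).mean(axis=0)      # (h, w)
        ry, rx = H // g.shape[0], Wd // g.shape[1]
        maps.append(np.kron(g, np.ones((ry, rx))))   # upsample to (H, Wd)
    return np.mean(maps, axis=0)
```

A coarse scale in this sketch contributes semantics with a large receptive field, while a fine scale contributes structural detail; averaging their upsampled predictions is the simplest possible stand-in for the aggregation the paper's mask prediction module performs.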

Key words: vision and language, referring image segmentation, multi-modality fusion and perception, feature pyramid network

CLC Number: