Journal of Graphics, 2023, Vol. 44, Issue (1): 104-111. DOI: 10.11996/JG.j.2095-302X.2023010104
• Image Processing and Computer Vision •
					
HUANG Zhi-yong, HAN Sha-sha, CHEN Zhi-jun, YAO Yu, XIONG Biao, MA Kai

Received: 2022-06-17; Revised: 2022-07-07; Online: 2023-10-31; Published: 2023-02-16

About author: HUANG Zhi-yong (1979-), associate professor, Ph.D. His main research interests cover computer vision and computer graphics. E-mail: hzy@hzy.org.cn

HUANG Zhi-yong, HAN Sha-sha, CHEN Zhi-jun, YAO Yu, XIONG Biao, MA Kai. An imitation U-shaped network for video object segmentation[J]. Journal of Graphics, 2023, 44(1): 104-111.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023010104
| Method | OL (online learning) | J&F Mean (%) | J Mean (%) | F Mean (%) |
|---|---|---|---|---|
| OSMN | × | 73.45 | 74.00 | 72.90 |
| FAVOS | × | 80.95 | 82.40 | 79.50 |
| RGMP | × | 81.75 | 81.50 | 82.00 |
| FEELVOS | × | 81.65 | 81.10 | 82.20 |
| CRVOS | × | 81.60 | 82.20 | 81.00 |
| SAT | × | 83.10 | 82.60 | 83.60 |
| RANet | × | 85.50 | 85.50 | 85.40 |
| MaskTrack | √ | 77.55 | 79.70 | 75.40 |
| OSVOS | √ | 80.20 | 79.80 | 80.60 |
| FRTMVOS | √ | 83.50 | - | - |
| LucidTracker | √ | 83.60 | 84.80 | 82.30 |
| STCNN | √ | 83.80 | 83.80 | 83.80 |
| OnAVOS | √ | 84.95 | 85.70 | 84.20 |
| PReMVOS | √ | 86.75 | 84.90 | 88.60 |
| CINM | √ | 84.20 | 83.40 | 85.00 |
| MHPVOS | √ | 88.55 | 87.60 | 89.50 |
| Ours | √ | 87.07 | 86.26 | 87.88 |
Table 1 Results compared with the state-of-the-art methods
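
For the rows used below, the J&F Mean column of Table 1 is consistent with the arithmetic mean of the J Mean and F Mean columns. The following minimal sketch reproduces that arithmetic; the function name `jf_mean` and the choice of rows are illustrative assumptions, not part of the paper.

```python
def jf_mean(j_mean: float, f_mean: float) -> float:
    """Arithmetic mean of J Mean and F Mean (both in percent), rounded to 2 decimals."""
    return round((j_mean + f_mean) / 2.0, 2)


# (J Mean, F Mean) pairs copied from Table 1; the selection of rows is arbitrary.
rows = {
    "OSMN": (74.00, 72.90),
    "MHPVOS": (87.60, 89.50),
    "Ours": (86.26, 87.88),
}

for name, (j, f) in rows.items():
    print(f"{name}: J&F Mean = {jf_mean(j, f):.2f}")
```

A method with a high J Mean but a weaker F Mean (or the reverse) can therefore rank differently on the combined score than on either column alone.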
Fig. 5 Comparison of qualitative results ((a) Ours; (b) MHPVOS; (c) CINM; (d) FEELVOS; (e) FAVOS; (f) OSVOS; (g) MaskTrack; (h) LucidTracker; (i) Ground truth)
| Method | J Mean | F Mean | 
|---|---|---|
| Ours | 86.26 | 87.88 | 
| DA | 84.29 | 84.58 | 
| Dense CRF | 74.03 | 74.56 | 
Table 2 Ablation experiments on the DAVIS 2016 validation dataset (%)
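
The J Mean reported in these tables is the region similarity measure of the DAVIS benchmark, which is the intersection-over-union (Jaccard index) between a predicted mask and the ground-truth mask. The sketch below shows that computation on toy binary masks. It is a simplified illustration: the names `jaccard` and `j_mean`, the list-of-lists mask format, and the convention for two empty masks are assumptions, and the benchmark's exact averaging over frames and sequences is not reproduced.

```python
def jaccard(pred, gt):
    """Intersection-over-union of two binary masks (equal-sized 2D lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_mean(preds, gts):
    """Average Jaccard index over a list of frames, expressed in percent."""
    scores = [jaccard(p, g) for p, g in zip(preds, gts)]
    return 100.0 * sum(scores) / len(scores)


# Two toy 2x2 frames: the first prediction over-segments by one pixel,
# the second matches the ground truth exactly.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
gts = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
print(j_mean(preds, gts))
```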
| Method | J Mean | F Mean | 
|---|---|---|
| Ours | 74.03 | 74.56 | 
| Original | 73.79 | 73.35 | 
Table 3 Ablation experiment on loss function (%)
| [1] | CAELLES S, MANINIS K K, PONT-TUSET J, et al. One-shot video object segmentation[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5320-5329. | 
| [2] | JAIN S D, XIONG B, GRAUMAN K. FusionSeg: learning to combine motion and appearance for fully automatic segmentation of generic objects in videos[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2117-2126. | 
| [3] | KHOREVA A, BENENSON R, ILG E, et al. Lucid data dreaming for video object segmentation[J]. International Journal of Computer Vision, 2019, 127(9): 1175-1197. | 
| [4] | PERAZZI F, KHOREVA A, BENENSON R, et al. Learning video object segmentation from static images[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 3491-3500. | 
| [5] | KINGMA D, BA J. Adam: a method for stochastic optimization[EB/OL]. (2014-12-22) [2022-01-30]. https://arxiv.org/abs/1412.6980. | 
| [6] | HELD D, THRUN S, SAVARESE S. Learning to track at 100 FPS with deep regression networks[M]//Computer Vision - ECCV 2016. Cham: Springer International Publishing, 2016: 749-765. | 
| [7] | NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4293-4302. | 
| [8] | VOIGTLAENDER P, LEIBE B. Online adaptation of convolutional neural networks for video object segmentation[C]//The British Machine Vision Conference 2017. Durham University: British Machine Vision Association, 2017: 1-13. | 
| [9] | GRIFFIN B A, CORSO J J. BubbleNets: learning to select the guidance frame in video object segmentation by deep sorting frames[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 8906-8915. | 
| [10] | SHARIR G, SMOLYANSKY E, FRIEDMAN I. Video object segmentation using tracked object proposals[EB/OL]. [2022-01-03]. https://arxiv.org/abs/1707.06545. | 
| [11] | CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[EB/OL]. [2022-01-03]. https://arxiv.org/abs/1412.7062. | 
| [12] | HU Y T, HUANG J B, SCHWING A. Maskrnn: Instance level video object segmentation[C]//Neural Information Processing Systems. California: MIT Press, 2017: 325-334. | 
| [13] | MÄRKI N, PERAZZI F, WANG O, et al. Bilateral space video segmentation[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 743-751. | 
| [14] | JAMPANI V, GADDE R, GEHLER P V. Video propagation networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 3154-3164. | 
| [15] | CHENG J C, TSAI Y H, HUNG W C, et al. Fast and accurate online video object segmentation via tracking parts[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7415-7424. | 
| [16] | YANG L J, WANG Y R, XIONG X H, et al. Efficient video object segmentation via network modulation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 6499-6507. | 
| [17] | XIAO H X, FENG J S, LIN G S, et al. MoNet: deep motion exploitation for video object segmentation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 1140-1148. | 
| [18] | LUITEN J, VOIGTLAENDER P, LEIBE B. PReMVOS: proposal-generation, refinement and merging for video object segmentation[M]//Computer Vision - ACCV 2018. Cham: Springer International Publishing, 2018: 565-580. | 
| [19] | HU Y T, HUANG J B, SCHWING A G. VideoMatch: matching based video object segmentation[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 56-73. | 
| [20] | VOIGTLAENDER P, CHAI Y N, SCHROFF F, et al. FEELVOS: fast end-to-end embedding learning for video object segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 9473-9482. | 
| [21] | OH S W, LEE J Y, SUNKAVALLI K, et al. Fast video object segmentation by reference-guided mask propagation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7376-7385. | 
| [22] | OH S W, LEE J Y, XU N, et al. Video object segmentation using space-time memory networks[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 9226-9235. | 
| [23] | JOHNANDER J, DANELLJAN M, BRISSMAN E, et al. A generative appearance model for end-to-end video object segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 8945-8954. | 
| [24] | LIN H J, QI X J, JIA J Y. AGSS-VOS: attention guided single-shot video object segmentation[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 3948-3956. | 
| [25] | ZENG X H, LIAO R J, GU L, et al. DMM-net: differentiable mask-matching network for video object segmentation[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 3929-3938. | 
| [26] | WANG Z Q, XU J, LIU L, et al. RANet: ranking attention network for fast video object segmentation[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 3978-3987. | 
| [27] | RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation[M]//Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015: 234-241. | 
| [28] | BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. | 
| [29] | CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. (2017-06-17) [2021-12-05]. https://arxiv.org/abs/1706.05587. | 
| [30] | CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 801-818. | 
| [31] | HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 1314-1324. | 
| [32] | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2014-09-04) [2022-01-10]. https://arxiv.org/abs/1409.1556. | 
| [33] | FU J, LIU J, TIAN H J, et al. Dual attention network for scene segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 3146-3154. | 
| [34] | SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651. | 
| [35] | SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1-9. | 
| [36] | KRÄHENBÜHL P, KOLTUN V. Efficient inference in fully connected CRFs with Gaussian edge potentials[C]//The 24th International Conference on Neural Information Processing Systems. New York: ACM, 2012: 109-117. | 
| [37] | XIE S N, TU Z W. Holistically-nested edge detection[C]//2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 1395-1403. | 
| [38] | MANINIS K K, PONT-TUSET J, ARBELÁEZ P, et al. Convolutional oriented boundaries: from image segmentation to high-level tasks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 819-833. | 
| [39] | MANINIS K K, PONT-TUSET J, ARBELÁEZ P, et al. Deep retinal image understanding[M]//Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016. Cham: Springer International Publishing, 2016: 140-148. | 
| [40] | PERAZZI F, PONT-TUSET J, MCWILLIAMS B, et al. A benchmark dataset and evaluation methodology for video object segmentation[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 724-732. | 
| [41] | PONT-TUSET J, PERAZZI F, CAELLES S, et al. The 2017 DAVIS challenge on video object segmentation[EB/OL]. (2017-04-03) [2022-01-10]. https://arxiv.org/abs/1704.00675. | 
| [42] | CHENG J C, TSAI Y H, HUNG W C, et al. Fast and accurate online video object segmentation via tracking parts[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7415-7424. | 
| [43] | CHO S, CHO M, CHUNG T Y, et al. Crvos: clue refining network for video object segmentation[C]//2020 IEEE International Conference on Image Processing. New York: IEEE Press, 2020: 2301-2305. | 
| [44] | CHEN X, LI Z X, YUAN Y, et al. State-aware tracker for real-time video object segmentation[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 9384-9393. | 
| [45] | ROBINSON A, JÄREMO LAWIN F, DANELLJAN M, et al. Learning fast and robust target models for video object segmentation[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 7406-7415. | 
| [46] | XU K, WEN L Y, LI G R, et al. Spatiotemporal CNN for video object segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 1379-1388. | 
| [47] | BAO L C, WU B Y, LIU W. CNN in MRF: video object segmentation via inference in a CNN-based higher-order spatio-temporal MRF[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 5977-5986. | 
| [48] | XU S J, LIU D Z, BAO L C, et al. MHP-VOS: multiple hypotheses propagation for video object segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 314-323. | 