Journal of Graphics ›› 2023, Vol. 44 ›› Issue (1): 104-111.DOI: 10.11996/JG.j.2095-302X.2023010104
• Image Processing and Computer Vision •
HUANG Zhi-yong, HAN Sha-sha, CHEN Zhi-jun, YAO Yu, XIONG Biao, MA Kai
Received: 2022-06-17
Revised: 2022-07-07
Online: 2023-10-31
Published: 2023-02-16
About author: HUANG Zhi-yong (1979-), associate professor, Ph.D. His main research interests cover computer vision and computer graphics. E-mail: hzy@hzy.org.cn
HUANG Zhi-yong, HAN Sha-sha, CHEN Zhi-jun, YAO Yu, XIONG Biao, MA Kai. An imitation U-shaped network for video object segmentation[J]. Journal of Graphics, 2023, 44(1): 104-111.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023010104
| Method | OL | J&F Mean [%] | J Mean [%] | F Mean [%] |
|---|---|---|---|---|
| OSMN | × | 73.45 | 74.00 | 72.90 |
| FAVOS | × | 80.95 | 82.40 | 79.50 |
| RGMP | × | 81.75 | 81.50 | 82.00 |
| FEELVOS | × | 81.65 | 81.10 | 82.20 |
| CRVOS | × | 81.60 | 82.20 | 81.00 |
| SAT | × | 83.10 | 82.60 | 83.60 |
| RANet | × | 85.50 | 85.50 | 85.40 |
| MaskTrack | √ | 77.55 | 79.70 | 75.40 |
| OSVOS | √ | 80.20 | 79.80 | 80.60 |
| FRTMVOS | √ | 83.50 | - | - |
| LucidTracker | √ | 83.60 | 84.80 | 82.30 |
| STCNN | √ | 83.80 | 83.80 | 83.80 |
| OnAVOS | √ | 84.95 | 85.70 | 84.20 |
| PReMVOS | √ | 86.75 | 84.90 | 88.60 |
| CINM | √ | 84.20 | 83.40 | 85.00 |
| MHPVOS | √ | 88.55 | 87.60 | 89.50 |
| Ours | √ | 87.07 | 86.26 | 87.88 |

Table 1 Results compared with the state-of-the-art methods
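As a quick sanity check (this is not the authors' code), the J&F Mean column in Table 1 can be verified as the arithmetic mean of the J Mean and F Mean columns, which is how the DAVIS benchmark defines the overall score; a minimal sketch over a few rows of the table:

```python
# Verify that J&F Mean = (J Mean + F Mean) / 2 for rows of Table 1.
# Values are copied from the table; tolerance covers two-decimal rounding.
rows = {
    "OSMN": (74.00, 72.90, 73.45),
    "RGMP": (81.50, 82.00, 81.75),
    "Ours": (86.26, 87.88, 87.07),
}
for name, (j, f, jf) in rows.items():
    assert abs((j + f) / 2 - jf) < 0.005, name
print("J&F column consistent")
```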
Fig. 5 Comparison of qualitative results ((a) Ours; (b) MHPVOS; (c) CINM; (d) FEELVOS; (e) FAVOS; (f) OSVOS; (g) MaskTrack; (h) LucidTracker; (i) Ground truth)
| Method | J Mean | F Mean |
|---|---|---|
| Ours | 86.26 | 87.88 |
| DA | 84.29 | 84.58 |
| Dense CRF | 74.03 | 74.56 |

Table 2 Ablation experiments on the DAVIS 2016 validation dataset (%)
| Method | J Mean | F Mean |
|---|---|---|
| Ours | 74.03 | 74.56 |
| Original | 73.79 | 73.35 |

Table 3 Ablation experiment on loss function (%)
[1] | CAELLES S, MANINIS K K, PONT-TUSET J, et al. One-shot video object segmentation[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5320-5329. |
[2] | JAIN S D, XIONG B, GRAUMAN K. FusionSeg: learning to combine motion and appearance for fully automatic segmentation of generic objects in videos[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2117-2126. |
[3] | KHOREVA A, BENENSON R, ILG E, et al. Lucid data dreaming for video object segmentation[J]. International Journal of Computer Vision, 2019, 127(9): 1175-1197. |
[4] | PERAZZI F, KHOREVA A, BENENSON R, et al. Learning video object segmentation from static images[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 3491-3500. |
[5] | KINGMA D, BA J. Adam: a method for stochastic optimization[EB/OL]. (2014-12-22) [2022-01-30].https://arxiv.org/abs/1412.6980. |
[6] | HELD D, THRUN S, SAVARESE S. Learning to track at 100 FPS with deep regression networks[M]//Computer Vision - ECCV 2016. Cham: Springer International Publishing, 2016: 749-765. |
[7] | NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 4293-4302. |
[8] | VOIGTLAENDER P, LEIBE B. Online adaptation of convolutional neural networks for video object segmentation[C]//The British Machine Vision Conference 2017. Durham University: British Machine Vision Association, 2017: 1-13. |
[9] | GRIFFIN B A, CORSO J J. BubbleNets: learning to select the guidance frame in video object segmentation by deep sorting frames[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 8906-8915. |
[10] | SHARIR G, SMOLYANSKY E, FRIEDMAN I. Video object segmentation using tracked object proposals[EB/OL]. [2022-01-03].https://arxiv.org/abs/1707.06545. |
[11] | CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[EB/OL]. [2022-01-03]. https://arxiv.org/abs/1412.7062. |
[12] | HU Y T, HUANG J B, SCHWING A. MaskRNN: instance level video object segmentation[C]//Neural Information Processing Systems. California: MIT Press, 2017: 325-334. |
[13] | MÄRKI N, PERAZZI F, WANG O, et al. Bilateral space video segmentation[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 743-751. |
[14] | JAMPANI V, GADDE R, GEHLER P V. Video propagation networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 3154-3164. |
[15] | CHENG J C, TSAI Y H, HUNG W C, et al. Fast and accurate online video object segmentation via tracking parts[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7415-7424. |
[16] | YANG L J, WANG Y R, XIONG X H, et al. Efficient video object segmentation via network modulation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 6499-6507. |
[17] | XIAO H X, FENG J S, LIN G S, et al. MoNet: deep motion exploitation for video object segmentation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 1140-1148. |
[18] | LUITEN J, VOIGTLAENDER P, LEIBE B. PReMVOS: proposal-generation, refinement and merging for video object segmentation[M]//Computer Vision - ACCV 2018. Cham: Springer International Publishing, 2018: 565-580. |
[19] | HU Y T, HUANG J B, SCHWING A G. VideoMatch: matching based video object segmentation[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 56-73. |
[20] | VOIGTLAENDER P, CHAI Y N, SCHROFF F, et al. FEELVOS: fast end-to-end embedding learning for video object segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 9473-9482. |
[21] | OH S W, LEE J Y, SUNKAVALLI K, et al. Fast video object segmentation by reference-guided mask propagation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7376-7385. |
[22] | OH S W, LEE J Y, XU N, et al. Video object segmentation using space-time memory networks[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 9226-9235. |
[23] | JOHNANDER J, DANELLJAN M, BRISSMAN E, et al. A generative appearance model for end-to-end video object segmentation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 8945-8954. |
[24] | LIN H J, QI X J, JIA J Y. AGSS-VOS: attention guided single-shot video object segmentation[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 3948-3956. |
[25] | ZENG X H, LIAO R J, GU L, et al. DMM-net: differentiable mask-matching network for video object segmentation[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 3929-3938. |
[26] | WANG Z Q, XU J, LIU L, et al. RANet: ranking attention network for fast video object segmentation[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 3978-3987. |
[27] | RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation[M]// Lecture Notes in Computer Science. Cham: Springer International Publishing, 2015: 234-241. |
[28] | BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. |
[29] | CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. (2017-06-17) [2021-12-05].https://arxiv.org/abs/1706.05587. |
[30] | CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 801-818. |
[31] | HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3[C]//2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 1314-1324. |
[32] | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2014-09-04) [2022-01-10].https://arxiv.org/abs/1409.1556. |
[33] | FU J, LIU J, TIAN H J, et al. Dual attention network for scene segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 3146-3154. |
[34] | SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651. |
[35] | SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 1-9. |
[36] | KRÄHENBÜHL P, KOLTUN V. Efficient inference in fully connected CRFs with Gaussian edge potentials[C]//The 24th International Conference on Neural Information Processing Systems. New York: ACM, 2012: 109-117. |
[37] | XIE S N, TU Z W. Holistically-nested edge detection[C]//2015 IEEE International Conference on Computer Vision. New York: IEEE Press, 2015: 1395-1403. |
[38] | MANINIS K K, PONT-TUSET J, ARBELÁEZ P, et al. Convolutional oriented boundaries: from image segmentation to high-level tasks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 819-833. |
[39] | MANINIS K K, PONT-TUSET J, ARBELÁEZ P, et al. Deep retinal image understanding[M]//Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016. Cham: Springer International Publishing, 2016: 140-148. |
[40] | PERAZZI F, PONT-TUSET J, MCWILLIAMS B, et al. A benchmark dataset and evaluation methodology for video object segmentation[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 724-732. |
[41] | PONT-TUSET J, PERAZZI F, CAELLES S, et al. The 2017 DAVIS challenge on video object segmentation[EB/OL]. (2017-04-03) [2022-01-10].https://arxiv.org/abs/1704.00675. |
[42] | CHENG J C, TSAI Y H, HUNG W C, et al. Fast and accurate online video object segmentation via tracking parts[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7415-7424. |
[43] | CHO S, CHO M, CHUNG T Y, et al. Crvos: clue refining network for video object segmentation[C]//2020 IEEE International Conference on Image Processing. New York: IEEE Press, 2020: 2301-2305. |
[44] | CHEN X, LI Z X, YUAN Y, et al. State-aware tracker for real-time video object segmentation[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 9384-9393. |
[45] | ROBINSON A, JÄREMO LAWIN F, DANELLJAN M, et al. Learning fast and robust target models for video object segmentation[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 7406-7415. |
[46] | XU K, WEN L Y, LI G R, et al. Spatiotemporal CNN for video object segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 1379-1388. |
[47] | BAO L C, WU B Y, LIU W. CNN in MRF: video object segmentation via inference in a CNN-based higher-order spatio-temporal MRF[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 5977-5986. |
[48] | XU S J, LIU D Z, BAO L C, et al. MHP-VOS: multiple hypotheses propagation for video object segmentation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 314-323. |