Journal of Graphics ›› 2024, Vol. 45 ›› Issue (4): 683-695. DOI: 10.11996/JG.j.2095-302X.2024040683
• Image Processing and Computer Vision •

Automatic portrait matting model based on semantic guidance

CHENG Yan1,4, YAN Zhihang2,4, LAI Jianming2,4, WANG Guixi2,4, ZHONG Linhui3,4
Received: 2024-02-27
Accepted: 2024-05-10
Online: 2024-08-31
Published: 2024-09-03
About the first author: CHENG Yan (1976-), professor, Ph.D. Her main research interests cover artificial intelligence and image processing. E-mail: chyan88888@jxnu.edu.cn
CHENG Yan, YAN Zhihang, LAI Jianming, WANG Guixi, ZHONG Linhui. Automatic portrait matting model based on semantic guidance[J]. Journal of Graphics, 2024, 45(4): 683-695.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024040683
Encoder variant | iRMB blocks per stage | Channel sizes | Params/M |
---|---|---|---|
EMO_T | (2,2,8,3) | (32,48,80,168) | 1.3 |
EMO_S | (3,3,9,3) | (32,48,120,200) | 2.3 |
EMO_B | (3,3,9,3) | (48,72,160,288) | 5.1 |
Table 1 Details of the EMO network variants
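For reference, the stage configuration in Table 1 can be captured as a plain lookup table. This is an illustrative sketch only: the names and structure below are assumptions for exposition, and the iRMB block itself is defined in reference [28].

```python
# Stage depths and channel widths from Table 1; names and structure are
# illustrative only, and the iRMB block itself is defined in reference [28].
EMO_VARIANTS = {
    "EMO_T": {"depths": (2, 2, 8, 3), "channels": (32, 48, 80, 168)},   # ~1.3 M params
    "EMO_S": {"depths": (3, 3, 9, 3), "channels": (32, 48, 120, 200)},  # ~2.3 M params
    "EMO_B": {"depths": (3, 3, 9, 3), "channels": (48, 72, 160, 288)},  # ~5.1 M params
}

for name, cfg in EMO_VARIANTS.items():
    print(f"{name}: {sum(cfg['depths'])} iRMB blocks, "
          f"final stage width {cfg['channels'][-1]}")
```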
Method | Backbone | SAD↓ | MSE↓ | MAD↓ | Grad↓ | Conn↓ |
---|---|---|---|---|---|---|
LFM | DenseNet-201 | 31.65 | 12.76 | 16.81 | 30.29 | 18.74 |
HATT | ResNeXt-101 | 25.97 | 7.21 | 15.42 | 25.29 | 14.91 |
SHM | ResNet-50 | 21.95 | 9.93 | 12.76 | 18.17 | 27.06 |
GFM | ResNet-34 | 13.09 | 5.12 | 7.68 | 17.34 | 12.63 |
MODNet | MobileNetV2 | 12.93 | 4.57 | 7.54 | 13.31 | 12.38 |
P3M | ResNet-34 | 10.29 | 3.64 | 5.98 | 13.69 | 10.92 |
P3M-Swin | Swin | 9.60 | 3.47 | 5.72 | 13.21 | 9.13 |
APM-SG(T) | EMO_T | 9.49 | 3.34 | 5.56 | 13.17 | 9.26 |
APM-SG(S) | EMO_S | 8.56 | 2.78 | 4.92 | 12.49 | 8.09 |
APM-SG(B) | EMO_B | 8.09 | 2.59 | 4.63 | 12.10 | 7.68 |
Table 2 Experimental results on P3M-500-P
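SAD, MSE and MAD in Tables 2-4 are the standard whole-image matting errors between the predicted and ground-truth alpha mattes; Grad and Conn follow the gradient and connectivity measures of the matting benchmark in reference [38] and require more machinery. Below is a minimal sketch of the first three, assuming alpha mattes stored as floats in [0, 1]; benchmarks differ in how they rescale the raw values, so the unscaled quantities are returned.

```python
import numpy as np

def matting_errors(pred: np.ndarray, gt: np.ndarray) -> dict:
    """SAD, MSE and MAD between predicted and ground-truth alpha mattes.

    A minimal sketch assuming both mattes are float arrays in [0, 1].
    Benchmarks usually rescale the raw values (e.g. SAD / 1000, MSE and
    MAD * 1000); the unscaled values are returned here.
    """
    diff = pred.astype(np.float64) - gt.astype(np.float64)
    return {
        "SAD": float(np.abs(diff).sum()),   # sum of absolute differences
        "MSE": float((diff ** 2).mean()),   # mean squared error
        "MAD": float(np.abs(diff).mean()),  # mean absolute difference
    }

# Usage with two random stand-in mattes of the same size:
rng = np.random.default_rng(0)
print(matting_errors(rng.random((512, 512)), rng.random((512, 512))))
```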
Method | Backbone | SAD↓ | MSE↓ | MAD↓ | Grad↓ | Conn↓ |
---|---|---|---|---|---|---|
LFM | DenseNet-201 | 40.71 | 16.34 | 23.72 | 41.36 | 17.63 |
HATT | ResNeXt-101 | 30.53 | 9.18 | 17.63 | 27.42 | 19.88 |
SHM | ResNet-50 | 23.63 | 10.61 | 13.68 | 15.17 | 28.52 |
GFM | ResNet-34 | 14.98 | 9.04 | 8.65 | 17.57 | 14.68 |
MODNet | MobileNetV2 | 15.68 | 6.14 | 9.23 | 13.63 | 15.29 |
P3M | ResNet-34 | 12.88 | 4.63 | 7.42 | 12.85 | 12.31 |
P3M-Swin | Swin | 9.84 | 3.24 | 5.74 | 11.45 | 9.43 |
APM-SG(T) | EMO_T | 9.69 | 3.13 | 5.65 | 11.20 | 9.25 |
APM-SG(S) | EMO_S | 8.77 | 2.65 | 5.13 | 10.82 | 8.32 |
APM-SG(B) | EMO_B | 8.62 | 2.49 | 5.02 | 10.67 | 8.18 |
Table 3 Experimental results on P3M-500-NP
Method | Backbone | SAD↓ | MSE↓ | MAD↓ | Grad↓ | Conn↓ |
---|---|---|---|---|---|---|
LFM | DenseNet-201 | 73.54 | 45.46 | 53.27 | 72.24 | 68.34 |
HATT | ResNeXt-101 | 52.36 | 23.84 | 36.32 | 73.27 | 48.31 |
SHM | ResNet-50 | 55.83 | 26.21 | 36.90 | 69.25 | 53.74 |
GFM | ResNet-34 | 40.89 | 24.02 | 32.46 | 62.03 | 40.03 |
MODNet | MobileNetV2 | 43.01 | 24.95 | 35.15 | 67.93 | 42.27 |
P3M | ResNet-34 | 38.91 | 21.84 | 29.84 | 60.11 | 38.82 |
P3M-Swin | Swin | 35.42 | 20.24 | 28.27 | 61.15 | 35.43 |
APM-SG(T) | EMO_T | 34.98 | 19.12 | 26.82 | 59.78 | 34.95 |
APM-SG(S) | EMO_S | 33.14 | 18.01 | 26.11 | 60.69 | 33.13 |
APM-SG(B) | EMO_B | 32.67 | 17.32 | 25.74 | 59.45 | 32.67 |
Table 4 Experimental results on RWP-636
Method | Params/M | FLOPs/GFLOPs |
---|---|---|
LFM | 37.90 | 1502.50 |
HATT | 107.00 | 870.20 |
SHM | 79.30 | 40.30 |
GFM | 55.30 | 1518.60 |
MODNet | 6.50 | 34.60 |
P3M | 39.50 | 160.20 |
P3M-Swin | 45.13 | 177.00 |
APM-SG(T) | 3.71 | 23.08 |
APM-SG(S) | 5.92 | 26.56 |
APM-SG(B) | 12.26 | 50.44 |
Table 5 Comparison of parameter count and computational cost
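Parameter counts like those in Table 5 can be reproduced directly in PyTorch; the GFLOPs column additionally requires a profiler run at a fixed input resolution. A hedged sketch, using a toy stand-in network since the compared models are not reproduced here:

```python
import torch.nn as nn

def count_params_m(model: nn.Module) -> float:
    """Trainable parameter count in millions (the Params/M column)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Illustrative stand-in network; the compared models are not reproduced here.
toy = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=1),
)
print(f"{count_params_m(toy):.3f} M")
# The FLOPs/GFLOPs column additionally needs a profiler (e.g. fvcore or
# thop), evaluated at a fixed input resolution such as 512x512.
```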
MAM | FEM | AGM | SAD↓ | MSE↓ | MAD↓ |
---|---|---|---|---|---|
× | × | × | 10.98 | 3.54 | 6.38 |
× | × | √ | 10.52 | 3.48 | 5.94 |
√ | × | √ | 9.72 | 3.39 | 5.68 |
× | √ | √ | 9.93 | 3.41 | 5.70 |
√ | √ | √ | 9.49 | 3.34 | 5.56 |
Table 6 Effect of different modules on model performance
FEM loss computation | SAD↓ | MSE↓ | MAD↓ |
---|---|---|---|
Without loss computation | 9.62 | 3.38 | 5.60
With loss computation | 9.49 | 3.34 | 5.56
Table 7 Effect of FEM loss calculation on model performance
Arrangement of PAM and CAM | SAD↓ | MSE↓ | MAD↓ |
---|---|---|---|
PAM first (in series) | 9.54 | 3.36 | 5.57
CAM first (in series) | 9.56 | 3.38 | 5.59
In parallel | 9.49 | 3.34 | 5.56
Table 8 Effect of attention module position on model performance
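Table 8 compares three ways of wiring the position attention module (PAM) and channel attention module (CAM) of reference [32], with the parallel arrangement performing best. The sketch below shows only the wiring, not the attention modules themselves; using identity layers as stand-ins and summation as the parallel fusion are assumptions made so the example runs.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Wiring sketch for the three arrangements compared in Table 8.

    `pam` and `cam` stand for the position and channel attention modules
    of DANet [32]; identity layers are used below so the example runs,
    and summation as the parallel fusion is an assumption.
    """
    def __init__(self, pam: nn.Module, cam: nn.Module, mode: str = "parallel"):
        super().__init__()
        self.pam, self.cam, self.mode = pam, cam, mode

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.mode == "pam_first":      # PAM first, in series
            return self.cam(self.pam(x))
        if self.mode == "cam_first":      # CAM first, in series
            return self.pam(self.cam(x))
        return self.pam(x) + self.cam(x)  # in parallel, fused by summation

x = torch.randn(1, 64, 32, 32)
print(DualAttention(nn.Identity(), nn.Identity())(x).shape)  # (1, 64, 32, 32)
```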
Function | SAD↓ | MSE↓ | MAD↓ |
---|---|---|---|
Guidance only | 11.25 | 3.77 | 6.52
Aggregation only | 10.34 | 3.82 | 5.98
Aggregation + guidance | 9.49 | 3.34 | 5.56
Table 9 Effect of AGM on model performance
Attention mechanism | SAD↓ | MSE↓ | MAD↓ |
---|---|---|---|
CBAM | 10.07 | 3.63 | 5.83 |
EMA | 10.57 | 3.79 | 6.13 |
ACmix | 11.08 | 4.12 | 6.43 |
GMA | 11.25 | 3.77 | 6.52 |
CAM/PAM | 9.49 | 3.34 | 5.56 |
Table 10 Effect of attention on model performance
Encoder | SAD↓ | MSE↓ | MAD↓ | FLOPs/GFLOPs |
---|---|---|---|---|
ResNet-34 | 10.26 | 3.81 | 5.93 | 186.90 |
MobileViT-xxs | 10.04 | 3.44 | 5.70 | 30.50 |
EMO_T | 9.49 | 3.34 | 5.56 | 23.08 |
Table 11 Effect of encoder on model performance
[1] ZHANG J, TAO D C. Empowering things with intelligence: a survey of the progress, challenges, and opportunities in artificial intelligence of things[J]. IEEE Internet of Things Journal, 2021, 8(10): 7789-7817.
[2] CHEN Q F, LI D, TANG C K. KNN matting[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(9): 2175-2188.
[3] PORTER T, DUFF T. Compositing digital images[C]// The 11th Annual Conference on Computer Graphics and Interactive Techniques. New York: ACM, 1984: 253-259.
[4] AKSOY Y, AYDIN T O, POLLEFEYS M. Designing effective inter-pixel information flow for natural image matting[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 228-236.
[5] CHUANG Y Y, CURLESS B, SALESIN D H, et al. A Bayesian approach to digital matting[C]// 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York: IEEE Computer Society, 2001: 264.
[6] GASTAL E S L, OLIVEIRA M M. Shared sampling for real-time alpha matting[J]. Computer Graphics Forum, 2010, 29(2): 575-584.
[7] LEVIN A, LISCHINSKI D, WEISS Y. A closed-form solution to natural image matting[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(2): 228-242.
[8] LEVIN A, RAV-ACHA A, LISCHINSKI D. Spectral matting[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(10): 1699-1712.
[9] SUN J, JIA J Y, TANG C K, et al. Poisson matting[J]. ACM Transactions on Graphics, 2004, 23(3): 315-321.
[10] PHAM V Q, TAKAHASHI K, NAEMURA T. Real-time video matting based on bilayer segmentation[C]// Asian Conference on Computer Vision. Berlin, Heidelberg: Springer, 2010: 489-501.
[11] XU N, PRICE B, COHEN S, et al. Deep image matting[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 311-320.
[12] LIU Y H, XIE J K, SHI X, et al. Tripartite information mining and integration for image matting[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 7535-7544.
[13] SENGUPTA S, JAYARAM V, CURLESS B, et al. Background matting: the world is your green screen[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 2288-2297.
[14] LIN S C, YANG L J, SALEEMI I, et al. Robust high-resolution video matting with temporal guidance[C]// 2022 IEEE/CVF Winter Conference on Applications of Computer Vision. New York: IEEE Press, 2022: 3132-3141.
[15] PENG H, ZHANG J B, JIA D, et al. Real-time high-resolution video portrait matting network combined with background image[J]. Journal of Image and Graphics, 2024, 29(2): 478-490 (in Chinese).
[16] CHEN Q, GE T Z, XU Y Y, et al. Semantic human matting[C]// The 26th ACM International Conference on Multimedia. New York: ACM, 2018: 618-626.
[17] DEORA R, SHARMA R, RAJ D S S. Salient image matting[EB/OL]. [2023-10-19]. http://arxiv.org/abs/2103.12337.
[18] SHARMA R, DEORA R, VISHVAKARMA A. AlphaNet: an attention guided deep network for automatic image matting[C]// 2020 International Conference on Omni-layer Intelligent Systems. New York: IEEE Press, 2020: 1-8.
[19] ZHANG Y K, GONG L X, FAN L B, et al. A late fusion CNN for digital matting[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 7461-7470.
[20] QIAO Y, LIU Y H, YANG X, et al. Attention-guided hierarchical structure aggregation for image matting[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 13673-13682.
[21] LI J, ZHANG J, MAYBANK S J, et al. Bridging composite and real: towards end-to-end deep image matting[J]. International Journal of Computer Vision, 2022, 130(2): 246-266.
[22] LI J, ZHANG J, TAO D C. Deep automatic natural image matting[EB/OL]. [2023-10-19]. https://arxiv.org/abs/2107.07235.
[23] SU C B, GONG S C. Fully automatic matting algorithm for portraits based on deep learning[J]. Journal of Graphics, 2022, 43(2): 247-253 (in Chinese).
[24] KE Z H, SUN J Y, LI K C, et al. MODNet: real-time trimap-free portrait matting via objective decomposition[C]// 2022 AAAI Conference on Artificial Intelligence. Vancouver: AAAI Press, 2022: 1140-1147.
[25] MA S H, LI J, ZHANG J, et al. Rethinking portrait matting with privacy preserving[J]. International Journal of Computer Vision, 2023, 131(8): 2172-2197.
[26] O'SHEA K, NASH R. An introduction to convolutional neural networks[EB/OL]. [2023-10-19]. http://arxiv.org/abs/1511.08458.
[27] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. Red Hook: NIPS, 2017: 6000-6010.
[28] ZHANG J N, LI X T, LI J, et al. Rethinking mobile block for efficient attention-based models[EB/OL]. [2023-10-19]. http://arxiv.org/abs/2301.01146.
[29] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778.
[30] HOU Q B, ZHANG L, CHENG M M, et al. Strip pooling: rethinking spatial pooling for scene parsing[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2020: 4002-4011.
[31] HUANG Z L, WANG X G, WEI Y C, et al. CCNet: criss-cross attention for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(6): 6896-6908.
[32] FU J, LIU J, TIAN H J, et al. Dual attention network for scene segmentation[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 3141-3149.
[33] GUO M H, LU C Z, HOU Q B, et al. SegNeXt: rethinking convolutional attention design for semantic segmentation[EB/OL]. [2023-10-19]. http://arxiv.org/abs/2209.08575.
[34] TAKIKAWA T, ACUNA D, JAMPANI V, et al. Gated-SCNN: gated shape CNNs for semantic segmentation[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 5228-5237.
[35] HOU Q Q, LIU F. Context-aware image matting for simultaneous foreground and alpha estimation[C]// 2019 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2019: 4129-4138.
[36] KINGMA D, BA J. Adam: a method for stochastic optimization[C]// 2015 International Conference on Learning Representations. San Diego: ICLR, 2015: 1-15.
[37] YU Q H, ZHANG J M, ZHANG H, et al. Mask guided matting via progressive refinement network[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 1154-1163.
[38] RHEMANN C, ROTHER C, WANG J, et al. A perceptually motivated online benchmark for image matting[C]// 2009 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2009: 1826-1833.
[39] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2261-2269.
[40] XIE S N, GIRSHICK R, DOLLÁR P, et al. Aggregated residual transformations for deep neural networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 5987-5995.
[41] SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 4510-4520.
[42] LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 9992-10002.
[43] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// European Conference on Computer Vision. Cham: Springer, 2018: 3-19.
[44] OUYANG D L, HE S, ZHANG G Z, et al. Efficient multi-scale attention module with cross-spatial learning[C]// 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. New York: IEEE Press, 2023: 1-5.
[45] PAN X R, GE C J, LU R, et al. On the integration of self-attention and convolution[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 805-815.
[46] GE C J, DING X H, TONG Z, et al. Advancing vision transformers with group-mix attention[EB/OL]. [2023-10-19]. http://arxiv.org/abs/2311.15157.
[47] MEHTA S, RASTEGARI M. MobileViT: light-weight, general-purpose, and mobile-friendly vision transformer[EB/OL]. [2023-10-19]. http://arxiv.org/abs/2110.02178.