Journal of Graphics ›› 2025, Vol. 46 ›› Issue (1): 35-46. DOI: 10.11996/JG.j.2095-302X.2025010035
• Image Processing and Computer Vision •
CHEN Guanhao, XU Dan, HE Kangjian, SHI Hongzhen, ZHANG Hao

Received: 2024-07-04; Accepted: 2024-10-07; Online: 2025-02-28; Published: 2025-02-14
Contact: XU Dan
About the first author: CHEN Guanhao (1995-), master student. His main research interest is computer vision. E-mail: chenguanhao@stu.ynu.edu.cn
CHEN Guanhao, XU Dan, HE Kangjian, SHI Hongzhen, ZHANG Hao. TSA-SFNet: transpose self-attention and CNN based stereoscopic fusion network for image super-resolution[J]. Journal of Graphics, 2025, 46(1): 35-46.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2025010035
Fig. 1 Performance comparison between TSA-SFNet and SwinIR ((a) ×2 SR on Urban100 (PSNR); (b) ×2 SR on Urban100 (SSIM); (c) ×4 SR on Urban100 (PSNR); (d) ×4 SR on Urban100 (SSIM))
| Method | Scale | Params/M | Set14 PSNR/dB | Set14 SSIM | Urban100 PSNR/dB | Urban100 SSIM |
|---|---|---|---|---|---|---|
| SwinIR | ×4 | 11.9 | 28.94 | 0.7914 | 27.07 | 0.8164 |
| w/o SFB | ×4 | 13.7 | 29.08 | 0.7938 | 27.43 | 0.8250 |
| w/o HCAB | ×4 | 19.7 | 29.05 | 0.7936 | 27.60 | 0.8285 |
| TSA-SFNet-S | ×4 | 10.0 | 29.05 | 0.7932 | 27.54 | 0.8269 |
| TSA-SFNet | ×4 | 21.3 | 29.10 | 0.7942 | 27.64 | 0.8293 |
Table 1 Ablation experiments on SFB and HCAB, compared against SwinIR
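The PSNR/dB and SSIM values reported throughout this section follow the usual single-image SR evaluation protocol. The paper's own evaluation code is not reproduced here, so the snippet below is only a minimal sketch of that standard protocol, assuming scikit-image is available: both metrics are computed on the BT.601 luma (Y) channel after cropping a border of `scale` pixels. The helper names and the exact crop convention are assumptions, not the authors' implementation.

```python
# Minimal sketch of the standard SR evaluation protocol (assumed, not the authors' code):
# PSNR/SSIM on the BT.601 Y channel after cropping a `scale`-pixel border.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def rgb_to_y(img: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image to the BT.601 luma (Y) channel, range [16, 235]."""
    img = img.astype(np.float64)
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]) / 255.0

def evaluate_sr(sr: np.ndarray, hr: np.ndarray, scale: int = 4):
    """Return (PSNR in dB, SSIM) between a super-resolved image and its ground truth."""
    y_sr, y_hr = rgb_to_y(sr), rgb_to_y(hr)
    y_sr = y_sr[scale:-scale, scale:-scale]  # crop `scale` border pixels, as is common in SR papers
    y_hr = y_hr[scale:-scale, scale:-scale]
    psnr = peak_signal_noise_ratio(y_hr, y_sr, data_range=255)
    ssim = structural_similarity(y_hr, y_sr, data_range=255)
    return psnr, ssim
```

Evaluating on the Y channel is conventional in SR benchmarks because the human visual system is more sensitive to luminance than to chrominance; exact SSIM window settings may differ between implementations.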
| Method | Scale | Params/M | Set14 PSNR/dB | Set14 SSIM | Urban100 PSNR/dB | Urban100 SSIM |
|---|---|---|---|---|---|---|
| HAT | ×4 | 20.8 | 29.04 | 0.7934 | 27.47 | 0.8258 |
| w/o SFB | ×4 | 20.8 | 29.07 | 0.7942 | 27.51 | 0.8265 |
| w/o HCAB | ×4 | 21.3 | 29.07 | 0.7942 | 27.63 | 0.8291 |
| TSA-SFNet | ×4 | 21.3 | 29.10 | 0.7942 | 27.64 | 0.8293 |
Table 2 Ablation experiments on SFB and HCAB, compared against HAT
Fig. 4 Visual results of the ablation experiments on SFB and HCAB, compared against SwinIR (w/o SFB and w/o HCAB denote replacing the corresponding TSA-SFNet modules with those from SwinIR) ((a) HR; (b) SwinIR; (c) w/o SFB; (d) w/o HCAB; (e) TSA-SFNet)
Fig. 5 Visual results of the ablation experiments on SFB and HCAB, compared against HAT (w/o SFB and w/o HCAB denote replacing the corresponding TSA-SFNet modules with those from HAT) ((a) HR; (b) HAT; (c) w/o SFB; (d) w/o HCAB; (e) TSA-SFNet)
| Method | Scale | Year | Set5 PSNR/dB | Set5 SSIM | Set14 PSNR/dB | Set14 SSIM | B100 PSNR/dB | B100 SSIM | Urban100 PSNR/dB | Urban100 SSIM | Manga109 PSNR/dB | Manga109 SSIM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EDSR | ×4 | 2017 | 32.46 | 0.8968 | 28.80 | 0.7876 | 27.71 | 0.7420 | 26.64 | 0.8033 | 31.02 | 0.9148 |
| RCAN | ×4 | 2018 | 32.63 | 0.9002 | 28.87 | 0.7889 | 27.77 | 0.7436 | 26.82 | 0.8087 | 31.22 | 0.9173 |
| RDN | ×4 | 2018 | 32.47 | 0.8990 | 28.81 | 0.7871 | 27.72 | 0.7419 | 26.61 | 0.8028 | 31.00 | 0.9151 |
| SAN | ×4 | 2019 | 32.64 | 0.9003 | 28.92 | 0.7888 | 27.78 | 0.7436 | 26.79 | 0.8068 | 31.18 | 0.9169 |
| HAN | ×4 | 2020 | 32.64 | 0.9002 | 28.90 | 0.7890 | 27.80 | 0.7442 | 26.85 | 0.8094 | 31.42 | 0.9177 |
| IGNN | ×4 | 2020 | 32.57 | 0.8998 | 28.85 | 0.7891 | 27.77 | 0.7434 | 26.84 | 0.8090 | 31.28 | 0.9182 |
| SwinIR | ×4 | 2021 | 32.72 | 0.9021 | 28.94 | 0.7914 | 27.83 | 0.7459 | 27.07 | 0.8164 | 31.67 | 0.9226 |
| ELAN | ×4 | 2022 | 32.75 | 0.9022 | 28.96 | 0.7914 | 27.83 | 0.7459 | 27.13 | 0.8167 | 31.68 | 0.9226 |
| SRFormer | ×4 | 2023 | 32.81 | 0.9029 | 29.01 | 0.7919 | 27.85 | 0.7472 | 27.20 | 0.8189 | 31.75 | 0.9237 |
| HAT | ×4 | 2023 | 32.81 | 0.9036 | 29.04 | 0.7934 | 27.91 | 0.7487 | 27.47 | 0.8258 | 32.01 | 0.9254 |
| TSA-SFNet | ×4 | - | 32.90 | 0.9038 | 29.10 | 0.7942 | 27.91 | 0.7492 | 27.64 | 0.8293 | 32.07 | 0.9261 |
Table 3 Quantitative comparison of TSA-SFNet with state-of-the-art super-resolution methods
Fig. 6 Qualitative comparison of our method TSA-SFNet with the state-of-the-art image super-resolution methods on the SR task ((a) HR; (b) Bicubic; (c) EDSR; (d) RCAN; (e) RDN; (f) IGNN; (g) NLSA; (h) SwinIR; (i) HAT; (j) TSA-SFNet (Ours))
Fig. 7 Qualitative comparison of our method TSA-SFNet with the state-of-the-art image super-resolution methods on the ×4 letter SR task ((a) HR; (b) NLSA; (c) SwinIR; (d) HAT; (e) TSA-SFNet (Ours))
| Method | Scale | Params/M | Set5 PSNR/dB | Set5 SSIM | Set14 PSNR/dB | Set14 SSIM | B100 PSNR/dB | B100 SSIM | Urban100 PSNR/dB | Urban100 SSIM | Manga109 PSNR/dB | Manga109 SSIM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EDSR | ×4 | 43.1 | 32.46 | 0.8968 | 28.80 | 0.7876 | 27.71 | 0.7420 | 26.64 | 0.8033 | 31.02 | 0.9148 |
| RCAN | ×4 | 15.6 | 32.63 | 0.9002 | 28.87 | 0.7889 | 27.77 | 0.7436 | 26.82 | 0.8087 | 31.22 | 0.9173 |
| RDN | ×4 | 22.3 | 32.47 | 0.8990 | 28.81 | 0.7871 | 27.72 | 0.7419 | 26.61 | 0.8028 | 31.00 | 0.9151 |
| RNAN | ×4 | 9.3 | 32.49 | 0.8982 | 28.83 | 0.7878 | 27.72 | 0.7421 | 26.61 | 0.8023 | 31.09 | 0.9149 |
| IGNN | ×4 | 49.5 | 32.57 | 0.8998 | 28.85 | 0.7891 | 27.77 | 0.7434 | 26.84 | 0.8090 | 31.28 | 0.9182 |
| NLSA | ×4 | 44.2 | 32.59 | 0.9000 | 28.87 | 0.7891 | 27.78 | 0.7444 | 26.96 | 0.8109 | 31.27 | 0.9184 |
| SwinIR | ×4 | 11.9 | 32.72 | 0.9021 | 28.94 | 0.7914 | 27.83 | 0.7459 | 27.07 | 0.8164 | 31.67 | 0.9226 |
| ELAN | ×4 | 8.3 | 32.75 | 0.9022 | 28.96 | 0.7914 | 27.83 | 0.7459 | 27.13 | 0.8167 | 31.68 | 0.9226 |
| HAT-S | ×4 | 9.6 | 32.69 | 0.9021 | 29.06 | 0.7928 | 27.89 | 0.7481 | 27.44 | 0.8243 | 31.88 | 0.9245 |
| TSA-SFNet-S | ×4 | 10.0 | 32.79 | 0.9028 | 29.05 | 0.7932 | 27.90 | 0.7486 | 27.54 | 0.8269 | 31.94 | 0.9251 |
Table 4 Quantitative comparison of TSA-SFNet-S with state-of-the-art super-resolution methods
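The Params/M column reports the number of learnable parameters in millions, which is the usual way lightweight SR models such as TSA-SFNet-S and HAT-S are compared for model size. As a small illustrative sketch only (assuming a PyTorch model object; the helper name is hypothetical and not from the paper), such a count is typically obtained as follows:

```python
import torch.nn as nn

def params_in_millions(model: nn.Module) -> float:
    """Count trainable parameters and express the total in millions (the Params/M convention)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
```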
Fig. 8 Qualitative comparison of our method TSA-SFNet-S with the state-of-the-art image SR methods on the ×4 SR task ((a) HR; (b) Bicubic; (c) EDSR; (d) RCAN; (e) RDN; (f) IGNN; (g) NLSA; (h) SwinIR; (i) HAT; (j) TSA-SFNet-S (Ours))
Fig. 9 Qualitative comparison of our method TSA-SFNet with the state-of-the-art methods on the ×4 real-world image SR task ((a) LR; (b) ESRGAN; (c) BSRGAN; (d) Real-ESRGAN; (e) SwinIR; (f) HAT; (g) TSA-SFNet (Ours))
[1] | DONG C, LOY C C, HE K M, et al. Image super-resolution using deep convolutional networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(2): 295-307. |
[2] | DONG C, LOY C C, TANG X O. Accelerating the super-resolution convolutional neural network[C]// The 14th European Conference on Computer Vision. Cham: Springer, 2016: 391-407. |
[3] | KIM J, LEE J K, LEE K M. Accurate image super-resolution using very deep convolutional networks[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 1646-1654. |
[4] | LIM B, SON S, KIM H, et al. Enhanced deep residual networks for single image super-resolution[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2017: 1132-1140. |
[5] | ZHANG Y L, LI K P, LI K, et al. Image super-resolution using very deep residual channel attention networks[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 294-310. |
[6] | GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144. |
[7] | LEDIG C, THEIS L, HUSZÁR F, et al. Photo-realistic single image super-resolution using a generative adversarial network[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 105-114. |
[8] | WANG X T, YU K, WU S X, et al. ESRGAN: enhanced super-resolution generative adversarial networks[C]// European Conference on Computer Vision. Cham: Springer, 2018: 63-79. |
[9] | ZHANG K, LIANG J Y, VAN GOOL L, et al. Designing a practical degradation model for deep blind image super-resolution[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 4771-4780. |
[10] | LIANG J Y, CAO J Z, SUN G L, et al. SwinIR: image restoration using swin transformer[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 1833-1844. |
[11] | FANG J S, LIN H J, CHEN X Y, et al. A hybrid network of CNN and transformer for lightweight image super-resolution[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 1102-1111. |
[12] | CHEN X Y, WANG X T, ZHOU J T, et al. Activating more pixels in image super-resolution transformer[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2023: 22367-22377. |
[13] | ZHANG X D, ZENG H, GUO S, et al. Efficient long-range attention network for image super-resolution[C]// The 17th European Conference on Computer Vision. Cham: Springer, 2022: 649-667. |
[14] | ZHOU Y P, LI Z, GUO C L, et al. SRFormer: permuted self-attention for single image super-resolution[C]// 2023 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2023: 12734-12745. |
[15] | LIU Z, HU H, LIN Y T, et al. Swin transformer v2: scaling up capacity and resolution[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 11999-12009. |
[16] | PARK N, KIM S. How do vision transformers work?[EB/OL]. [2024-05-04]. https://arxiv.org/abs/2202.06709. |
[17] | WANG P H, ZHENG W Q, CHEN T L, et al. Anti-oversmoothing in deep vision transformers via the Fourier domain analysis: from theory to practice[EB/OL]. [2024-05-04]. https://arxiv.org/abs/2203.05962. |
[18] | TIMOFTE R, AGUSTSSON E, VAN GOOL L, et al. NTIRE 2017 challenge on single image super-resolution: methods and results[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. New York: IEEE Press, 2017: 1110-1121. |
[19] | WANG X T, YU K, DONG C, et al. Recovering realistic texture in image super-resolution by deep spatial feature transform[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 606-615. |
[20] | GHIFARY M, KLEIJN W B, ZHANG M J, et al. Deep reconstruction-classification networks for unsupervised domain adaptation[C]// The 14th European Conference on Computer Vision. Cham: Springer, 2016: 597-613. |
[21] | TAI Y, YANG J, LIU X M. Image super-resolution via deep recursive residual network[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2017: 2790-2798. |
[22] | WANG X T, XIE L B, DONG C, et al. Real-ESRGAN: training real-world blind super-resolution with pure synthetic data[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 1905-1914. |
[23] | TAI Y, YANG J, LIU X M, et al. MemNet: a persistent memory network for image restoration[C]// 2017 IEEE International Conference on Computer Vision. New York: IEEE Press, 2017: 4549-4557. |
[24] | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 770-778. |
[25] | MAN K L, WANG Y S, LIU J R. Image super-resolution reconstruction algorithm based on dense residual network[J]. Journal of Graphics, 2021, 42(4): 556-562 (in Chinese). |
[26] | LI B, WANG P, ZHAO S Y. Image super-resolution reconstruction based on dual attention mechanism[J]. Journal of Graphics, 2021, 42(2): 206-215 (in Chinese). |
[27] | WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7794-7803. |
[28] | MEI Y Q, FAN Y C, ZHOU Y Q. Image super-resolution with non-local sparse attention[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 3516-3525. |
[29] | DAI T, CAI J R, ZHANG Y B, et al. Second-order attention network for single image super-resolution[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2019: 11057-11066. |
[30] | ZHOU S C, ZHANG J W, ZUO W M, et al. Cross-scale internal graph neural network for image super-resolution[C]// The 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 295. |
[31] | LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 9992-10002. |
[32] | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[EB/OL]. [2024-05-04]. https://arxiv.org/abs/2010.11929. |
[33] | CHU X X, TIAN Z, WANG Y Q, et al. Twins: revisiting the design of spatial attention in vision transformers[C]// The 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 716. |
[34] | LIU L, OUYANG W L, WANG X G, et al. Deep learning for generic object detection: a survey[J]. International Journal of Computer Vision, 2020, 128(2): 261-318. |
[35] | WANG W H, XIE E Z, LI X, et al. Pyramid vision transformer: a versatile backbone for dense prediction without convolutions[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 548-558. |
[36] | WU B C, XU C F, DAI X L, et al. Visual transformers: token-based image representation and processing for computer vision[EB/OL]. [2024-05-04]. https://arxiv.org/abs/2006.03677. |
[37] | RAGHU M, UNTERTHINER T, KORNBLITH S, et al. Do vision transformers see like convolutional neural networks?[C]// The 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 927. |
[38] | LI K C, WANG Y L, ZHANG J H, et al. UniFormer: unifying convolution and self-attention for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(10): 12581-12600. |
[39] | YUAN K, GUO S P, LIU Z W, et al. Incorporating convolution designs into visual transformers[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 559-568. |
[40] | WANG Z D, CUN X D, BAO J M, et al. Uformer: a general U-shaped transformer for image restoration[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2022: 17662-17672. |
[41] | SHI W Z, CABALLERO J, HUSZÁR F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2016: 1874-1883. |
[42] | SUNDARARAJAN M, TALY A, YAN Q Q. Axiomatic attribution for deep networks[EB/OL]. [2024-05-24]. https://dl.acm.org/doi/10.5555/3305890.3306024. |
[43] | WU H P, XIAO B, CODELLA N, et al. CvT: introducing convolutions to vision transformers[C]// 2021 IEEE/CVF International Conference on Computer Vision. New York: IEEE Press, 2021: 22-31. |
[44] | XIAO T T, SINGH M, MINTUN E, et al. Early convolutions help transformers see better[C]// The 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 30392-30400. |
[45] | BA J L, KIROS J R, HINTON G E. Layer normalization[EB/OL]. [2024-05-24]. https://arxiv.org/abs/1607.06450. |
[46] | DING M, YANG Z Y, HONG W Y, et al. CogView: mastering text-to-image generation via transformers[EB/OL]. [2024-05-24]. https://proceedings.neurips.cc/paper/2021/hash/a4d92e2cd541fca87e4620aba658316d-Abstract.html. |
[47] | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// The 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010. |
[48] | IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[EB/OL]. [2024-05-24]. https://dl.acm.org/doi/10.5555/3045118.3045167. |
[49] | WU Y X, HE K M. Group normalization[C]// The 15th European Conference on Computer Vision. Cham: Springer, 2018: 3-19. |
[50] | PATEL K, BUR A M, LI F J, et al. Aggregating global features into local vision transformer[C]// The 26th International Conference on Pattern Recognition. New York: IEEE Press, 2022: 1141-1147. |
[51] | VASWANI A, RAMACHANDRAN P, SRINIVAS A, et al. Scaling local self-attention for parameter efficient visual backbones[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2021: 12889-12899. |
[52] | BEVILACQUA M, ROUMY A, GUILLEMOT C, et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding[EB/OL]. [2024-05-24]. https://people.rennes.inria.fr/Aline.Roumy/results/SR_BMVC12.html. |
[53] | ZEYDE R, ELAD M, PROTTER M. On single image scale-up using sparse-representations[C]// The 7th International Conference on Curves and Surfaces. Cham: Springer, 2012: 711-730. |
[54] | MARTIN D, FOWLKES C, TAL D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics[C]// The 8th IEEE International Conference on Computer Vision. New York: IEEE Press, 2001: 416-423. |
[55] | HUANG J B, SINGH A, AHUJA N. Single image super-resolution from transformed self-exemplars[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2015: 5197-5206. |
[56] | MATSUI Y, ITO K, ARAMAKI Y, et al. Sketch-based manga retrieval using manga109 dataset[J]. Multimedia Tools and Applications, 2017, 76(20): 21811-21838. |
[57] | JOHNSON J, ALAHI A, FEI-FEI L. Perceptual losses for real-time style transfer and super-resolution[C]// The 14th European Conference on Computer Vision. Cham: Springer, 2016: 694-711. |
[58] | ZHANG Y L, TIAN Y P, KONG Y, et al. Residual dense network for image super-resolution[C]// IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 2472-2481. |
[59] | NIU B, WEN W L, REN W Q, et al. Single image super-resolution via a holistic attention network[C]// The 16th European Conference on Computer Vision. Cham: Springer, 2020: 191-207. |
[60] | ZHANG Y L, LI K P, LI K, et al. Residual non-local attention networks for image restoration[EB/OL]. [2024-05-24]. https://arxiv.org/abs/1903.10082. |