Journal of Graphics ›› 2023, Vol. 44 ›› Issue (2): 271-279.DOI: 10.11996/JG.j.2095-302X.2023020271
XIONG Ju-ju1, XU Yang1,2, FAN Run-ze1, SUN Shao-cong1
Received: 2022-09-02
Accepted: 2022-11-24
Online: 2023-04-30
Published: 2023-05-01
Contact: XU Yang (1980-), associate professor, Ph.D. His main research interests cover data collection, machine learning, etc.
About author: XIONG Ju-ju (2000-), master student. His main research interest covers image processing. E-mail: juxiong0416@163.com
XIONG Ju-ju, XU Yang, FAN Run-ze, SUN Shao-cong. Flowers recognition based on lightweight visual transformer[J]. Journal of Graphics, 2023, 44(2): 271-279.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2023020271
| Method | Size | Params (M) | FLOPs (G) | Throughput (image/s) | Accuracy: 102 dataset (%) | Accuracy: 104 dataset (%) |
|---|---|---|---|---|---|---|
| RegNet-4G | 224² | 20.60 | 4.00 | 367.4 | 85.3 | 84.6 |
| EfficientNet-B4 | 380² | 19.30 | 4.20 | 410.9 | 87.9 | 87.5 |
| Inception-V3 | 299² | 27.16 | 6.00 | 96.5 | 80.3 | 79.8 |
| MobileNet-160 | 224² | 5.50 | 0.58 | 755.1 | 78.3 | 77.4 |
| ViT-B | 384² | 86.40 | 55.40 | 27.2 | 82.9 | 82.6 |
| DeiT-S | 224² | 22.10 | 4.60 | 298.7 | 84.8 | 84.3 |
| Swin-T | 224² | 29.00 | 4.50 | 239.9 | 86.3 | 85.9 |
| CSwin-T | 224² | 23.00 | 4.30 | 215.6 | 87.7 | 87.2 |
| Ours | 224² | 19.30 | 3.20 | 459.3 | 88.1 | 87.3 |

Table 1 Comparison of results with other methods on two datasets
| Method | Stage1 | Stage2 | Stage3 | Stage4 | Params (M) | FLOPs (G) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| Swin-T | × | × | × | × | 29.00 | 4.50 | 86.3 |
| Ours-1 | √ | × | × | × | 26.91 | 3.90 | 85.7 |
| Ours-2 | √ | √ | × | × | 19.05 | 2.97 | 84.9 |
| Ours-3 | √ | √ | √ | × | 17.26 | 2.64 | 83.1 |
| Ours-4 | √ | √ | √ | √ | 14.35 | 2.13 | 80.6 |

Table 2 Comparison of models with different numbers of PoolFormers replaced
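Table 2 swaps the window-attention blocks in the early Swin-T stages for PoolFormer blocks, whose token mixer is plain average pooling with no learned parameters, which is why the parameter count and FLOPs fall with each replaced stage. The paper's exact implementation is not reproduced here; the following is a minimal 1-D sketch of pooling-based token mixing in the PoolFormer formulation (pooled neighborhood minus the token itself, with the identity path left to the block's residual connection):

```python
def pool_token_mixer(tokens, k=3):
    """PoolFormer-style token mixing: average-pool each token's
    neighborhood, then subtract the token itself (the identity
    path is carried by the block's residual connection)."""
    half = k // 2
    mixed = []
    for i, t in enumerate(tokens):
        # Edge-clipped pooling window around token i.
        window = tokens[max(0, i - half): i + half + 1]
        mixed.append(sum(window) / len(window) - t)
    return mixed

print(pool_token_mixer([1.0, 2.0, 3.0, 4.0]))  # [0.5, 0.0, 0.0, -0.5]
```

Because the mixer is parameter-free, replacing an attention stage removes that stage's query/key/value projections entirely, matching the trend in the Params column above.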
| Method | Stage1 | Stage2 | Stage3 | Stage4 | Params (M) | FLOPs (G) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| Swin-T | - | - | - | - | 29.00 | 4.50 | 86.3 |
| Ours-2 | - | - | - | - | 19.05 | 2.97 | 84.9 |
| Ours-2 | √ | - | - | - | 19.12 | 3.03 | 85.2 |
| Ours-2 | - | √ | - | - | 19.19 | 3.11 | 86.1 |
| Ours-2 | - | - | √ | - | 19.30 | 3.20 | 87.6 |
| Ours-2 | - | - | - | √ | 19.48 | 3.32 | 86.6 |

Table 3 Network comparison of adding DCAM modules in different positions
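Table 3 studies where inserting a DCAM attention module helps most; the best trade-off comes from Stage3, at a cost of only ~0.25 M parameters over the plain Ours-2 model. DCAM's internal structure is not given in this excerpt, so the sketch below shows only the general channel-attention pattern such modules follow (squeeze each channel to a global statistic, gate it, rescale the channel); it is an illustrative stand-in, not the paper's module:

```python
import math

def channel_attention(feature_map):
    """Generic channel-attention sketch (squeeze-and-excitation style,
    used only to illustrate the idea -- DCAM's actual structure is
    defined in the paper): squeeze each channel to its global average,
    gate it with a sigmoid, and rescale the channel's activations."""
    weights = []
    for channel in feature_map:                        # channel: list of activations
        avg = sum(channel) / len(channel)              # global average pool (squeeze)
        weights.append(1.0 / (1.0 + math.exp(-avg)))   # sigmoid gate (excite)
    return [[v * w for v in ch] for ch, w in zip(feature_map, weights)]
```

The gating weights are computed per channel from the whole feature map, which is why such modules add very few parameters relative to the backbone.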
| Method | Lcross | Lcon | Lcross+Lcon | Accuracy (%) |
|---|---|---|---|---|
| Swin-T | √ | - | - | 86.3 |
| Swin-T | - | √ | - | 86.4 |
| Swin-T | - | - | √ | 86.6 |
| Ours | √ | - | - | 87.6 |
| Ours | - | √ | - | 87.8 |
| Ours | - | - | √ | 88.1 |

Table 4 Comparison of networks using different loss functions
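Table 4's best results come from the combined objective Lcross+Lcon. The paper's exact contrastive formulation is not shown in this excerpt; the sketch below assembles the combined loss under stated assumptions (the weight `lam` and the margin-based pairwise contrastive term are illustrative choices, not the paper's definitions):

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample (Lcross), computed stably
    via the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def contrastive(emb_a, emb_b, same_class, margin=1.0):
    """Illustrative pairwise contrastive term (Lcon): pull same-class
    embeddings together, push different-class ones past a margin."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    return d ** 2 if same_class else max(0.0, margin - d) ** 2

def total_loss(logits, target, emb_a, emb_b, same_class, lam=1.0):
    """Combined objective Lcross + lam * Lcon, as in Table 4's last column."""
    return cross_entropy(logits, target) + lam * contrastive(emb_a, emb_b, same_class)
```

The contrastive term supervises the embedding space directly, which is consistent with the small but consistent accuracy gains the combined loss shows for both Swin-T and the proposed model.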