
Journal of Graphics (图学学报), 2023, Vol. 44, Issue (2): 271-279. DOI: 10.11996/JG.j.2095-302X.2023020271

• Image Processing and Computer Vision •

Flowers recognition based on lightweight visual transformer

XIONG Ju-ju1, XU Yang1,2, FAN Run-ze1, SUN Shao-cong1

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou 550025, China
    2. Guiyang Aluminum-Magnesium Design and Research Institute Co., Ltd., Guiyang, Guizhou 550025, China
  • Received: 2022-09-02 Accepted: 2022-11-24 Online: 2023-04-30 Published: 2023-05-01
  • Contact: XU Yang (1980-), associate professor, Ph.D. His main research interests cover data collection, machine learning, etc. E-mail: xuy@gzu.edu.cn
  • About author: XIONG Ju-ju (2000-), master student. His main research interest covers digital image processing. E-mail: juxiong0416@163.com
  • Supported by:
    Science and Technology Plan Project of Guizhou Province(Qian Kehe [2021] General 176)

Abstract:

Due to the similarity between different kinds of flowers and the variation within the same kind, convolutional neural networks (CNN), which extract local feature information, fall short of ideal results in flower image recognition. Based on the Swin Transformer (Swin-T) network, this paper proposed a lightweight Transformer network named LWFormer. Firstly, the network introduced a shifted-window-based PoolFormer module into the first and second stages of the Swin-T network to make the network lightweight. Secondly, a dual-channel attention mechanism was introduced, in which two independent channels focused on the "location" and "content" of the feature map, respectively, improving the network's ability to extract global feature information. Finally, a contrastive loss function was employed to further optimize the performance of the network. The improved model was evaluated on two public datasets, Oxford 102 Flower Dataset and 104 Flowers Garden of Eden, and compared with other methods, achieving accuracies of 88.1% and 87.3%, respectively. Compared with the Swin-T network, the number of parameters was reduced by 33.45%, FLOPs were reduced by 28.89%, throughput was increased by 91.45%, and accuracy was increased by 1.8%. Experimental results showed that the proposed network improved accuracy while reducing the number of parameters, yielding gains in both speed and precision.
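As context for the PoolFormer blocks mentioned above: PoolFormer replaces the self-attention token mixer with simple spatial average pooling. The PyTorch sketch below illustrates that published design only; it is not the authors' LWFormer code, and the class names, pool size, and MLP ratio are assumptions for illustration.

```python
# Illustrative sketch of a PoolFormer-style block; not the LWFormer implementation.
import torch
import torch.nn as nn

class PoolingMixer(nn.Module):
    """Token mixer used by PoolFormer: average pooling minus identity."""
    def __init__(self, pool_size: int = 3):
        super().__init__()
        self.pool = nn.AvgPool2d(pool_size, stride=1,
                                 padding=pool_size // 2,
                                 count_include_pad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); subtracting the input keeps only the mixed residual
        return self.pool(x) - x

class PoolFormerBlock(nn.Module):
    """MetaFormer-style block: pooling token mixer followed by a channel MLP."""
    def __init__(self, dim: int, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, dim)   # channel-wise normalization
        self.mixer = PoolingMixer()
        self.norm2 = nn.GroupNorm(1, dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, dim * mlp_ratio, 1),
            nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.mixer(self.norm1(x))   # token mixing without attention
        x = x + self.mlp(self.norm2(x))     # per-location channel MLP
        return x
```

Because the pooling mixer has no learned weights and costs far less than windowed self-attention, placing such blocks in the early, high-resolution stages is consistent with the reported reductions in parameters and FLOPs.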

Key words: flower recognition, lightweight, attention mechanism, dual-channel attention, contrastive loss function
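The abstract does not specify which contrastive formulation LWFormer uses. As one minimal possibility, the classic pairwise contrastive loss below pulls embeddings of the same flower species together and pushes different species apart; the function name, arguments, and margin value are illustrative assumptions rather than the paper's implementation.

```python
# Illustrative sketch of a classic pairwise contrastive loss; not the LWFormer code.
import torch
import torch.nn.functional as F

def pairwise_contrastive_loss(z1: torch.Tensor,
                              z2: torch.Tensor,
                              same_class: torch.Tensor,
                              margin: float = 1.0) -> torch.Tensor:
    """z1, z2: (B, D) embeddings of two images.
    same_class: (B,) 1.0 if the pair shows the same flower species, else 0.0."""
    d = F.pairwise_distance(z1, z2)                        # Euclidean distance per pair
    pos = same_class * d.pow(2)                            # pull same-species pairs together
    neg = (1.0 - same_class) * F.relu(margin - d).pow(2)   # push different species apart
    return (pos + neg).mean()
```

In practice such a term is usually added to the standard cross-entropy classification loss with a small weighting factor.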

CLC Number: