Journal of Graphics

• Special Topic: The 18th China Virtual Reality Conference and the 8th International Conference on Virtual Reality and Visualization (ChinaVR & ICVRV 2018, Qingdao)

  

  • Funding:
    Beijing Municipal Science and Technology Program Project (Z171100001217006)

Fine-Grained Image Classification Based on Text and Visual Information

  (1. Beijing Forever Technology Co., Ltd., Beijing 100011, China;
   2. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China)
  • Online: 2019-06-30 Published: 2019-08-02

Abstract: Fine-grained image classification generally focuses only on the local visual information of an image, but in some problems the text that appears in local regions of the image directly helps determine the class, so extracting the semantic information of that text can further improve fine-grained classification. We jointly consider the visual information and the local text information of an image and propose an end-to-end classification model for fine-grained image classification. A deep convolutional neural network extracts the visual features of the image, while the proposed end-to-end text recognition network extracts its text information; a correlation calculation module then merges the visual and text features and feeds them to the classification network. We evaluate the method on fine-grained image classification using the public Con-Text dataset and verify the end-to-end text recognition network on the SVT dataset; in both cases it achieves better results than previous methods.

Key words: computer vision, fine-grained image classification, scene text recognition, convolutional neural network, attention mechanism
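
The abstract says only that a correlation calculation module merges the visual and text features; it does not give the module's internals. The sketch below is one plausible reading, not the authors' implementation. It assumes that word vectors share the visual feature's dimension, that correlation is a dot product, that the weights come from a softmax, and that the fused feature is a simple concatenation. The names `fuse`, `dot` and `softmax` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse(visual, word_vectors):
    """Hypothetical fusion step (an assumption, not the paper's module).

    Each recognized word vector is weighted by its softmax-normalized
    dot-product correlation with the visual feature; the weighted sum
    is concatenated to the visual feature.
    """
    dim = len(visual)
    if not word_vectors:
        # No text was recognized: fall back to an all-zero text feature.
        text = [0.0] * dim
    else:
        weights = softmax([dot(visual, w) for w in word_vectors])
        text = [
            sum(wt * w[i] for wt, w in zip(weights, word_vectors))
            for i in range(dim)
        ]
    # The concatenated vector would be the input to a classification network.
    return visual + text


# Toy example: a 2-D visual feature and two recognized words.
visual = [1.0, 0.0]
words = [[1.0, 0.0], [0.0, 1.0]]
fused = fuse(visual, words)
```

In this toy example the first word correlates more strongly with the visual feature, so it receives the larger weight in the text half of `fused`. A real system would use learned projections and much higher-dimensional features.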