Fine-Grained Image Classification Based on Text and Visual Information

doi:10.11996/JG.j.2095-302X.2019030503

Journal of Graphics

Special Topic: The 18th China Virtual Reality Conference and the 8th International Conference on Virtual Reality and Visualization (ChinaVR & ICVRV 2018, Qingdao)

(1. Beijing Forever Technology Co. Ltd, Beijing 100011, China;
2. School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China)

Online: 2019-06-30 Published: 2019-08-02
Funding:
Beijing Municipal Science and Technology Program project (Z171100001217006)

Abstract

Abstract: Fine-grained image classification generally relies only on local visual information in an image. In some problems, however, text that appears in the image bears directly on the classification result, and extracting the semantic information of that text can further improve fine-grained classification. We consider both the visual information and the local text information of an image and propose an end-to-end classification model for fine-grained image classification. A deep convolutional neural network extracts the visual features of the image, while a proposed end-to-end text recognition network extracts its text information. A correlation calculation module then merges the visual and text features, and the merged features are fed to a classification network. We evaluate the method for fine-grained classification on the public Con-Text dataset and verify the end-to-end text recognition network on the SVT dataset; in both cases it achieves better results than previous methods.

Key words: computer vision, fine-grained image classification, scene text recognition, convolutional neural network, attention mechanism
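
The abstract does not specify how the correlation calculation module combines the two feature streams. As a rough illustration only, the sketch below assumes a simple attention scheme: the visual feature is projected into the text-feature space, scored against each recognized word, and the attention-weighted text feature is concatenated with the visual feature before a linear classifier. All names (`fuse_and_classify`, `w_query`, `w_cls`), the dimensions, and the random weights are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def vec_times_matrix(vec, matrix):
    """Compute vec @ matrix, where matrix is a list of len(vec) rows."""
    return [dot(vec, column) for column in zip(*matrix)]


def fuse_and_classify(visual_feat, word_feats, w_query, w_cls):
    """Attend over word features with the visual feature, then classify.

    visual_feat: image feature vector (assumed to come from a CNN).
    word_feats:  one feature vector per word read from the image (assumed).
    w_query:     hypothetical projection from visual space to text space.
    w_cls:       hypothetical linear classifier over the fused feature.
    """
    query = vec_times_matrix(visual_feat, w_query)
    # Scaled dot-product score between the image and each recognized word.
    scores = [dot(query, w) / math.sqrt(len(query)) for w in word_feats]
    weights = softmax(scores)
    # Attention-weighted sum of the word features.
    text_feat = [
        sum(a * w[i] for a, w in zip(weights, word_feats))
        for i in range(len(query))
    ]
    fused = list(visual_feat) + text_feat
    probs = softmax(vec_times_matrix(fused, w_cls))
    return probs, weights


def random_matrix(rows, cols):
    """Random stand-in for trained weights or extracted features."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_v, d_t, n_words, n_classes = 8, 4, 3, 5
visual_feat = random_matrix(1, d_v)[0]
word_feats = random_matrix(n_words, d_t)
probs, weights = fuse_and_classify(
    visual_feat,
    word_feats,
    random_matrix(d_v, d_t),
    random_matrix(d_v + d_t, n_classes),
)
print(len(probs), len(weights))
```

In a trained model the attention weights would let words relevant to the image category dominate the text feature, while unrelated or misread words are down-weighted; here the weights are random, so only the shapes and normalization are meaningful.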

YUAN Jian-ping1, CHEN Xiao-long1, CHEN Xian-long1, HE En-jie1, ZHANG Jia-qi2, GAO Yu-dou2. Fine-Grained Image Classification Based on Text and Visual Information[J]. Journal of Graphics, DOI: 10.11996/JG.j.2095-302X.2019030503.
