
Journal of Graphics (图学学报)

• Vision and Image •

A Fast Correction Method for Warped Chinese and English Mixed Document Images

  

  • Publication date: 2015-12-31   Release date: 2016-01-15



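Abstract: When OCR is applied to document images that mix Chinese and English text, page warping
makes the recognition rate unsatisfactory. To address this problem, a fast distortion correction method
is proposed. After the image is preprocessed, text lines are first located by morphological dilation,
which yields the upper and lower boundaries of each line. The characters in each line are then
segmented using the line's vertical projection, giving a minimum bounding box for every character.
Next, the positions of the characters in each line are corrected one by one according to the different
characteristics of Chinese and English characters, and the image is finally reconstructed. Experimental
results show that the method is fast and accurate, corrects warped mixed Chinese and English document
images well, and markedly improves the OCR recognition rate of the corrected images.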

Key words: mixed Chinese and English text, warped document images, text line extraction, character segmentation
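The line-location and character-segmentation steps summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the page is assumed to be already binarized and stored as a list of rows with 1 for ink; the dilation uses a purely horizontal structuring element whose half-width `radius` is a hypothetical parameter; each connected blob of the dilated image is taken as one text line, and its top and bottom row in every column serve as that line's upper and lower boundary, so a curved line is followed rather than cut by a straight horizontal projection. Characters are then split wherever the vertical projection inside the line's band drops to zero. All names (`dilate_horizontal`, `find_text_lines`, `segment_characters`) are illustrative.

```python
from collections import deque

Image = list[list[int]]              # binarized page: 1 = ink, 0 = background
Bounds = dict[int, tuple[int, int]]  # column -> (upper row, lower row) of one text line
Box = tuple[int, int, int, int]      # (x0, y0, x1, y1), inclusive


def dilate_horizontal(img: Image, radius: int) -> Image:
    """Binary dilation with a 1 x (2*radius + 1) horizontal structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    out[y][nx] = 1
    return out


def find_text_lines(img: Image, radius: int = 3) -> list[Bounds]:
    """Treat each 4-connected blob of the dilated image as one (possibly curved)
    text line and record its upper and lower boundary in every column it covers.
    `radius` is a hypothetical tuning parameter of this sketch."""
    h, w = len(img), len(img[0])
    band = dilate_horizontal(img, radius)
    seen = [[False] * w for _ in range(h)]
    lines: list[Bounds] = []
    for y in range(h):
        for x in range(w):
            if not band[y][x] or seen[y][x]:
                continue
            bounds: Bounds = {}
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one blob
                cy, cx = queue.popleft()
                top, bottom = bounds.get(cx, (cy, cy))
                bounds[cx] = (min(top, cy), max(bottom, cy))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and band[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            lines.append(bounds)
    return lines


def segment_characters(img: Image, bounds: Bounds) -> list[Box]:
    """Split one text line at columns whose vertical projection (ink count inside
    the line's band) is zero; return the bounding box of each character's ink."""
    boxes: list[Box] = []
    run: list[tuple[int, list[int]]] = []  # consecutive inked columns: (x, ink rows)

    def flush() -> None:
        if run:
            ys = [y for _, rows in run for y in rows]
            boxes.append((run[0][0], min(ys), run[-1][0], max(ys)))
            run.clear()

    for x in sorted(bounds):
        top, bottom = bounds[x]
        rows = [y for y in range(top, bottom + 1) if img[y][x]]
        if not rows or (run and run[-1][0] != x - 1):
            flush()  # an empty column (or a jump in x) ends the current character
        if rows:
            run.append((x, rows))
    flush()
    return boxes
```

In this sketch `radius` has to bridge the gaps between characters and words so that each line forms a single blob; because the dilation is purely horizontal, separate lines stay apart as long as their ink does not touch. A pure projection cut will also split Chinese characters whose components are separated by a blank column (川, for example), so the resulting boxes would still need merging before correction.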
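The per-character correction and the final reconstruction can be illustrated in the same hedged way. The abstract does not spell out the paper's separate rules for Chinese and English characters, so this sketch substitutes one simplified, hypothetical rule for every glyph: each character box is shifted vertically as a whole so that its bottom edge lands on the median bottom row of its line, and the shifted glyphs are then copied onto a blank page. Moving whole characters, rather than resampling every pixel column, keeps each glyph's shape intact. Boxes are assumed to come from a projection-based segmentation step as `(x0, y0, x1, y1)` tuples with inclusive coordinates; the names `straighten_line` and `reconstruct` are illustrative.

```python
from statistics import median

Image = list[list[int]]          # binarized page: 1 = ink, 0 = background
Box = tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive


def straighten_line(boxes: list[Box]) -> list[int]:
    """Vertical shift (in rows) for each glyph of one text line.

    Hypothetical simplified rule: every glyph is moved so that its bottom edge
    lands on the line's median bottom row, giving a straight baseline.
    """
    if not boxes:
        return []
    base = round(median(b[3] for b in boxes))
    return [base - b[3] for b in boxes]


def reconstruct(img: Image, lines: list[list[Box]]) -> Image:
    """Redraw the page: copy the ink inside each glyph's box into a blank image,
    shifted vertically by that glyph's correction (columns are unchanged)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for boxes in lines:
        for (x0, y0, x1, y1), dy in zip(boxes, straighten_line(boxes)):
            for y in range(y0, y1 + 1):
                ny = y + dy
                if not 0 <= ny < h:
                    continue  # pixels shifted off the page are dropped
                for x in range(x0, x1 + 1):
                    if img[y][x]:
                        out[ny][x] = 1
    return out
```

Aligning bottom edges suits Chinese characters and English letters without descenders; letters such as g, p, or y extend below the baseline and would be lifted by this rule, so a practical method needs a rule that tells the two kinds of glyphs apart.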

