
Journal of Graphics, 2022, Vol. 43, Issue (5): 865-874. DOI: 10.11996/JG.j.2095-302X.2022050865

• Image Processing and Computer Vision •

Automatic segmentation algorithm for text lines of Dongba hieroglyphs document image  

  

  1. Sports Department, Suzhou Vocational University, Suzhou Jiangsu 215000, China;  2. School of Computer Engineering, Suzhou Vocational University, Suzhou Jiangsu 215000, China
  • Online: 2022-10-31 Published: 2022-10-28
  • Supported by:
    Suzhou Vocational University Introduced Talents Scientific Research Start-up Fund Project (201905000034) 

Abstract:

Deep learning technologies represented by convolutional neural networks (CNN) have shown excellent performance in image classification and recognition. However, because no standard public dataset exists for Dongba hieroglyphs, existing deep learning algorithms cannot be readily applied to them. To establish an authoritative and effective Dongba hieroglyph dataset, the primary task at present is to analyze the layout structure of published Dongba classic documents and to extract the text lines and individual hieroglyphs they contain. Therefore, based on the structural features of Dongba hieroglyphic document images, an automatic text-line segmentation algorithm was proposed for Dongba document images. The algorithm first employed the d-K-means clustering algorithm to determine the number of text-line classes and the classification criterion; the erroneous segmentation results were then corrected through secondary processing of the text blocks, so as to improve the accuracy of the algorithm. While making full use of the structural features of Dongba characters, the algorithm retained the advantages of a machine-learning model, such as objectivity and independence from subjective experience. Experiments show that the algorithm can be used for text-line segmentation of Dongba document images, offline handwritten Chinese characters, and Dongba scriptures, as well as for the segmentation of individual Dongba and Chinese characters within text lines. It is simple to implement, highly accurate, and strongly adaptable, thus laying the foundation for the establishment of a Dongba character library.

Key words: Dongba hieroglyph, Dongba documents analysis, text line segmentation, projection segmentation, d-K-means 
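
As a rough illustration of the "projection segmentation" named in the key words, the sketch below splits a binarized page into text lines using a horizontal projection profile. It is a generic baseline written under assumptions, not the algorithm proposed in the paper: the function name `segment_text_lines`, the `min_height` parameter, and the convention that nonzero pixels are ink are all hypothetical, and the d-K-means clustering and secondary text-block correction described in the abstract are not reproduced here.

```python
import numpy as np


def segment_text_lines(binary, min_height=1):
    """Return (top, bottom) row ranges of text lines; bottom is exclusive.

    Assumption: `binary` is a 2-D array in which nonzero pixels are ink.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = (np.asarray(binary) != 0).sum(axis=1)
    has_ink = profile > 0

    # Pad with False so runs touching the image border are also detected.
    padded = np.concatenate(([False], has_ink, [False]))
    change = np.diff(padded.astype(np.int8))

    tops = np.flatnonzero(change == 1)      # blank row -> ink row
    bottoms = np.flatnonzero(change == -1)  # ink row -> blank row

    # Keep only runs that are at least `min_height` rows tall.
    return [(int(t), int(b)) for t, b in zip(tops, bottoms)
            if b - t >= min_height]


# Toy page with two separated bands of ink standing in for two text lines.
page = np.zeros((10, 8), dtype=np.uint8)
page[1:3, 2:6] = 1
page[5:8, 1:7] = 1
lines = segment_text_lines(page)
```

A plain profile like this fails when adjacent lines touch or overlap, since no blank row separates them. Handling such cases would need additional steps, such as the clustering and correction the abstract describes.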

CLC Number: