Article type: Research article (English)
Department of Electrical Engineering, Faculty of Engineering, Golestan University, Gorgan, Iran.
Text line segmentation is an important stage of optical character recognition (OCR) algorithms: to analyze and recognize a document, its text lines must first be segmented accurately. Segmenting text lines in handwritten documents is more difficult than in machine-printed ones. The main challenges are curved and multi-skewed text lines, overlapping text lines, and very small text lines. Most previously proposed approaches do not consider the local features of text lines in a document image. The proposed method considers both global and local features and is based on directional 2D anisotropic filters. Its parameters are tuned from a main global parameter that is computed separately for each document; hence, the method is dataset-independent. A document is divided into several blocks, and local characteristics are calculated for each block. In each block, text regions are detected using these local characteristics, such as the block skew. To estimate the skew of the text regions in a block, a novel text block skew estimation algorithm is proposed in this paper. Experimental results show that the proposed method outperforms all state-of-the-art methods on three standard datasets: its final F-Measure values are 0.54%, 0.03%, and 0.02% higher than those of the winner of the ICDAR2013 text line segmentation contest on the ICDAR2013, ICDAR09, and HIT-MW datasets, respectively. The experiments show that the proposed method can accurately segment text lines in complicated handwriting.
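As a rough illustration of the block-wise idea summarized above, the following sketch estimates the skew of one binary block and then smooths the block with a 2D anisotropic Gaussian kernel elongated along that skew before thresholding the response into text regions. It is not the authors' method: the skew estimator is a classic projection-profile search used as a stand-in for the paper's novel skew estimation algorithm, the Gaussian is only one possible directional anisotropic filter, and all function names as well as the values of `sigma_along`, `sigma_across`, and `thresh` are hypothetical. In the paper, the parameters are derived from a per-document global parameter, which is not reproduced here.

```python
import numpy as np


def anisotropic_gaussian_kernel(sigma_along, sigma_across, theta):
    """2D Gaussian elongated along direction theta (radians; image y axis points down)."""
    half = int(np.ceil(3 * max(sigma_along, sigma_across)))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    u = x * np.cos(theta) + y * np.sin(theta)    # coordinate along the text direction
    v = -x * np.sin(theta) + y * np.cos(theta)   # coordinate across the text direction
    k = np.exp(-(u ** 2) / (2 * sigma_along ** 2) - (v ** 2) / (2 * sigma_across ** 2))
    return k / k.sum()  # normalize so the weights sum to 1


def filter_same(img, kernel):
    """Linear 2D convolution via FFT, cropped back to the input size (odd kernel sizes)."""
    kh, kw = kernel.shape
    h, w = img.shape
    shape = (h + kh - 1, w + kw - 1)
    full = np.fft.irfft2(np.fft.rfft2(img, shape) * np.fft.rfft2(kernel, shape), shape)
    return full[kh // 2:kh // 2 + h, kw // 2:kw // 2 + w]


def estimate_block_skew(block, angles_deg=np.arange(-15.0, 15.5, 0.5)):
    """Projection-profile skew estimate in degrees (stand-in, NOT the paper's algorithm)."""
    ys, xs = np.nonzero(block)
    if ys.size == 0:
        return 0.0
    best_angle, best_score = 0.0, -1.0
    for a in angles_deg:
        t = np.deg2rad(a)
        # Offset of each foreground pixel across direction t: pixels of lines
        # running at angle t collapse into a few bins, giving a peaky profile.
        v = -xs * np.sin(t) + ys * np.cos(t)
        hist = np.bincount(np.rint(v - v.min()).astype(int)).astype(float)
        score = float(np.sum(hist ** 2))
        if score > best_score:
            best_angle, best_score = float(a), score
    return best_angle


def detect_text_regions(block, sigma_along=5.0, sigma_across=1.5, thresh=0.1):
    """Smooth a binary block along its estimated skew and threshold the response.

    The default parameter values are hypothetical placeholders.
    """
    angle = estimate_block_skew(block)
    kernel = anisotropic_gaussian_kernel(sigma_along, sigma_across, np.deg2rad(angle))
    response = filter_same(block.astype(float), kernel)
    return response > thresh, angle


# Synthetic block: three thin parallel "text lines" skewed by 5 degrees.
block = np.zeros((90, 300), dtype=np.uint8)
cols = np.arange(300)
for y0 in (10, 30, 50):
    block[y0 + np.rint(cols * np.tan(np.deg2rad(5.0))).astype(int), cols] = 1

mask, angle = detect_text_regions(block)
print(angle)  # expected to be close to 5.0
```

Orienting the elongated kernel along each block's own skew, rather than along one global angle, is meant to join characters of the same line into a single region while leaving the gap to neighboring lines below the threshold; this is the motivation for the block-wise, skew-aware filtering described in the abstract.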