Ensemble Recognition of Persian Typed Sub-words in Limited Space Using Smart Weighted Voting

Document Type: Persian Original Article

Authors

1 Department of Industrial Engineering, Sharif University of Technology, Tehran, Iran

2 Department of Electronics, Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran

3 Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran

Abstract

This paper proposes an ensemble method for recognizing Persian typed sub-words. First, a few simple features are used to limit the search space to a very small number of candidate sub-words. The sub-word is then recognized by combining six base classifiers through weighted voting. One base classifier is the search-space limiter itself. Four others use the nearest-neighbor method, each with a single feature: loci, zoning, the number of vertical crossings between text and background, and the discrete cosine transform (DCT). The sixth classifier takes the product of the normalized image of the input sub-word and the images of the reduced set of training sub-words, which gives a degree of similarity for each training sub-word, and recognizes the input sub-word from these similarities. The final sub-word is selected from the resulting candidates by weighted voting, with the optimal weights found by an intelligent optimization algorithm. The method was tested on the Lotus font and achieved a recognition rate of 98.34% on this data.
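The weighted-voting step can be illustrated with a minimal sketch. It assumes each of the six base classifiers outputs one candidate label and that each classifier has a single fixed weight. The function name `weighted_vote`, the labels, and the weight values are hypothetical and are not taken from the paper, where the weights are tuned by an optimization algorithm.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.

    predictions: one predicted label per base classifier.
    weights: one non-negative weight per base classifier (hypothetical values
             here; the paper obtains them with an optimization algorithm).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # Ties go to the label that was seen first (dict insertion order).
    return max(scores, key=scores.get)


# Hypothetical outputs of six base classifiers for one input sub-word image.
predictions = ["sub_a", "sub_b", "sub_a", "sub_c", "sub_b", "sub_b"]
# Hypothetical weights, one per classifier.
weights = [0.30, 0.10, 0.25, 0.05, 0.10, 0.15]

winner = weighted_vote(predictions, weights)
print(winner)
```

In this example a plain majority vote would pick `sub_b` (three votes), while the weighted vote picks `sub_a` because the two classifiers that support it carry more total weight. This is the reason for tuning the weights rather than counting votes equally.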
