Search Space Reduction for Farsi Printed Subwords Recognition by Simple Features, Feature Quantization and Fusion of Classifiers

Document Type : Persian Original Article

Authors

Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran

Abstract

This paper presents a method for reducing the search space in the recognition of printed Farsi subwords. First, 10 simple features are extracted from each subword. Each feature is then quantized to an integer according to its range of variation in the training data. For every feature, each class receives a score based on the distance between the feature value of the input and the corresponding feature value of each training sample, so each class ends up with one score per feature. These per-feature scores are fused by algebraic operations into a single final score for each class. The search space is reduced by sorting the final scores and keeping the subwords with the highest scores. The sum, product, maximum, minimum, and weighted-sum rules are used for fusion. The weighted sum, with weights optimized by particle swarm optimization (PSO), gave the best results.
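The pipeline described above (quantize, score per feature, fuse, keep the top candidates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the number of quantization levels, the exact score function, or the PSO settings, so the following are assumptions made only for illustration: uniform quantization into `levels` bins over each feature's training range, a per-feature score of `1 - |difference| / levels` taken as the best match over a class's training samples, and hand-picked weights standing in for PSO-optimized ones. All function names are hypothetical.

```python
from math import prod


def fit_quantizer(train_features, levels=8):
    """Record each feature's (min, max) over the training data."""
    columns = list(zip(*train_features))
    return [(min(c), max(c)) for c in columns], levels


def quantize(sample, quantizer):
    """Map each feature to an integer bin in [0, levels - 1] (assumed uniform bins)."""
    ranges, levels = quantizer
    bins = []
    for value, (lo, hi) in zip(sample, ranges):
        if hi == lo:  # constant feature carries no information
            bins.append(0)
            continue
        step = (hi - lo) / levels
        q = int((value - lo) / step)
        bins.append(max(0, min(levels - 1, q)))  # clamp out-of-range values
    return bins


def class_scores(query_q, train_q, train_labels, levels):
    """Return {label: [score per feature]}; higher means closer.

    Assumed score: 1 - |bin difference| / levels, best over the class's samples.
    """
    scores = {}
    for sample, label in zip(train_q, train_labels):
        current = [1.0 - abs(a - b) / levels for a, b in zip(query_q, sample)]
        best = scores.setdefault(label, current)
        scores[label] = [max(x, y) for x, y in zip(best, current)]
    return scores


# The four unweighted fusion rules named in the abstract.
FUSION_RULES = {"sum": sum, "prod": prod, "max": max, "min": min}


def fuse(per_feature, rule="sum", weights=None):
    """Combine one class's per-feature scores into a final score."""
    if rule == "weighted_sum":
        return sum(w * s for w, s in zip(weights, per_feature))
    return FUSION_RULES[rule](per_feature)


def reduce_search_space(query, train_features, train_labels,
                        keep=2, rule="sum", weights=None, levels=8):
    """Rank classes by fused score and keep the `keep` best candidates."""
    quantizer = fit_quantizer(train_features, levels)
    train_q = [quantize(s, quantizer) for s in train_features]
    scores = class_scores(quantize(query, quantizer), train_q, train_labels, levels)
    ranked = sorted(scores, key=lambda c: fuse(scores[c], rule, weights), reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    # Toy data with 3 features per subword (the paper uses 10).
    train = [[1.0, 10.0, 0.0], [5.0, 20.0, 4.0], [9.0, 30.0, 8.0]]
    labels = ["a", "b", "c"]
    print(reduce_search_space([1.5, 11.0, 0.5], train, labels, keep=2))
```

In this toy setup the query lies closest to class `a`, so `a` is ranked first under every fusion rule. In the paper, the weights passed to the weighted-sum rule would come from PSO rather than being chosen by hand.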

Keywords

