بهبود عملکرد الگوریتم KNN با استفاده از الگوریتم فرا ابتکاری PSO

نوع مقاله : مقاله پژوهشی فارسی

نویسندگان

1 گروه مدیریت صنعتی، دانشکده اقتصاد، مدیریت و حسابداری، دانشگاه یزد

2 یزد، بلوار دانش، دانشگاه یزد، دانشکده اقتصاد، مدیریت و حسابداری ، کد پستی: 89158

3 بلوار دانش، دانشگاه یزد، دانشکده اقتصاد، مدیریت و حسابداری

4 یزد، بلوار دانش، دانشگاه یزد، دانشکده اقتصاد، مدیریت و حسابداری

چکیده

الگوریتم KNN یکی از مهم‌ترین الگوریتم‌های نا پارامتری است و جزء روش‌های اثربخش دسته‌بندی محسوب می‌شود. سازوکار این الگوریتم برای تعیین دسته نمونه جدید، مبتنی بر محاسبه فاصله نمونه جدید تا سایر نمونه‌هاست. زمانی که پایگاه داده شامل صفات غیر عددی (رتبه‌ای و اسمی) باشد، نحوه محاسبه فاصله می‌تواند بر کارآیی الگوریتم اثرگذار باشد. در این مقاله روشی برای محاسبه فاصله ارائه‌شده است که می‌تواند کارآیی الگوریتم KNN را بهبود دهد. ایده ارائه‌شده در این پژوهش مبتنی بر محاسبه فاصله پویاست. منظور از فاصله پویا، فاصله‌ای است که بین هر دو مقدار از یک صفت غیر عددی تعریف می‌شود و به ماهیت مسئله بستگی دارد. نحوه تعیین این فاصله پویا در قالب یک مسئله بهینه‌سازی بیان‌شده است که در درون ساختار الگوریتم KNN تعبیه‌شده و با استفاده از الگوریتم بهینه‌سازی انبوه ذرات حل می‌شود. برای آزمایش کارآیی الگوریتم پیشنهادی از مجموعه داده‌های UCI استفاده‌شده است. نتایج نشان می‌دهد میزان بهبود صحت حداقل %3.6 و حداکثر %32.7 است.

کلیدواژه‌ها


[1] X. V. Wu, "The top ten algorithm in data mining," International Standard Book, vol. 13, pp. 978-1, 2009.
[2] E. J. J. L. Fix, "Discriminatory analysis-nonparametric discrimination: consistency properties," California Univ Berkeley, Texas, 1951.
[3] T. P. Cover, "Nearest neighbor pattern classification," IEEE transactions on information theory, vol. 13, no. 1, pp. 21-27, 1967.
[4] A. N. Y. Papadopoulos, Nearest Neighbor Search: A Database Perspective, New York: Springer Science & Business Media, 2006.
[5] D. T. Larose, Discovering knowledge in data: an introduction to data mining, John Wiley & Sons, 2014.
[6] K. G. Q. L. J. Z. J. X. W. M. Zheng, "Applications of support vector machine and improved k-Nearest Neighbor algorithm in fault diagnosis and fault degree evaluation of gas insulated switchgear," in 2017 1st International Conference on Electrical Materials and Power Equipment, Xian, PEOPLES R CHINA, 2017.
[7] M. P.-N. Steinbach, "kNN: k-nearest neighbors," in The top ten algorithms in data mining, Chapman and Hall/CRC, 2009, pp. 165-176..
[8] Y. Lin, J. Li, M. Lin and J. Chen, "A new nearest neighbor classifier via fusing neighborhood information," Neurocomputing, pp. 164-169, 2014.
[9] S. V. V. Boriah, "Similarity measures for categorical data: A comparative evaluation," in Proceedings of the 2008 SIAM international conference on data mining, 2008.
[10] Z. H. Šulc, "Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering," Journal of Classification, vol. 36, no. 1, pp. 58-72, 2019.
[11] S. D. Z. Z. Luo, "Non-Numerical Nearest Neighbor Classifiers with Value-Object Hierarchical Embedding," Expert Systems with Applications, vol. 150, p. 113206, 2020.
[12] A. H. V. Desai, "Disc: Data-intensive similarity measure for categorical data," in Pacific-Asia Conference on Knowledge Discovery and Data Mining, Berlin, Heidelberg, 2011.
[13] L. Y. G. J. Chen, "Kernel-based linear classification on categorical data," Soft Computing, vol. 20, no. 8, pp. 2981-2993, 2016.
[14] D. Z. Z. Luo, "Non-Numerical Nearest Neighbor Classifiers with Value-Object Hierarchical Embedding," Expert Systems with Applications, p. 113206, 2020.
[15] L. G. Chen, "Nearest neighbor classification of categorical data by attributes weighting," Expert Systems with Applications, vol. 42, no. 6, pp. 3142-3149, 2015.
[16] J. P. G. C. Domeniconi, "Locally adaptive metric nearest-neighbor classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1281-1285, 2002.
[17] M. S. A. K. M. R. M. K. R. M. K. S. Huda and C. M. Rahman, "A dynamic k-nearest neighbor algorithm for pattern analysis problem," in 3rd International Conference on Electrical & Computer Engineering, 2004.
[18] J. Hocke and T. Martinetz, "Feature weighting by maximum distance minimization," in International Conference on Artificial Neural Networks, 2013.
[19] M. R. H. M. M. B. J. R. K. Hassan, "Improving k-nearest neighbour classification with distance functions based on receiver operating characteristics," in Machine Learning and Knowledge Discovery in Databases, Berlin, Heidelberg, 2008.
[20] K. U. E. B. O. S. Syaliman, "Improving the accuracy of k-nearest neighbor using local mean based and distance weight," in 2nd International Conference on Computing and Applied Informatics 2017, ICCAI 2017, Medan, INDONESIA, 2018.
[21] G. D. Tutz, "Improved nearest neighbor classifiers by weighting and selection of predictors," Statistics and Computing, vol. 26, no. 5, pp. 1039-1057, 2016.
[22] J. H. W. S. Y. H. Gou, "A generalized mean distance-based k-nearest neighbor classifier," Expert Systems with Applications, vol. 115, pp. 356-372, 2019.
[23] D. Delen, Real-world data mining: applied business analytics and decision making, Upper Saddle River, New Jersey: FT Press, 2014.
[24] A. Fargus, Optimisation of Correlation Matrix Memory Prognostic and Diagnostic Systems, University of York, 2015.
[25] P. Zerzucha and B. Walczak, "Concept of (dis) similarity in data analysis," TrAC Trends in Analytical Chemistry, vol. 38, pp. 116-128, 2012.
[26] J. Han, J. Pei and M. Kamber, Data mining: concepts and techniques, Elsevier, 2011
[27] T. J. T. R. J. J. H. Hastie, The elements of statistical learning: data mining, inference, and prediction, Second Edition ed., springer, 2011.
[28] R. J. Eberhart, "A new optimizer using particle swarm theory," in Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya,Japan, 1995.
[29] M. R. N. E. A. Imran, "An Overview of Particle Swarm Optimization Variants," Procedia Engineering, pp. 491-496, 2013.
[30] K. Kameyama, "Particle Swarm Optimization - A Survey," IEICE Transactions on Information and Systems, pp. 1354-1361, 2009.
[31] W. F. A. A.-S. M. A. Abd-El-Wahed, "Integrating particle swarm optimization with genetic algorithms for solving nonlinear optimization problems," Journal of Computational and Applied Mathematics, vol. 236, no. 5, pp. 1446-1453, 2011.
[32] A. S. K. H. C. Ratnaweera, "Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients,"IEEE Transactions on evolutionary computation, pp. 240-255, 2004.
[33] F. T. Provost, Data Science for Business: What you need to know about data mining and data-analytic thinking, United States of America: O’Reilly Media, Inc., 2013.
[34] N. L. A. S. Z. Z. N. E. A. Ghani, "Accuracy Assessment of Urban Growth Pattern Classification Methods Using Confusion Matrix and ROC Analysis," in International Conference on Soft Computing in Data Science, Singapore, 2015.