Improving the Performance of the k-Nearest Neighbors Algorithm with Utilization of the PSO Metaheuristic Algorithm

Document Type : Persian Original Article

Authors

1 PhD Student, Department of Industrial Management, Faculty of Economics, Management and Accounting, Yazd University, Yazd, Iran

2 Faculty of Economics, Management and Accounting, Yazd university,Yazd, Iran, Post Code: 89158_18411

3 Faculty of Economics, Management and Accounting, Yazd university,Yazd, Iran

Abstract

The k-nearest neighbor's algorithm (KNN) is one of the most widely used and useful nonparametric classification algorithms. The classification mechanism of this algorithm involves computing the distance between new instances and the instances whole classes are known. When the dataset contains non-numerical (ordinal and nominal) attributes, the performance of the algorithm can be significantly affected by how this distance is measured. In this paper, we attempt to improve the performance of the KNN algorithm by presenting a new solution for computing the distance of non-numerical traits. For this purpose, the Particle Swarm Optimization (PSO) algorithm is used. The task of this algorithm is to determine the best value of the distance between two states in a non-integer trait so that the accuracy of the KNN algorithm is increased. UCI University Learning Repository Data is used to test this idea. The results obtained from the proposed algorithm are compared with several other improved algorithms and show the useful improvement of this mechanism.

Keywords


[1] X. V. Wu, "The top ten algorithm in data mining," International Standard Book, vol. 13, pp. 978-1, 2009.
[2] E. J. J. L. Fix, "Discriminatory analysis-nonparametric discrimination: consistency properties," California Univ Berkeley, Texas, 1951.
[3] T. P. Cover, "Nearest neighbor pattern classification," IEEE transactions on information theory, vol. 13, no. 1, pp. 21-27, 1967.
[4] A. N. Y. Papadopoulos, Nearest Neighbor Search: A Database Perspective, New York: Springer Science & Business Media, 2006.
[5] D. T. Larose, Discovering knowledge in data: an introduction to data mining, John Wiley & Sons, 2014.
[6] K. G. Q. L. J. Z. J. X. W. M. Zheng, "Applications of support vector machine and improved k-Nearest Neighbor algorithm in fault diagnosis and fault degree evaluation of gas insulated switchgear," in 2017 1st International Conference on Electrical Materials and Power Equipment, Xian, PEOPLES R CHINA, 2017.
[7] M. P.-N. Steinbach, "kNN: k-nearest neighbors," in The top ten algorithms in data mining, Chapman and Hall/CRC, 2009, pp. 165-176..
[8] Y. Lin, J. Li, M. Lin and J. Chen, "A new nearest neighbor classifier via fusing neighborhood information," Neurocomputing, pp. 164-169, 2014.
[9] S. V. V. Boriah, "Similarity measures for categorical data: A comparative evaluation," in Proceedings of the 2008 SIAM international conference on data mining, 2008.
[10] Z. H. Šulc, "Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering," Journal of Classification, vol. 36, no. 1, pp. 58-72, 2019.
[11] S. D. Z. Z. Luo, "Non-Numerical Nearest Neighbor Classifiers with Value-Object Hierarchical Embedding," Expert Systems with Applications, vol. 150, p. 113206, 2020.
[12] A. H. V. Desai, "Disc: Data-intensive similarity measure for categorical data," in Pacific-Asia Conference on Knowledge Discovery and Data Mining, Berlin, Heidelberg, 2011.
[13] L. Y. G. J. Chen, "Kernel-based linear classification on categorical data," Soft Computing, vol. 20, no. 8, pp. 2981-2993, 2016.
[14] D. Z. Z. Luo, "Non-Numerical Nearest Neighbor Classifiers with Value-Object Hierarchical Embedding," Expert Systems with Applications, p. 113206, 2020.
[15] L. G. Chen, "Nearest neighbor classification of categorical data by attributes weighting," Expert Systems with Applications, vol. 42, no. 6, pp. 3142-3149, 2015.
[16] J. P. G. C. Domeniconi, "Locally adaptive metric nearest-neighbor classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1281-1285, 2002.
[17] M. S. A. K. M. R. M. K. R. M. K. S. Huda and C. M. Rahman, "A dynamic k-nearest neighbor algorithm for pattern analysis problem," in 3rd International Conference on Electrical & Computer Engineering, 2004.
[18] J. Hocke and T. Martinetz, "Feature weighting by maximum distance minimization," in International Conference on Artificial Neural Networks, 2013.
[19] M. R. H. M. M. B. J. R. K. Hassan, "Improving k-nearest neighbour classification with distance functions based on receiver operating characteristics," in Machine Learning and Knowledge Discovery in Databases, Berlin, Heidelberg, 2008.
[20] K. U. E. B. O. S. Syaliman, "Improving the accuracy of k-nearest neighbor using local mean based and distance weight," in 2nd International Conference on Computing and Applied Informatics 2017, ICCAI 2017, Medan, INDONESIA, 2018.
[21] G. D. Tutz, "Improved nearest neighbor classifiers by weighting and selection of predictors," Statistics and Computing, vol. 26, no. 5, pp. 1039-1057, 2016.
[22] J. H. W. S. Y. H. Gou, "A generalized mean distance-based k-nearest neighbor classifier," Expert Systems with Applications, vol. 115, pp. 356-372, 2019.
[23] D. Delen, Real-world data mining: applied business analytics and decision making, Upper Saddle River, New Jersey: FT Press, 2014.
[24] A. Fargus, Optimisation of Correlation Matrix Memory Prognostic and Diagnostic Systems, University of York, 2015.
[25] P. Zerzucha and B. Walczak, "Concept of (dis) similarity in data analysis," TrAC Trends in Analytical Chemistry, vol. 38, pp. 116-128, 2012.
[26] J. Han, J. Pei and M. Kamber, Data mining: concepts and techniques, Elsevier, 2011
[27] T. J. T. R. J. J. H. Hastie, The elements of statistical learning: data mining, inference, and prediction, Second Edition ed., springer, 2011.
[28] R. J. Eberhart, "A new optimizer using particle swarm theory," in Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya,Japan, 1995.
[29] M. R. N. E. A. Imran, "An Overview of Particle Swarm Optimization Variants," Procedia Engineering, pp. 491-496, 2013.
[30] K. Kameyama, "Particle Swarm Optimization - A Survey," IEICE Transactions on Information and Systems, pp. 1354-1361, 2009.
[31] W. F. A. A.-S. M. A. Abd-El-Wahed, "Integrating particle swarm optimization with genetic algorithms for solving nonlinear optimization problems," Journal of Computational and Applied Mathematics, vol. 236, no. 5, pp. 1446-1453, 2011.
[32] A. S. K. H. C. Ratnaweera, "Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients,"IEEE Transactions on evolutionary computation, pp. 240-255, 2004.
[33] F. T. Provost, Data Science for Business: What you need to know about data mining and data-analytic thinking, United States of America: O’Reilly Media, Inc., 2013.
[34] N. L. A. S. Z. Z. N. E. A. Ghani, "Accuracy Assessment of Urban Growth Pattern Classification Methods Using Confusion Matrix and ROC Analysis," in International Conference on Soft Computing in Data Science, Singapore, 2015.