Intelligent detection of breast cancer with feature selection based on logistic regression and support vector machine classification

Document Type: Persian Original Article

Authors

Department of Computer Engineering, Faculty of Engineering, Shahid Chamran University of Ahvaz, Ahvaz, Iran

Abstract

Breast cancer is the most common cancer among women, and a precise and reliable system for distinguishing benign from malignant cases is essential. Combining the results of needle aspiration cytology with data mining and machine learning techniques allows breast cancer to be diagnosed early and with greater accuracy. In this study, we propose a two-step method. In the first step, logistic regression is used to select the more important features and eliminate the less important ones. In the second step, a Support Vector Machine (SVM) classifier with three different kernel functions is used to classify samples as benign or malignant. The proposed method is evaluated on two data sets, WBCD and WDBC, using several metrics: precision, area under the ROC curve (AUC), true positive rate, false positive rate, accuracy, and F-measure. The results show that logistic regression selects the more effective features, and that the proposed method reaches a classification accuracy of 98.69%.
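As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes accuracy, precision, true positive rate, false positive rate, F-measure, and AUC for a two-class problem using only the Python standard library. The function names (`binary_metrics`, `auc`), the label encoding (1 = malignant, 0 = benign), and the sample labels and scores are assumptions made for this example; they are not taken from the study and do not reproduce its results.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    tpr: float        # true positive rate (recall / sensitivity)
    fpr: float        # false positive rate
    f_measure: float  # harmonic mean of precision and TPR


def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix metrics for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    tpr = ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=ratio(tp + tn, tp + fp + fn + tn),
        precision=precision,
        tpr=tpr,
        fpr=ratio(fp, fp + tn),
        f_measure=ratio(2 * precision * tpr, precision + tpr),
    )


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        return 0.0
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = malignant, 0 = benign) and classifier outputs.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.3, 0.7]

m = binary_metrics(y_true, y_pred)
print(m)
print("AUC:", auc(y_true, scores))
```

The pairwise AUC computation is quadratic in the number of samples, which is acceptable for data sets of a few hundred records; a rank-based formulation would be preferable for larger data.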
