Privacy Preservation in Naïve Bayes Classification Using Bit-String Encryption

Article type: Research article (Persian)

Authors

1 Faculty of Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran

2 Shahid Rajaee Teacher Training University

3 Shahid Rajaee Teacher Training University

Abstract

Building classification models is widely used in data mining. Because building these models requires collecting data, there are concerns about the privacy of the data owners. This paper presents a scheme for building a Naïve Bayes classification model with the participation of the data owners and without collecting the original data. Instead of collecting the data, the scheme encrypts the bit strings obtained from counting and builds the Naïve Bayes model without disclosing the data. The scheme requires no trust in a third party and uses a minimal number of encryption operations, so the model can be built efficiently: time cost improves by up to 87%, and memory consumption does not increase significantly compared with other schemes that use encryption operations.
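The abstract relies on the fact that a Naïve Bayes model depends only on counts, so each owner can contribute counts instead of records. The following is a minimal sketch of that idea only. It does not reproduce the paper's bit-string encryption: the counts are summed in the clear here, where the scheme would aggregate them in protected form. All function names, the record layout, and the use of Laplace smoothing are assumptions made for illustration.

```python
from collections import Counter
from math import log


def local_counts(records):
    """Counts computed by one data owner over its own records.

    Each record is (features_dict, label). Only these counts, never the
    records themselves, would leave the owner (hypothetical layout).
    """
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in records:
        class_counts[label] += 1
        for name, value in features.items():
            feature_counts[(label, name, value)] += 1
    return class_counts, feature_counts


def merge_counts(all_counts):
    """Sum the per-owner counts.

    Assumption: done in the clear here; a privacy-preserving scheme
    would perform this aggregation on protected values.
    """
    total_class, total_feature = Counter(), Counter()
    for class_counts, feature_counts in all_counts:
        total_class.update(class_counts)      # Counter.update adds counts
        total_feature.update(feature_counts)
    return total_class, total_feature


def predict(features, class_counts, feature_counts, domain_sizes):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    n = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, c in class_counts.items():
        score = log(c / n)  # log prior
        for name, value in features.items():
            num = feature_counts[(label, name, value)] + 1
            den = c + domain_sizes[name]
            score += log(num / den)  # smoothed log likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In this sketch the model builder only ever sees the merged counters, which is what makes a count-based protocol possible; how those counters are protected in transit is the part the paper addresses.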

Keywords

