حفظ حریم خصوصی در رده‏ بندی داده‏ های توزیع شده عمودی

نوع مقاله : مقاله پژوهشی فارسی

نویسنده

گروه کامپیوتر، واحد میانه، دانشگاه آزاد اسلامی، میانه، ایران

چکیده

امروزه، حفظ حریم خصوصی داده‏ های منتشر شده، از اهمیت زیادی برخوردار است. الگوریتم‏های مختلفی برای حفظ حریم خصوصی داده ‏ها ابداع شده است. مهمترین مشکلات این الگوریتم‏ها، غیر قابل اثبات بودن حریم خصوصی آنها، اهمیت کم در نظر گرفته شده برای دانش پیش‏ زمینه فرد متخاصم و عدم کاهش ابعاد داده‏ های اولیه است. در این مقاله، برای اثبات حریم خصوصی داده‏ ها از مفهوم حریم خصوصی تفاضلی در انتشار داده‏ های توریع شده عمودی و رده‏ بندی آنها استفاده شده است. در حریم خصوصی تفاضلی، دانش پیش‏ زمینه فرد متخاصم در مورد داده‏ های منتشر شده، اهمیت ندارد. همچنین، با استفاده از تبدیل موجک هار، ابعاد داده ‏ها کاهش داده شده است. در نهایت، حریم خصوصی داده‏ های منتشر شده به صورت ریاضی اثبات شده است، همچنین، دقت داده‏ ها، با استفاده از الگوریتم رده‏ بند «K-نزدیکترین همسایه» نیز اندازه‏ گیری شده است و به صورت ریاضی اثبات شده است که الگوریتم پیشنهادی، خدشه ‏هایی با انحراف معیار کمتری را نسبت به الگوریتم مورد مقایسه، به داده‏ ها اضافه می‏کند و در نتیجه دقت رده‏ بندی بالاتری دارد. نتایج به دست آمده نشان دهنده امن‏ تر بودن الگوریتم ارائه شده نسبت به الگوریتم‏های مورد مقایسه است.

کلیدواژه‌ها


  • C. Aggarwal and P.S Yu, “Privacy-Preserving Data Mining: Models and Algorithms” Springer-Verlag, New York, 2008.
  • C.M. Fung, K. Wang, R. Chen and P.S. Yu, “Privacy-Preserving Data Publishing: A Survey on Recent Developments,” ACM Computing Surveys 42(4), Article 14, 2010.
  • Agawam, R. Srikant, “Privacy-Preserving Data Mining,” Proceedings of ACM SIGMOD Conference, pp. 439-450, 2000.
  • Sweeney “k-Anonymity: A Model for Protecting Privacy,” Int. J. Unc. Fuzz. Knowl. Based Syst., 10 (5), pp. 557-570, 2002.
  • Samarati, L. Sweeney, “Protecting Privacy when Disclosing Information: k-Anonymity and its Enforcement Through Generalization and Suppression,” IEEE Symp. on Security and Privacy, pp. 188-199, 1998.
  • J. Bayardo, R. Agrawal, “Data Privacy through Optimal K-Anonymization,” Proceedings of ICDE Conference, pp. 217-228, 2005.
  • Machanavajjhala, J. Gehrke, and D. Kifer, “L-diversity: Privacy beyond k-anonymity,” IEEE ICDE Conference, pp. 24, 2006.
  • Gu, M. Li, L. Xiong and Y. Cao, "Providing Input-Discriminative Protection for Local Differential Privacy," 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 505-516, 2020.
  • Dwork, “Differential Privacy,“ Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP’06), pp. 1–12, 2006.
  • C.M. Fung, K. Wang, A.W.C Fu and P.S. Yu, “Introduction to Privacy-Preserving Data Publishing Concepts and Techniques,”  Taylor & Francis Group, 2011.
  • Dwork, F. McSherry, K. Nissim and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in TCC, pp. 265–284 2006.
  • Kenthapadi, A. Korolova, I. Mironov and N. Mishra,”Privacy via the Johnson-Lindenstrauss Transform,” CoRR, abs/1204.2606, 2012.
  • Xiaokui, W. Guozhang, G.Johannes, “Differential Privacy via Wavelet Transforms,” IEEE Transactions on Knowledge and Data Engineering 23(8), pp. 1200-1214, 2011.
  • G. Divanis, V.S. Verykios,” Association Rule Hiding for Data Mining,” Springer, 2010.
  • Cui, W.K. Wong, and D.W. Cheung, “Privacy-Preserving Clustering with High Accuracy and Low Time Complexity,” In Proceedings of the 14th International Conference on Database Systems for Advanced Applications, DASFAA 2009, Brisbane, Australia, 2009.
  • R.M. Oliveira, and O.R. Za¨iane, “ Privacy preserving clustering by data transformation,” Proc. of the 18th Brazilian Symposium on Databases, Manaus, Amazonas, Brazil, pp. 304–318, 2003.
  • Du and Z. Zhan, “Using randomized response techniques for privacy-preserving data mining,” In Proc. of the 9th International Conf. on Knowledge Discovery and Data Mining (KDD’03), pp. 505–510, 2003.
  • Huang, W. Du, and B. Chen, “Deriving Private Information from Randomized Data,” In Proceedings of the ACM SIGMOD Conference on Management of Data, Baltimore, Maryland, USA, pp. 37-48, 2005.
  • Vaidya and C. Clifton, “Privacy-Preserving K-means Clustering over Vertically Partitioned Data,” In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 206 – 215, 2003.
  • B. Johnson and J. Lindenstrauss, “Extensions of Lipschitz Maps into a Hilbert Space,” Contemporary Mathematics 26, 189–206, 1984.
  • Bingham and H. Mannila, “Random Projection In Dimensionality Reduction: Applications To Image And Text Data,” In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 245–250, 2001.
  • Z. Fern, and C.E. Brodley, “Random projection for high dimensional data clustering: a cluster ensemble approach,” In Proc. of the 20th international conf. on machine learning, Washington DC, USA, pp. 102-110, 2003.
  • R.M. Oliveira and O.R. Za¨ıane, “A Privacy-Preserving Clustering Approach Toward Secure and Effective Data Analysis for Business Collaboration,” Computers & Security 26(1), 81-93, 2007.
  • Blum, C. Dwork, F. McSherry, and K. Nissim, “Practical Privacy: The SuLQ Framework,” In: Proceedings of the 2005 SIGMOD/PODS Conference, pp. 128–138, 2005.
  • C. Dwork, and N. Nissim, “Privacy-Preserving Data mining on Vertically Partitioned Databases,” In: Proceedings of the 24th Annual International Cryptology Conference Advances in Cryptology (CRYPTO’04), Santa Barbara, California, USA, pp. 528-544, 2004.
  • Nissim, S. Raskhodnikova, and A. Smith, “Smooth Sensitivity and Sampling in Private Data Analysis,” In: Proceedings of the 39th ACM Symposium on the Theory of Computing (STOC’07), pp. 75-84, 2007.
  • McSherry, “Privacy Integrated Queries: An Extensible Platform for Privacy-Preserving Data Analysis,” In: Proceedings of the 2009 SIGMOD/PODS Conference, pp. 19-30, 2009.
  • Feldman, A. Fiat, H. Kaplan, and K. Nissim, “Private Coresets.” In: Proceedings of the 41st Symposium on Theory of Computing (STOC), pp. 361–370, 2009.
  • Xu, X. Cheng, S. Su, et al., Differentially private frequent sequence mining, IEEE Trans. Knowl. Data Eng. 28 (11), 2016.
  • Zhou, S. Qin, H. Zhou, et al., A differential privacy noise dynamic allocation algorithm for big multimedia data, Multimedia Tools Appl. (8), pp. 1-19, 2018.
  • Shen, Z. Lu, A new lower bound of privacy budget for distributed differential privacy, in: proceedings of International Conference on IEEE Parallel and Distributed Computing, Applications and Technologies. pp. 25–32, 2018.
  • Wang, Y. Zhang, X. X. Lu, Z. Wang, Real-time and spatiotemporal crowd-sourced social network data publishing with differential privacy, IEEE Trans. Dependable Secure Comput. 15 (4), 2018.
  • Ou, Z. Qin, S. Liao, T. Li, and D. Zhang, “Singular spectrum analysis for local differential privacy of classifications in the smart grid,” IEEE Internet of Things Journal, 2020.
  • Sun, X., Xu, R., Wu, L. et al. A differentially private distributed data mining scheme with high efficiency for edge computing. J Cloud Comp 10, 7, 2021.
  • Mukherjee, et al, “A privacy preserving technique for distance-based classification with worst case privacy guarantees,” Data & Knowledge Engineering, 66, pp. 264–288, 2008.
  • Fan, J. He, M. Guo, P. Li, Z. Han, R. Wang, “Privacy preserving classification on local differential privacy in data centers,” Journal of Parallel and Distributed Computing, Vol. 135, pp. 70-82, 2020.
  • Li, M. Sun, “Privacy-Preserving Classification of Personal Data with Fully Homomorphic Encryption: An Application to High-Quality Ionospheric Data Prediction,” International Conference on Machine Learning for Cyber Security, pp. 437-446, 2020.
  • Li, J. Li, Z. Huang, CZ. Gao, W.B. Chen, K. Chen, “Privacy-preserving outsourced classification in cloud computing,” Cluster Computing, Vol. 21, pp. 277–286, 2018.
  • R.E. Dishabi, M.A. Azgomi, "Differential privacy preserving clustering in distributed datasets using Haar wavelet transform," Intelligent Data Analysis, vol. 19, no. 6, pp. 1323-1353, 2015.
  • Canetti, U. Feige, O. Goldreich, and M. Naor, “Adaptively Secure Multi-Party Computation,” Proceedings of the 28th Annual Symposium on Theory of Computing (STOC’96), ACM Press, pp. 639–648, 1996.
  • S. Burrus, R.A. Gopinath, and H. Guo, “Introduction to Wavelets and Wavelet Transforms,” A Primer Prentice Hall, 1997.
  • Zhang, T.B. Ho, Y. Zhang and M.S. Lin, “Unsupervised Feature Extraction for Time Series Clustering Using Orthogonal Wavelet Transform,” Journal Informatica 30(3), pp. 305-319, 2006.
  • “Machine Learning Repository,” URL: http://archive.ics.uci.edu/ml, Visited: 2020-05-20.
  • F. Vysochanskij, and Y.I. Petunin, “Justification of the Three Sigma Rule for Unimodal Distributions,” Theory of Probability and Mathematical Statistics 21, 25–36, 1980.