Privacy Preserving Classification in Vertically Distributed Datasets

Document Type : Persian Original Article

Author

Department of Computer Engineering, Miyaneh Branch, Islamic Azad University, Miyaneh, Iran.

Abstract

Nowadays, preserving the privacy of data is an important issue during data mining techniques. several algorithms have been proposed to preserve the privacy of data. The most important problems with these algorithms are their unprovable privacy, the low importance of considering the adversary’s background knowledge and lack of dimensionality reduction process over the original data. In this paper, differential privacy mechanism has been used to prove the privacy of vertical distributed data to be used for classification. In differential privacy, it is no matter what background knowledge has an adversary about the published data. Also, Haar wavelet transform has been used for dimension reduction of the original data.The privacy of data has been proved mathematically and the accuracy of data measured using the K-NN algorithm. Finally, it has been mathematically proved that the proposed algorithm adds noise with less standard deviation to the data than the compared algorithm, resulting in higher classification accuracy. The result shows that our algorithm has more secure compared to previous algorithms.

Keywords


  • C. Aggarwal and P.S Yu, “Privacy-Preserving Data Mining: Models and Algorithms” Springer-Verlag, New York, 2008.
  • C.M. Fung, K. Wang, R. Chen and P.S. Yu, “Privacy-Preserving Data Publishing: A Survey on Recent Developments,” ACM Computing Surveys 42(4), Article 14, 2010.
  • Agawam, R. Srikant, “Privacy-Preserving Data Mining,” Proceedings of ACM SIGMOD Conference, pp. 439-450, 2000.
  • Sweeney “k-Anonymity: A Model for Protecting Privacy,” Int. J. Unc. Fuzz. Knowl. Based Syst., 10 (5), pp. 557-570, 2002.
  • Samarati, L. Sweeney, “Protecting Privacy when Disclosing Information: k-Anonymity and its Enforcement Through Generalization and Suppression,” IEEE Symp. on Security and Privacy, pp. 188-199, 1998.
  • J. Bayardo, R. Agrawal, “Data Privacy through Optimal K-Anonymization,” Proceedings of ICDE Conference, pp. 217-228, 2005.
  • Machanavajjhala, J. Gehrke, and D. Kifer, “L-diversity: Privacy beyond k-anonymity,” IEEE ICDE Conference, pp. 24, 2006.
  • Gu, M. Li, L. Xiong and Y. Cao, "Providing Input-Discriminative Protection for Local Differential Privacy," 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 505-516, 2020.
  • Dwork, “Differential Privacy,“ Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP’06), pp. 1–12, 2006.
  • C.M. Fung, K. Wang, A.W.C Fu and P.S. Yu, “Introduction to Privacy-Preserving Data Publishing Concepts and Techniques,”  Taylor & Francis Group, 2011.
  • Dwork, F. McSherry, K. Nissim and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in TCC, pp. 265–284 2006.
  • Kenthapadi, A. Korolova, I. Mironov and N. Mishra,”Privacy via the Johnson-Lindenstrauss Transform,” CoRR, abs/1204.2606, 2012.
  • Xiaokui, W. Guozhang, G.Johannes, “Differential Privacy via Wavelet Transforms,” IEEE Transactions on Knowledge and Data Engineering 23(8), pp. 1200-1214, 2011.
  • G. Divanis, V.S. Verykios,” Association Rule Hiding for Data Mining,” Springer, 2010.
  • Cui, W.K. Wong, and D.W. Cheung, “Privacy-Preserving Clustering with High Accuracy and Low Time Complexity,” In Proceedings of the 14th International Conference on Database Systems for Advanced Applications, DASFAA 2009, Brisbane, Australia, 2009.
  • R.M. Oliveira, and O.R. Za¨iane, “ Privacy preserving clustering by data transformation,” Proc. of the 18th Brazilian Symposium on Databases, Manaus, Amazonas, Brazil, pp. 304–318, 2003.
  • Du and Z. Zhan, “Using randomized response techniques for privacy-preserving data mining,” In Proc. of the 9th International Conf. on Knowledge Discovery and Data Mining (KDD’03), pp. 505–510, 2003.
  • Huang, W. Du, and B. Chen, “Deriving Private Information from Randomized Data,” In Proceedings of the ACM SIGMOD Conference on Management of Data, Baltimore, Maryland, USA, pp. 37-48, 2005.
  • Vaidya and C. Clifton, “Privacy-Preserving K-means Clustering over Vertically Partitioned Data,” In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 206 – 215, 2003.
  • B. Johnson and J. Lindenstrauss, “Extensions of Lipschitz Maps into a Hilbert Space,” Contemporary Mathematics 26, 189–206, 1984.
  • Bingham and H. Mannila, “Random Projection In Dimensionality Reduction: Applications To Image And Text Data,” In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 245–250, 2001.
  • Z. Fern, and C.E. Brodley, “Random projection for high dimensional data clustering: a cluster ensemble approach,” In Proc. of the 20th international conf. on machine learning, Washington DC, USA, pp. 102-110, 2003.
  • R.M. Oliveira and O.R. Za¨ıane, “A Privacy-Preserving Clustering Approach Toward Secure and Effective Data Analysis for Business Collaboration,” Computers & Security 26(1), 81-93, 2007.
  • Blum, C. Dwork, F. McSherry, and K. Nissim, “Practical Privacy: The SuLQ Framework,” In: Proceedings of the 2005 SIGMOD/PODS Conference, pp. 128–138, 2005.
  • C. Dwork, and N. Nissim, “Privacy-Preserving Data mining on Vertically Partitioned Databases,” In: Proceedings of the 24th Annual International Cryptology Conference Advances in Cryptology (CRYPTO’04), Santa Barbara, California, USA, pp. 528-544, 2004.
  • Nissim, S. Raskhodnikova, and A. Smith, “Smooth Sensitivity and Sampling in Private Data Analysis,” In: Proceedings of the 39th ACM Symposium on the Theory of Computing (STOC’07), pp. 75-84, 2007.
  • McSherry, “Privacy Integrated Queries: An Extensible Platform for Privacy-Preserving Data Analysis,” In: Proceedings of the 2009 SIGMOD/PODS Conference, pp. 19-30, 2009.
  • Feldman, A. Fiat, H. Kaplan, and K. Nissim, “Private Coresets.” In: Proceedings of the 41st Symposium on Theory of Computing (STOC), pp. 361–370, 2009.
  • Xu, X. Cheng, S. Su, et al., Differentially private frequent sequence mining, IEEE Trans. Knowl. Data Eng. 28 (11), 2016.
  • Zhou, S. Qin, H. Zhou, et al., A differential privacy noise dynamic allocation algorithm for big multimedia data, Multimedia Tools Appl. (8), pp. 1-19, 2018.
  • Shen, Z. Lu, A new lower bound of privacy budget for distributed differential privacy, in: proceedings of International Conference on IEEE Parallel and Distributed Computing, Applications and Technologies. pp. 25–32, 2018.
  • Wang, Y. Zhang, X. X. Lu, Z. Wang, Real-time and spatiotemporal crowd-sourced social network data publishing with differential privacy, IEEE Trans. Dependable Secure Comput. 15 (4), 2018.
  • Ou, Z. Qin, S. Liao, T. Li, and D. Zhang, “Singular spectrum analysis for local differential privacy of classifications in the smart grid,” IEEE Internet of Things Journal, 2020.
  • Sun, X., Xu, R., Wu, L. et al. A differentially private distributed data mining scheme with high efficiency for edge computing. J Cloud Comp 10, 7, 2021.
  • Mukherjee, et al, “A privacy preserving technique for distance-based classification with worst case privacy guarantees,” Data & Knowledge Engineering, 66, pp. 264–288, 2008.
  • Fan, J. He, M. Guo, P. Li, Z. Han, R. Wang, “Privacy preserving classification on local differential privacy in data centers,” Journal of Parallel and Distributed Computing, Vol. 135, pp. 70-82, 2020.
  • Li, M. Sun, “Privacy-Preserving Classification of Personal Data with Fully Homomorphic Encryption: An Application to High-Quality Ionospheric Data Prediction,” International Conference on Machine Learning for Cyber Security, pp. 437-446, 2020.
  • Li, J. Li, Z. Huang, CZ. Gao, W.B. Chen, K. Chen, “Privacy-preserving outsourced classification in cloud computing,” Cluster Computing, Vol. 21, pp. 277–286, 2018.
  • R.E. Dishabi, M.A. Azgomi, "Differential privacy preserving clustering in distributed datasets using Haar wavelet transform," Intelligent Data Analysis, vol. 19, no. 6, pp. 1323-1353, 2015.
  • Canetti, U. Feige, O. Goldreich, and M. Naor, “Adaptively Secure Multi-Party Computation,” Proceedings of the 28th Annual Symposium on Theory of Computing (STOC’96), ACM Press, pp. 639–648, 1996.
  • S. Burrus, R.A. Gopinath, and H. Guo, “Introduction to Wavelets and Wavelet Transforms,” A Primer Prentice Hall, 1997.
  • Zhang, T.B. Ho, Y. Zhang and M.S. Lin, “Unsupervised Feature Extraction for Time Series Clustering Using Orthogonal Wavelet Transform,” Journal Informatica 30(3), pp. 305-319, 2006.
  • “Machine Learning Repository,” URL: http://archive.ics.uci.edu/ml, Visited: 2020-05-20.
  • F. Vysochanskij, and Y.I. Petunin, “Justification of the Three Sigma Rule for Unimodal Distributions,” Theory of Probability and Mathematical Statistics 21, 25–36, 1980.