Feature ranking for Persian Review Spam detection

Document Type: Persian Original Article

Authors

1 Department of Computer, Safahan Institute of Higher Educations, Janbazan St., Isfahan, Iran

2 Department of Computer, Faculty of Engineering, Shahrekord University, Shahrekord, Iran

3 Computer Department, Shahrekord University, Rahbar Blvd., Shahrekord, Iran

Abstract

Online reviews are one of the main factors in customers’ decisions to buy a product or use a service. They are a valuable source of information for gauging public opinion about products and services. Useful as they are, trusting them blindly is risky for both customers and sellers, because reviews may be manipulated for profit; such reviews are called spam reviews. This study addresses Persian cell-phone reviews extracted from Digikala.com and investigates type 1 and type 2 spam, which are fake reviews and reviews that describe only the brand name, respectively. Review-based and metadata features are used because of their efficiency. These features and their combinations are assessed for how well they detect Persian spam reviews and how they affect classifier accuracy. Spam classification is performed with decision tree, support vector machine, and naïve Bayes classifiers, and their accuracy is compared across different feature combinations. The decision tree classifier obtains the highest accuracy, with an F-measure of 0.778. In feature ranking, the decision tree again outperforms the other two classifiers, reaching an F-measure of 0.824 when the positive feedback, overall score, and review polarity features are combined.
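The comparison of feature combinations by F-measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `f_measure` and `rank_feature_subsets` and the `evaluate` callback are hypothetical, and the sketch assumes binary labels with spam as the positive class. The abstract does not say how the combinations were enumerated, so the exhaustive enumeration here is an assumption.

```python
from itertools import combinations


def f_measure(y_true, y_pred, positive=1):
    """F-measure (harmonic mean of precision and recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rank_feature_subsets(feature_names, evaluate):
    """Score every non-empty feature subset and sort by score, best first.

    `evaluate` is a hypothetical callback: it receives a tuple of feature
    names and returns a score, for example the F-measure of a classifier
    trained on just those features.
    """
    scored = []
    for size in range(1, len(feature_names) + 1):
        for subset in combinations(feature_names, size):
            scored.append((evaluate(subset), subset))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

In use, `evaluate` would train and test one classifier (decision tree, support vector machine, or naïve Bayes) on the chosen subset of features, such as positive feedback, overall score, and review polarity, and return `f_measure(y_true, y_pred)` on held-out reviews. Exhaustive enumeration grows exponentially with the number of features, so it is practical only for small feature sets.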

Keywords

