An improved link-based method for spam detection in Persian web graph

Author

Department of Electrical and Computer, Yazd University, Yazd, Iran.

Abstract

Internet use has spread widely, and the growing number of web pages has made search engines increasingly important. As a result, some people try to mislead search engines in order to attract more customers and profit, raising the rank of their pages by illegitimate means. Identifying such pages can improve search engines and strengthen user confidence in them.
Given the importance of finding spam pages, this research presents a new link-based method for detecting spam pages in the Persian web graph. The method first detects link farms. The negative scores of the detected spam pages are then propagated through the whole web graph.
The method was implemented on data from the Parsijoo search engine, and analysis of the results indicates a 21.2% improvement in the P@n measure.
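The abstract does not state the propagation rule, so the following is only a minimal sketch of one plausible reading. It assumes an anti-TrustRank-style scheme in which distrust starts at a set of known spam pages (for example, pages found in link farms) and flows backwards along links to the pages that point at them. The function names, the damping factor, the iteration count, and the toy graph are all illustrative assumptions, not the authors' implementation. A small P@n helper is included to show how the reported measure is usually computed.

```python
def propagate_distrust(out_links, spam_seeds, damping=0.85, iterations=20):
    """Spread distrust from seed spam pages to the pages that link to them.

    out_links: dict mapping a page to the pages it links to.
    spam_seeds: pages already flagged as spam (assumed input).
    Returns a dict page -> distrust score (higher means more suspicious).
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)

    # In-degree of each page; a page's distrust is split among its in-linkers.
    in_degree = {n: 0 for n in nodes}
    for targets in out_links.values():
        for t in set(targets):
            in_degree[t] += 1

    seeds = set(spam_seeds) & nodes
    if not seeds:
        return {n: 0.0 for n in nodes}

    # Static distrust is uniform over the seed set and zero elsewhere.
    static = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(static)

    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Distrust received from every page that n links to.
            received = sum(score[t] / in_degree[t]
                           for t in set(out_links.get(n, ())))
            updated[n] = (1.0 - damping) * static[n] + damping * received
        score = updated
    return score


def precision_at_n(ranked_pages, relevant_pages, n):
    """Fraction of the top-n ranked pages that are relevant (P@n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for page in ranked_pages[:n] if page in relevant_pages)
    return hits / n


# Toy graph: s1 and s2 form a tiny link farm, a links into it, b does not.
graph = {"a": ["s1"], "s1": ["s2"], "s2": ["s1"], "b": ["c"]}
distrust = propagate_distrust(graph, spam_seeds={"s1", "s2"})
```

In this toy graph, page a receives a positive distrust score because it links into the farm, while b and c stay at zero. A ranking system could then subtract a scaled distrust score from each page's base rank. How the scores are combined in the actual system is not specified in the abstract.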

Keywords


[1] M. Luckner, M. Gad, and P. Sobkowiak, "Stable web spam detection using features based on lexical items", Computers & Security, vol. 46, pp. 79-93, 2014.
[2] A. M. ZarehBidoki, M. A. Golshani, and E. Mousakazemi-Mohammadi, "Design and Implementation of Persian document crawling/ranking system and Implementation of a Persian Search Engine", Itre, Tehran, Iran, 2012 (in Persian).
[3] G.-R. Xue, Q. Yang, H.-J. Zeng, Y. Yu, and Z. Chen, "Exploiting the hierarchical structure for link analysis", Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005.
[4] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web", Technical Report, Stanford Univ., 1998.
[5] B. Wu and B. D. Davison, "Identifying link farm spam pages", Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, pp. 820-829, 2005.
 
 
Figure 9: Comparison of results before and after applying the spam page detection algorithm
[6] L. Becchetti, C. Castillo, D. Donato, S. Leonardi, and R. Baeza-Yates, "Link-Based Characterization and Detection of Web Spam", Proceedings of the 2nd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), 2006.
[7] Z. Gyongyi, H. Garcia-Molina, and J. Pedersen, "Combating web spam with TrustRank", Proceedings of the Thirtieth International Conference on Very Large Data Bases, vol. 30, VLDB Endowment, Toronto, Canada, pp. 576-587, 2004.
[8] V. Krishnan and R. Raj, "Web spam detection with anti-TrustRank", Proceedings of the 2nd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), pp. 37-40, 2006.
[9] www.parsijoo.ir