On Mon, Jul 12, 2021 at 05:01:35PM +0530, Apratim Ranjan Chakrabarty wrote:
> ** Looking forward for suggestions and comments as to how to improve on it.
> Also materials like research paper in this domain would be helpful **

Section IV-C of the ICLab paper has a discussion of block page
detection. The first pass is a regex match against known block pages,
but there is also clustering by similar HTML structure and text.

https://censorbib.nymity.ch/#Niaki2020a
https://github.com/net4people/bbs/issues/52

The 2016 "Do You See What I See?" study seems to be in line with your
project.

    "The second-class treatment of anonymous users ranges from
    outright rejection to ... imposing hurdles such as
    CAPTCHA-solving.... Our study draws upon ... scans of the home
    pages of top-1,000 Alexa websites through every Tor exit..."

Section V-A has to do with scans of top-ranked sites.

https://www.ndss-symposium.org/wp-content/uploads/2017/09/do-you-see-what-i-see-differential-treatment-anonymous-users.pdf
https://archive.org/details/ndss16doyousee
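
To make the two-pass idea concrete, here is a rough sketch in Python.
It is only an illustration under my own assumptions, not the ICLab
implementation: the regex patterns are made-up placeholders, and the
"structure" pass is a stand-in that compares HTML tag-count vectors by
cosine similarity and groups pages greedily. The function names and
the 0.9 threshold are hypothetical, and text similarity is left out.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Placeholder patterns (assumption): a real list would be curated
# from block pages actually observed in the wild.
KNOWN_BLOCK_PATTERNS = [
    re.compile(r"access denied", re.IGNORECASE),
    re.compile(r"this (site|page) has been blocked", re.IGNORECASE),
]


def matches_known_block_page(html):
    """First pass: does the page match any known block-page regex?"""
    return any(p.search(html) for p in KNOWN_BLOCK_PATTERNS)


class _TagCounter(HTMLParser):
    """Counts start tags, as a crude summary of HTML structure."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def tag_vector(html):
    """Return a Counter mapping tag name to number of occurrences."""
    parser = _TagCounter()
    parser.feed(html)
    parser.close()
    return parser.tags


def cosine(a, b):
    """Cosine similarity between two tag-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_by_structure(pages, threshold=0.9):
    """Second pass: greedy clustering of pages by tag-vector similarity.

    Each page joins the first cluster whose first member is at least
    `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `pages`.
    """
    clusters = []  # list of (representative_vector, member_indices)
    for i, html in enumerate(pages):
        vec = tag_vector(html)
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]
```

Pages that miss every regex but land in a large cluster of
near-identical structure, fetched for many unrelated URLs, would be
the candidates to inspect by hand as previously unknown block pages.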