On Fri, Dec 28, 2012 at 10:24 AM, John Vandenberg <jay...@gmail.com> wrote:
> Is favicon only in the Chinese Wikipedia top 100?
>
> It seems so, and is odd if the problem is a web browser bug.
>
> John Vandenberg.
> sent from Galaxy Note
> On Dec 28, 2012 4:07 PM, "Johan Gunnarsson" <johan.gunnars...@gmail.com> wrote:
>
>> On Fri, Dec 28, 2012 at 5:33 AM, John Vandenberg <jay...@gmail.com> wrote:
>> > Hi Johan,
>> >
>> > Thank you for the lovely data at
>> >
>> > https://toolserver.org/~johang/2012.html
>> >
>> > I posted that link to my Facebook (below if you want to join in
>> > there), and a few language-specific Facebook groups, and there have
>> > been some concerns raised about the results, which I'll list below.
>> >
>> > These lists are getting some traction in the press, so it would be good
>> > to understand them better.
>> >
>> > http://guardian.co.uk/technology/blog/2012/dec/27/wikipedia-most-viewed
>>
>> Cool, cool.
>>
>> >
>> > Why is [[zh:Favicon]] #2?
>> >
>> > The data doesn't appear to support that:
>> >
>> > http://stats.grok.se/zh/201201/Favicon
>> > http://stats.grok.se/zh/latest90/Favicon
>>
>> My post-processing filtering follows redirects to find the "true"
>> title. In this case the page Favicon.ico redirects to Favicon. This is
>> probably due to broken browsers trying to load the icon.
>>
>> >
>> > Number 1 in French is a plant native to Asia. The stats for December
>> > disagree:
>> > https://en.wikipedia.org/wiki/Ilex_crenata
>> > http://stats.grok.se/fr/201212/Houx_cr%C3%A9nel%C3%A9
>>
>> French's Ilex_crenata redirects to Houx_crénelé.
>>
>> Ilex_crenata had huge traffic in April:
>> http://stats.grok.se/fr/201204/Ilex_crenata
>>
>> There are a bunch of spikes like this. I can't really explain it. I
>> talked to Domas Mituzas (the maintainer of the original dumps I use)
>> yesterday and he suggested it might be bots going crazy for whatever
>> reason. I'd love to filter all these false positives, but haven't been
>> able to come up with an easy way to do it.
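The redirect-following step Johan describes (views of Favicon.ico being counted under Favicon) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not his actual post-processing code: the `resolve` and `aggregate` helpers, the in-memory redirect map, the hop limit, and the view counts are all hypothetical.

```python
from collections import Counter
from typing import Mapping


def resolve(title: str, redirects: Mapping[str, str], max_hops: int = 10) -> str:
    """Follow redirects from `title` to its final target.

    Stops on a redirect loop or after `max_hops` hops (an assumed safeguard).
    """
    seen = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title


def aggregate(views: Mapping[str, int], redirects: Mapping[str, str]) -> Counter:
    """Sum raw per-title view counts under each title's resolved ("true") target."""
    totals: Counter = Counter()
    for title, count in views.items():
        totals[resolve(title, redirects)] += count
    return totals


# Hypothetical redirect map and made-up view counts for one wiki (zh).
redirects = {"Favicon.ico": "Favicon"}
views = {"Favicon.ico": 1000, "Favicon": 5}
print(aggregate(views, redirects))  # Counter({'Favicon': 1005})
```

Because every redirect's views are folded into its target, a heavily requested redirect such as Favicon.ico can push its target into a top-100 list even when the target article itself gets little direct traffic.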
>>
>> Might be possible with access to logs with the user-agent string, but
>> that would probably inflate the dataset size even more. It's already
>> past the terabyte. However, that could probably be solved by sampling
>> (for example) 1/100 of the entries.
>>
>> Comments and ideas are welcome!
>>
>> >
>> > Number 1 in German is cul-de-sac. This is odd, but matches the stats
>> > http://stats.grok.se/de/201207/Sackgasse
>>
>> Right. This one is funny. It has huge traffic on weekdays only.
>> Deserted on weekends.
>

This has been noted on the dewiki village pump before. The most interesting guess there <https://de.wikipedia.org/wiki/Wikipedia:Fragen_zur_Wikipedia#Sackgasse_als_Top_Artikel_.3F.21> (by Benutzer:YMS): there might be web filtering software installed on workplace PCs in companies which redirects all prohibited URLs to the German Wikipedia article on cul-de-sac (Sackgasse). This would explain the weekly pattern, and also http://stats.grok.se/de/201112/Sackgasse (December 25-26 are holidays in Germany, and many employees take the rest of the year off).

>
>> >
>> > Number 1 in Dutch is a Chinese mountain. The stats for December
>> > disagree:
>> > http://stats.grok.se/nl/201212/Hua_Shan
>>
>> July/August agree: http://stats.grok.se/nl/201208/Hua_Shan
>>
>> >
>> > Number 4 in Hebrew is zipper. The stats for December disagree:
>> > http://stats.grok.se/he/201212/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F
>>
>> April agrees:
>> http://stats.grok.se/he/201204/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F
>>
>> >
>> > Number 2 in Spanish is '@'. This is odd, but matches the stats
>> > http://stats.grok.se/es/201212/Arroba_%28s%C3%ADmbolo%29
>> >
>> > --
>> > John Vandenberg
>> > https://www.facebook.com/johnmark.vandenberg
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

--
Tilman Bayer
Senior Operations Analyst (Movement Communications)
Wikimedia Foundation
IRC (Freenode): HaeB
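Johan's idea of keeping the user-agent string but sampling (for example) 1/100 of the log entries might look something like the sketch below. It is hypothetical throughout: the `(title, user_agent)` entry format, the `looks_like_bot` substring heuristic and its marker list, and the scale-up by the sampling rate are illustrative assumptions, not part of the original dumps or tooling.

```python
import random
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical heuristic: user-agent substrings treated as crawlers.
# Real bot detection would need a far more careful rule set.
BOT_MARKERS = ("bot", "crawler", "spider")


def looks_like_bot(user_agent: str) -> bool:
    """Return True if the user agent contains one of BOT_MARKERS (case-insensitive)."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def estimate_views(
    entries: Iterable[Tuple[str, str]],
    rate: int = 100,
    seed: Optional[int] = None,
) -> Counter:
    """Estimate non-bot views per title from a 1-in-`rate` random sample.

    `entries` is a hypothetical stream of (title, user_agent) pairs.
    """
    rng = random.Random(seed)
    sampled: Counter = Counter()
    for title, user_agent in entries:
        # Keep each entry with probability 1/rate (rate=1 keeps everything).
        if rng.randrange(rate) != 0:
            continue
        # Drop entries whose user agent matches the bot heuristic.
        if looks_like_bot(user_agent):
            continue
        sampled[title] += 1
    # Scale the sampled counts back up to estimate full-log totals.
    return Counter({title: n * rate for title, n in sampled.items()})
```

Random 1-in-100 sampling keeps each title's scaled-up count unbiased in expectation while storing only about 1% of the entries, at the cost of noisy estimates for rarely viewed titles. Calling `estimate_views(entries, rate=1)` disables sampling and simply counts the non-bot entries.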