Is Favicon only in the Chinese Wikipedia top 100? It seems so, which is odd if the problem is a web browser bug.
John Vandenberg.
sent from Galaxy Note

On Dec 28, 2012 4:07 PM, "Johan Gunnarsson" <johan.gunnars...@gmail.com> wrote:
> On Fri, Dec 28, 2012 at 5:33 AM, John Vandenberg <jay...@gmail.com> wrote:
> > Hi Johan,
> >
> > Thank you for the lovely data at
> >
> > https://toolserver.org/~johang/2012.html
> >
> > I posted that link to my facebook (below if you want to join in
> > there), and a few language specific facebook groups, and there have
> > been some concerns raised about the results, which I'll list below.
> >
> > These lists are getting some traction in the press so it would be good
> > to understand it better.
> >
> > http://guardian.co.uk/technology/blog/2012/dec/27/wikipedia-most-viewed
>
> Cool, cool.
>
> >
> > Why is [[zh:Favicon]] #2?
> >
> > The data doesn't appear to support that
> >
> > http://stats.grok.se/zh/201201/Favicon
> > http://stats.grok.se/zh/latest90/Favicon
>
> My post-processing filtering follows redirects to find the "true"
> title. In this case the page Favicon.ico redirects to Favicon. This is
> probably due to broken browsers trying to load the icon.
>
> >
> > Number 1 in French is a plant native to Asia. The stats for December
> > disagree
> > https://en.wikipedia.org/wiki/Ilex_crenata
> > http://stats.grok.se/fr/201212/Houx_cr%C3%A9nel%C3%A9
>
> French's Ilex_crenata redirects to Houx_crénelé.
>
> Ilex_crenata had huge traffic in April:
> http://stats.grok.se/fr/201204/Ilex_crenata
>
> There are a bunch of spikes like this. I can't really explain it. I
> talked to Domas Mituzas (the maintainer of the original dumps I use)
> yesterday and he suggested it might be bots going crazy for whatever
> reason. I'd love to filter all these false positives, but haven't been
> able to come up with an easy way to do it.
>
> Might be possible with access to logs with the user-agent string, but
> that would probably inflate the dataset size even more. It's already
> past the terabyte. However that could probably be solved by sampling
> (for example) 1/100 of the entries.
>
> Comments and ideas are welcome!
>
> >
> > Number 1 in German is Cul de sac. This is odd, but matches the stats
> > http://stats.grok.se/de/201207/Sackgasse
>
> Right. This one is funny. It has huge traffic on weekdays only.
> Deserted on weekends.
>
> >
> > Number 1 in Dutch is a Chinese mountain. The stats for December disagree
> > http://stats.grok.se/nl/201212/Hua_Shan
>
> July/August agree: http://stats.grok.se/nl/201208/Hua_Shan
>
> >
> > Number 4 in Hebrew is zipper. The stats for December disagree
> > http://stats.grok.se/he/201212/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F
>
> April agrees:
> http://stats.grok.se/he/201204/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F
>
> >
> > Number 2 in Spanish is '@'. This is odd, but matches the stats
> > http://stats.grok.se/es/201212/Arroba_%28s%C3%ADmbolo%29
> >
> > --
> > John Vandenberg
> > https://www.facebook.com/johnmark.vandenberg
>
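
For anyone following along, here is a rough sketch of the two ideas in the quoted message: folding redirect traffic into the "true" title (which is how Favicon.ico hits would end up counted under Favicon), and keeping roughly 1/100 of log entries. This is not Johan's actual code. The function names, the dict-based redirect table, and the view counts are all made up for illustration, and taking every Nth entry is just one simple way to sample.

```python
from collections import Counter
from itertools import islice


def resolve(title, redirects):
    """Follow redirects until a title that is not itself a redirect.

    `redirects` is a hypothetical {source_title: target_title} mapping.
    The `seen` set stops the walk if the redirects form a loop.
    """
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title


def fold_redirects(counts, redirects):
    """Sum page views under each page's resolved ("true") title."""
    totals = Counter()
    for title, views in counts.items():
        totals[resolve(title, redirects)] += views
    return totals


def sample_every_nth(entries, n=100):
    """Keep every n-th entry, i.e. about 1/n of the input."""
    return islice(entries, 0, None, n)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the real dumps.
    counts = {"Favicon.ico": 900, "Favicon": 100, "Hua_Shan": 5}
    redirects = {"Favicon.ico": "Favicon"}
    print(fold_redirects(counts, redirects).most_common())
    print(len(list(sample_every_nth(range(1000)))))
```

The folding step explains why a ranking built this way can disagree with the per-title numbers on stats.grok.se: the ranking counts requests for every redirect pointing at a page, while a lookup for a single title only shows that title's own requests.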
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l