Is favicon only in the Chinese Wikipedia top 100?

It seems so, which is odd if the problem is a web browser bug.

John Vandenberg.
On Dec 28, 2012 4:07 PM, "Johan Gunnarsson" <johan.gunnars...@gmail.com>
wrote:

> On Fri, Dec 28, 2012 at 5:33 AM, John Vandenberg <jay...@gmail.com> wrote:
> > Hi Johan,
> >
> > Thank you for the lovely data at
> >
> > https://toolserver.org/~johang/2012.html
> >
> > I posted that link to my Facebook (below if you want to join in
> > there) and to a few language-specific Facebook groups, and some
> > concerns have been raised about the results, which I'll list below.
> >
> > These lists are getting some traction in the press, so it would be good
> > to understand them better.
> >
> > http://guardian.co.uk/technology/blog/2012/dec/27/wikipedia-most-viewed
>
> Cool, cool.
>
> >
> > Why is [[zh:Favicon]] #2?
> >
> > The data doesn't appear to support that.
> >
> > http://stats.grok.se/zh/201201/Favicon
> > http://stats.grok.se/zh/latest90/Favicon
>
> My post-processing filtering follows redirects to find the "true"
> title. In this case the page Favicon.ico redirects to Favicon. This is
> probably due to broken browsers trying to load the icon.
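A minimal sketch of the redirect-following step described above, assuming the redirect table is available as a simple source-to-target mapping. The names resolve_redirect and aggregate_views are hypothetical; this is not the actual post-processing code. It shows how hits on Favicon.ico end up credited to Favicon:

```python
def resolve_redirect(title, redirects, max_hops=10):
    """Follow a redirect chain to its final ("true") title.

    `redirects` is a hypothetical dict mapping source title -> target title.
    Stops on a missing entry, a cycle, or after `max_hops` hops.
    """
    seen = {title}
    for _ in range(max_hops):
        target = redirects.get(title)
        if target is None or target in seen:
            break
        seen.add(target)
        title = target
    return title


def aggregate_views(counts, redirects):
    """Sum per-title view counts under each title's redirect target."""
    totals = {}
    for title, views in counts.items():
        true_title = resolve_redirect(title, redirects)
        totals[true_title] = totals.get(true_title, 0) + views
    return totals


# Hypothetical numbers: requests for Favicon.ico are merged into Favicon.
print(aggregate_views({"Favicon.ico": 900, "Favicon": 100},
                      {"Favicon.ico": "Favicon"}))
```

This is why a page with modest direct traffic can rank highly once its redirects are folded in.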
>
> >
> > Number 1 in French is a plant native to Asia. The stats for December
> > disagree
> > https://en.wikipedia.org/wiki/Ilex_crenata
> > http://stats.grok.se/fr/201212/Houx_cr%C3%A9nel%C3%A9
>
> French's Ilex_crenata redirects to Houx_crénelé.
>
> Ilex_crenata had huge traffic in April:
> http://stats.grok.se/fr/201204/Ilex_crenata
>
> There are a bunch of spikes like this. I can't really explain it. I
> talked to Domas Mituzas (the maintainer of the original dumps I use)
> yesterday and he suggested it might be bots going crazy for whatever
> reason. I'd love to filter all these false positives, but haven't been
> able to come up with an easy way to do it.
>
> Might be possible with access to logs with the user-agent string, but
> that would probably inflate the dataset size even more. It's already
> past a terabyte. However, that could probably be solved by sampling
> (for example) 1/100 of the entries.
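A rough sketch of the 1/100 sampling idea, assuming a hypothetical tab-separated log format of title and user-agent (the real request logs may look quite different). Uniform random sampling keeps about 1% of entries; multiplying the sampled counts by the rate gives an estimate of the full totals:

```python
import random
from collections import Counter


def sample_entries(lines, rate=100, seed=0):
    """Keep roughly 1 in `rate` entries, chosen uniformly at random."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    for line in lines:
        if rng.randrange(rate) == 0:
            yield line


def user_agents_by_title(lines, rate=100, seed=0):
    """Count sampled hits per (title, user_agent) pair.

    Assumes a hypothetical format: title<TAB>user_agent per line.
    """
    counts = Counter()
    for line in sample_entries(lines, rate, seed):
        title, user_agent = line.rstrip("\n").split("\t", 1)
        counts[(title, user_agent)] += 1
    return counts
```

Grouping by user-agent would make it possible to see whether a spike comes from a handful of clients rather than ordinary readers.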
>
> Comments and ideas are welcome!
>
> >
> > Number 1 in German is cul-de-sac. This is odd, but matches the stats
> > http://stats.grok.se/de/201207/Sackgasse
>
> Right. This one is funny. It has huge traffic on weekdays only.
> Deserted on weekends.
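One way to quantify a weekday-only pattern like this, assuming hypothetical daily view counts keyed by date (this check is not something described in the thread). A ratio far above 1 means weekday traffic dominates:

```python
from datetime import date


def weekday_weekend_ratio(daily_views):
    """Mean weekday views divided by mean weekend views.

    `daily_views` maps datetime.date -> view count (hypothetical input).
    Returns None if either group is empty, inf if weekends had no views.
    """
    weekday, weekend = [], []
    for day, views in daily_views.items():
        # date.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
        (weekend if day.weekday() >= 5 else weekday).append(views)
    if not weekday or not weekend:
        return None
    weekend_mean = sum(weekend) / len(weekend)
    if weekend_mean == 0:
        return float("inf")
    return (sum(weekday) / len(weekday)) / weekend_mean


# Hypothetical counts for Mon 2 July to Sun 8 July 2012 (subset of days).
print(weekday_weekend_ratio({date(2012, 7, 2): 100, date(2012, 7, 3): 100,
                             date(2012, 7, 7): 10, date(2012, 7, 8): 10}))
```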
>
> >
> > Number 1 in Dutch is a Chinese mountain.  The stats for December disagree
> > http://stats.grok.se/nl/201212/Hua_Shan
>
> July/August agree: http://stats.grok.se/nl/201208/Hua_Shan
>
> >
> > Number 4 in Hebrew is zipper.  The stats for December disagree
> > http://stats.grok.se/he/201212/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F
>
> April agrees:
> http://stats.grok.se/he/201204/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F
>
> >
> > Number 2 in Spanish is '@'.  This is odd, but matches the stats
> > http://stats.grok.se/es/201212/Arroba_%28s%C3%ADmbolo%29
> >
> > --
> > John Vandenberg
> > https://www.facebook.com/johnmark.vandenberg
>
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
