On Fri, Dec 28, 2012 at 10:24 AM, John Vandenberg <jay...@gmail.com> wrote:

> Is favicon only in the Chinese Wikipedia top 100?
>
> It seems so, which is odd if the problem is a web browser bug.
>
> John Vandenberg.
> sent from Galaxy Note
> On Dec 28, 2012 4:07 PM, "Johan Gunnarsson" <johan.gunnars...@gmail.com>
> wrote:
>
>>  On Fri, Dec 28, 2012 at 5:33 AM, John Vandenberg <jay...@gmail.com>
>> wrote:
>> > Hi Johan,
>> >
>> > Thank you for the lovely data at
>> >
>> > https://toolserver.org/~johang/2012.html
>> >
>> > I posted that link to my Facebook (below if you want to join in
>> > there), and a few language-specific Facebook groups, and there have
>> > been some concerns raised about the results, which I'll list below.
>> >
>> > These lists are getting some traction in the press so it would be good
>> > to understand it better.
>> >
>> > http://guardian.co.uk/technology/blog/2012/dec/27/wikipedia-most-viewed
>>
>> Cool, cool.
>>
>>
>> >
>> > Why is [[zh:Favicon]] #2?
>> >
>> > The data doesn't appear to support that
>> >
>> > http://stats.grok.se/zh/201201/Favicon
>> > http://stats.grok.se/zh/latest90/Favicon
>>
>> My post-processing filtering follows redirects to find the "true"
>> title. In this case the page Favicon.ico redirects to Favicon. This is
>> probably due to broken browsers trying to load the icon.
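A minimal Python sketch of the redirect-folding step described above, assuming the page view counts and the redirect table have already been loaded into plain dicts. The function name `resolve_redirects`, its input format and the `max_hops` limit are hypothetical, not the actual post-processing code. It shows how hits on Favicon.ico end up credited to Favicon:

```python
def resolve_redirects(views, redirects, max_hops=5):
    """Fold view counts of redirect titles into their final target titles.

    views:     dict mapping page title -> view count (hypothetical format)
    redirects: dict mapping redirect title -> target title
    """
    totals = {}
    for title, count in views.items():
        target = title
        hops = 0
        # Follow the redirect chain, stopping after max_hops to avoid loops.
        while target in redirects and hops < max_hops:
            target = redirects[target]
            hops += 1
        totals[target] = totals.get(target, 0) + count
    return totals


# Illustrative numbers only, not real statistics.
print(resolve_redirects({"Favicon.ico": 1000, "Favicon": 10},
                        {"Favicon.ico": "Favicon"}))
# prints {'Favicon': 1010}
```

The hop cap is an arbitrary guard against redirect loops. Dropping titles such as Favicon.ico before folding would be one way to keep browser icon requests out of the ranking.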
>>
>>
>> >
>> > Number 1 in French is a plant native to Asia.  The stats for December
>> > disagree
>> > https://en.wikipedia.org/wiki/Ilex_crenata
>> > http://stats.grok.se/fr/201212/Houx_cr%C3%A9nel%C3%A9
>>
>> On French Wikipedia, Ilex_crenata redirects to Houx_crénelé.
>>
>> Ilex_crenata had huge traffic in April:
>> http://stats.grok.se/fr/201204/Ilex_crenata
>>
>> There are a bunch of spikes like this. I can't really explain it. I
>> talked to Domas Mituzas (the maintainer of the original dumps I use)
>> yesterday and he suggested it might be bots going crazy for whatever
>> reason. I'd love to filter all these false positives, but haven't been
>> able to come up with an easy way to do it.
>>
>> Filtering might be possible with access to logs that include the
>> user-agent string, but that would probably inflate the dataset size
>> even more. It's already past a terabyte. However, that could probably
>> be solved by sampling (for example) 1/100 of the entries.
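A rough Python sketch of the 1/100 sampling idea, assuming a hypothetical log format of (title, user_agent) tuples; the real log format is not described in this thread, and the function names are illustrative. Each entry is kept independently with probability `rate`, and sampled counts are scaled back up by 1/rate:

```python
import random
from collections import Counter


def sample_entries(entries, rate=0.01, seed=0):
    """Keep each log entry independently with probability `rate`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    for entry in entries:
        if rng.random() < rate:
            yield entry


def estimate_agent_counts(entries, rate=0.01, seed=0):
    """Estimate full (title, user_agent) hit counts from a sample.

    entries: iterable of (title, user_agent) tuples (hypothetical format)
    """
    counts = Counter(sample_entries(entries, rate, seed))
    # Scale each sampled count by 1/rate to estimate the unsampled total.
    return {key: n / rate for key, n in counts.items()}
```

Storage shrinks roughly in proportion to the rate, but estimates for pages or user agents with only a handful of sampled hits will be noisy, so a spike would need to be large to stand out reliably.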
>>
>> Comments and ideas are welcome!
>>
>>
>> >
>> > Number 1 in German is Cul de sac. This is odd, but matches the stats
>> > http://stats.grok.se/de/201207/Sackgasse
>>
>> Right. This one is funny. It has huge traffic on weekdays only.
>> Deserted on weekends.
>

This has been noted on the dewiki village pump before. The most
interesting guess there (by Benutzer:YMS,
https://de.wikipedia.org/wiki/Wikipedia:Fragen_zur_Wikipedia#Sackgasse_als_Top_Artikel_.3F.21):
web filtering software installed on workplace PCs in some companies might
redirect all prohibited URLs to the German Wikipedia article on
cul-de-sac. This would explain the weekly pattern, and also
http://stats.grok.se/de/201112/Sackgasse (December 25-26 are holidays in
Germany, and many employees take the rest of the year off).
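One way to check a page against the workplace-filter guess is to compare its mean weekday and weekend views. A minimal Python sketch, assuming daily counts (such as those shown on stats.grok.se) have already been loaded into a dict keyed by `datetime.date`; the function name is hypothetical and public holidays are not handled:

```python
from datetime import date


def weekday_weekend_ratio(daily_views: dict[date, int]) -> float:
    """Return mean weekday views divided by mean weekend views.

    daily_views maps each calendar date to that day's view count.
    A ratio far above 1 suggests traffic tied to the working week.
    """
    # date.weekday(): Monday == 0 ... Sunday == 6
    weekday = [v for d, v in daily_views.items() if d.weekday() < 5]
    weekend = [v for d, v in daily_views.items() if d.weekday() >= 5]
    if not weekday or not weekend:
        raise ValueError("need at least one weekday and one weekend day")
    weekday_mean = sum(weekday) / len(weekday)
    weekend_mean = sum(weekend) / len(weekend)
    if weekend_mean == 0:
        return float("inf")
    return weekday_mean / weekend_mean
```

Holidays such as December 25-26 fall on weekdays but behave like weekend days, so they would pull the weekday mean down; excluding them would make the signal cleaner.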


>
>>
>> >
>> > Number 1 in Dutch is a Chinese mountain.  The stats for December
>> > disagree
>> > http://stats.grok.se/nl/201212/Hua_Shan
>>
>> July/August agree: http://stats.grok.se/nl/201208/Hua_Shan
>>
>>
>> >
>> > Number 4 in Hebrew is zipper.  The stats for December disagree
>> > http://stats.grok.se/he/201212/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F
>>
>> April agrees:
>> http://stats.grok.se/he/201204/%D7%A8%D7%95%D7%9B%D7%A1%D7%9F
>>
>>
>> >
>> > Number 2 in Spanish is '@'.  This is odd, but matches the stats
>> > http://stats.grok.se/es/201212/Arroba_%28s%C3%ADmbolo%29
>> >
>> > --
>> > John Vandenberg
>> > https://www.facebook.com/johnmark.vandenberg
>>
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>


-- 
Tilman Bayer
Senior Operations Analyst (Movement Communications)
Wikimedia Foundation
IRC (Freenode): HaeB
