Great work!

One way to analyze this kind of geolinguistic aggregate further is to
normalize the data, for example geographically, as demonstrated in my
previous work:
http://www.opensym.org/os2014-files/proceedings/p611.pdf

Anyone is welcome to normalize the data using the geolinguistic size
indicators here:
https://github.com/hanteng/pyGeolinguisticSize/blob/master/size_geolinguistic.tsv


Currently, it has estimates of Population (LP), Internet users (IPop),
Economy Size (PPPGDP), etc. The estimates assume an "even distribution":
each country's total is split across languages according to the percentage
share of language population per country in the Unicode CLDR 25
Territory-Language Information.
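
One reading of the "even distribution" estimate above is: multiply each
country's total by a language's share in that country, then sum per
language. A minimal Python sketch under that assumption follows; the
function name, the country and language codes, and all numbers are
hypothetical and are not taken from the TSV file or from CLDR.

```python
from collections import defaultdict

def geolinguistic_sizes(country_totals, language_shares):
    """Split each country's total across languages by share, then sum per language.

    country_totals: {country: total}, e.g. Internet users per country.
    language_shares: {country: {language: share between 0 and 1}}.
    """
    sizes = defaultdict(float)
    for country, total in country_totals.items():
        for language, share in language_shares.get(country, {}).items():
            sizes[language] += total * share
    return dict(sizes)

# Hypothetical inputs (made-up codes and figures, for illustration only).
country_totals = {"AA": 1000.0, "BB": 400.0}
language_shares = {
    "AA": {"x": 0.5, "y": 0.25},
    "BB": {"x": 0.25, "z": 0.75},
}
sizes = geolinguistic_sizes(country_totals, language_shares)
```

The same function could be applied once per indicator (LP, IPop, PPPGDP) by
passing the matching per-country totals.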

A simple linear regression can reveal, say, which geo-linguistic,
geographic, or linguistic categories receive a less-than-expected or
more-than-expected proportion of viewing traffic, with the expected
values generated from the sizes of population, Internet population, and
economy.
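
A rough sketch of such a regression, in plain Python: fit log(traffic)
against log(size) by ordinary least squares and look at the residuals.
The log-log form, the "category" and "views" keys, and the numbers are
assumptions made for illustration; only the IPop name comes from the
indicators mentioned above.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals_by_category(rows, size_key="IPop", traffic_key="views"):
    """Return {category: residual in log units}.

    A positive residual means more traffic than the fitted line expects
    for that size; a negative one means less.
    """
    xs = [math.log(r[size_key]) for r in rows]
    ys = [math.log(r[traffic_key]) for r in rows]
    intercept, slope = fit_line(xs, ys)
    return {
        r["category"]: y - (intercept + slope * x)
        for r, x, y in zip(rows, xs, ys)
    }

# Made-up rows, not real traffic or population figures.
rows = [
    {"category": "A", "IPop": 1_000, "views": 2_000},
    {"category": "B", "IPop": 10_000, "views": 20_000},
    {"category": "C", "IPop": 100_000, "views": 800_000},
    {"category": "D", "IPop": 1_000_000, "views": 2_000_000},
]
res = residuals_by_category(rows)
over = [c for c, v in res.items() if v > 0]   # more than expected
under = [c for c, v in res.items() if v < 0]  # less than expected
```

Swapping size_key for LP or PPPGDP gives the expectation under population
or economy size instead of Internet population.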

I hope this great work by Nemo can be extended to cover

(1) time-series reports and data releases

(2) aggregates of edits


Together, the tools and datasets will be a major milestone for monitoring
language/project development across Wikimedia projects. Congrats!

Best,
han-teng liao

2015-02-26 8:31 GMT+01:00 Federico Leva (Nemo) <nemow...@gmail.com>:

> Erik Zachte, 25/02/2015 23:34:
>
>> Compare https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/  and
>> http://stats.wikimedia.org/wikimedia/squids/
>> SquidReportPageViewsPerLanguageBreakdown.htm
>>
>
> Ironholds' looks more vulnerable to bots, it's easier to see in small
> wikis (though, kudos! many more small wikis are included than in
> wikistats). For instance, 20 more percentage points for USA on Breton and
> Bavarian Wikipedias, 30 on Welsh, 40 on Alemannic, almost 50 on Kurdish.
> For Chinese bots they look similar, though in some cases I'm not sure
> what's going on: for instance als.wiki also sees CH and RO emerge.
>
> Will the new pageviews definition use the same bot filtering method?
>
> Nemo
>
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
