Great work! One way to take the analysis of this kind of geolinguistic aggregate further is data normalization, or geographic normalization, as demonstrated in my previous work: http://www.opensym.org/os2014-files/proceedings/p611.pdf
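To make the normalization idea concrete, here is a minimal Python sketch. It is not the method from the paper above, and it does not read size_geolinguistic.tsv. The country codes, language codes, shares and page-view counts are all hypothetical toy values. The sketch allocates a country-level indicator to languages in proportion to each language's population share. It then fits observed page views against the resulting size with a least-squares line through the origin. The ratio of observed to expected traffic flags languages above (ratio > 1) or below (ratio < 1) expectation.

```python
"""Minimal sketch: normalize per-language page views by a size indicator.

Every number below is a hypothetical toy value for illustration; none is
taken from size_geolinguistic.tsv, CLDR, or any Wikimedia dataset.
"""

# Hypothetical country-level indicator (e.g. Internet users, in millions).
country_ipop = {"AA": 50.0, "BB": 20.0, "CC": 10.0}

# Hypothetical share of each country's population speaking each language,
# in the spirit of CLDR territory-language data.
language_share = {
    "AA": {"xx": 0.9, "yy": 0.1},
    "BB": {"yy": 0.5, "zz": 0.5},
    "CC": {"zz": 1.0},
}


def even_distribution(indicator, shares):
    """Allocate each country's indicator to languages in proportion to
    their population shares ("even distribution"), summed over countries."""
    totals = {}
    for country, value in indicator.items():
        for lang, share in shares.get(country, {}).items():
            totals[lang] = totals.get(lang, 0.0) + share * value
    return totals


def fit_through_origin(xs, ys):
    """Least-squares slope b for the no-intercept model y = b * x."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


def normalize(observed, size):
    """Map each language to (expected, observed / expected), where the
    expected traffic comes from regressing observed traffic on size."""
    langs = sorted(observed)
    xs = [size[lang] for lang in langs]
    ys = [observed[lang] for lang in langs]
    b = fit_through_origin(xs, ys)
    result = {}
    for lang, x, y in zip(langs, xs, ys):
        expected = b * x
        ratio = y / expected if expected > 0 else float("nan")
        result[lang] = (expected, ratio)
    return result


if __name__ == "__main__":
    size = even_distribution(country_ipop, language_share)
    # Hypothetical monthly page views (millions) per language edition.
    pageviews = {"xx": 100.0, "yy": 50.0, "zz": 30.0}
    for lang, (expected, ratio) in sorted(normalize(pageviews, size).items()):
        label = "more" if ratio > 1 else "less"
        print(f"{lang}: expected {expected:.1f}, ratio {ratio:.2f} "
              f"({label} than expected)")
```

The no-intercept fit keeps expected values proportional to size, so small languages never get a negative expectation. Adding an intercept, or regressing on several indicators (population, Internet users, GDP) at once, would be equally reasonable choices.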
Anyone is welcome to do some data normalization using the geolinguistic size indicators here: https://github.com/hanteng/pyGeolinguisticSize/blob/master/size_geolinguistic.tsv

Currently, it includes estimates of Population (LP), Internet users (IPop), Economy Size (PPPGDP), and so on. These are computed under an "even distribution" assumption: each country's figures are allocated to languages in proportion to each language's percentage share of the country's population, based on the Unicode CLDR 25 Territory-Language Information. A simple linear regression can then reveal which geolinguistic, geographic, or linguistic categories receive a less-than-expected or more-than-expected share of page-view traffic, where the expected values are derived from the sizes of the population, Internet population, or economy.

I hope this great work by Nemo can be extended to cover:
(1) time-series reports and data releases
(2) edit aggregates

Altogether, the tools and datasets will be a major milestone for monitoring language/project development across Wikimedia projects. Congrats!

Best,
han-teng liao

2015-02-26 8:31 GMT+01:00 Federico Leva (Nemo) <nemow...@gmail.com>:

> Erik Zachte, 25/02/2015 23:34:
>
>> Compare https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ and
>> http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm
>
> Ironholds' looks more vulnerable to bots, it's easier to see in small
> wikis (though, kudos! many more small wikis are included than in
> wikistats). For instance, 20 more percentage points for USA on Breton and
> Bavarian Wikipedias, 30 on Welsh, 40 on Alemannic, almost 50 on Kurdish.
> For Chinese bots they look similar, though in some cases I'm not sure
> what's going on: for instance als.wiki also sees CH and RO emerge.
>
> Will the new pageviews definition use the same bot filtering method?
>
> Nemo
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l