Yay, shiny! The map is a pretty compelling way to show how dominant traffic from the US is, even for very minor languages (say bi.wikipedia.org). I wonder how many requests from US-based bots/automata we’re still failing to detect.
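
For anyone who wants to check the US share per project against the released file, here is a minimal sketch. It assumes the column names used in Finn's query quoted below (`project`, `country`, `pageviews_percentage`) and that the US is labelled "United States"; the helper `country_share` is a hypothetical name, and the sample frame only reuses three da.wikipedia.org rows quoted below rather than the real file.

```python
import pandas as pd


def country_share(df, country="United States"):
    """Return pageviews_percentage per project for one country, highest first."""
    # Keep only the rows for the requested country (assumed label format).
    rows = df.loc[df["country"] == country, ["project", "pageviews_percentage"]]
    return rows.set_index("project")["pageviews_percentage"].sort_values(ascending=False)


# Stand-in data: three da.wikipedia.org rows taken from the thread quoted below.
# The real dataset would be loaded with pd.read_csv(<path to the TSV>, sep="\t").
sample = pd.DataFrame({
    "project": ["da.wikipedia.org"] * 3,
    "country": ["Denmark", "Sweden", "United States"],
    "pageviews_percentage": [61, 18, 3],
})

us = country_share(sample)
print(us)
```

On the full file, sorting this way should put the projects with the largest US share at the top, which might be a reasonable starting list for bot hunting.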
> On Mar 3, 2015, at 9:29 PM, Oliver Keyes <oke...@wikimedia.org> wrote:
>
> Update: the original Shiny instance went down due to server load soon
> after release. It's now up again at http://datavis.wmflabs.org/where/
> on a dedicated Labs machine, where we hope to put...many more
> visualisations. It also now has mapping, largely thanks to Sarah
> Laplante (http://sarahlaplante.com/), and soon it will hopefully be
> /non-hideous/ mapping (the current mass of blue and grey is because my
> aesthetic tastes are...I don't actually have any aesthetic tastes)
>
> On 2 March 2015 at 22:36, Oliver Keyes <oke...@wikimedia.org> wrote:
>> Indeed! Orienting it that way (pivoting on language rather than
>> project) is something several people have asked for; I plan to spend a
>> chunk of my spare time (that is, recreational time) trying to make it
>> work. Should be fairly trivial.
>>
>> On 2 March 2015 at 09:55, h <hant...@gmail.com> wrote:
>>> Hello Finn,
>>> I do not have a specific answer to your question. However, it might be
>>> worthwhile to add Finnish to the comparison, as according to the CLDR 26
>>> T-L information
>>> http://www.unicode.org/cldr/charts/26/supplemental/territory_language_information.html
>>>
>>> You have a sizable number of Finnish language speakers in Sweden:
>>>
>>> Swedish {O}   sv  95.0%  99.0%
>>> Finnish {OR}  fi   2.2%
>>>
>>> So if a similar query is run for the Finnish language, and the results
>>> also show some "undue" proportion of visits from Sweden, then what you
>>> observed as an anomaly is not that unique. We probably need many iterations
>>> of comparative outcomes and normalization of data (Sweden does have a higher
>>> population). Also, it might be handy to have some statistics on immigration
>>> or residence, since it is the EU. I would not be surprised if, for example,
>>> the visits from Oxford to the Wikipedia website included sizable German
>>> language requests.
>>>
>>> I am still a bit bothered by the number "1" in the current dataset.
>>> It does not feel right, since the difference between 1.4% and 0.6% is
>>> notable in this regard. Perhaps we need some high precision "universal
>>> percentage" number for each territory-language pair. It would also be great
>>> to do another set of aggregation: i.e. given a territory, which language
>>> versions of Wikipedia are accessed....
>>>
>>> Best,
>>> han-teng liao
>>>
>>> 2015-03-02 13:54 GMT+01:00 Finn Årup Nielsen <f...@imm.dtu.dk>:
>>>>
>>>> Hi Oliver,
>>>>
>>>>
>>>> Interesting dataset! I am curious about why the Danish Wikipedia is so
>>>> highly accessed from Sweden. Could it be an error, e.g., with Telia
>>>> IP-numbers?
>>>>
>>>> In Python:
>>>>
>>>> >>> import pandas as pd
>>>> >>> df = pd.read_csv('http://files.figshare.com/1923822/language_pageviews_per_country.tsv', sep='\t')
>>>> >>> df.loc[df.project == 'da.wikipedia.org', ['country', 'pageviews_percentage']].set_index('country')
>>>>                 pageviews_percentage
>>>> country
>>>> Austria                            1
>>>> China                              1
>>>> Denmark                           61
>>>> Estonia                            1
>>>> France                             1
>>>> Germany                            2
>>>> Netherlands                        2
>>>> Norway                             1
>>>> Sweden                            18
>>>> United Kingdom                     3
>>>> United States                      3
>>>> Other                              5
>>>>
>>>>
>>>> MaxMind has some numbers on their own accuracy:
>>>>
>>>> https://www.maxmind.com/en/geoip2-city-database-accuracy
>>>>
>>>> For Denmark 85% is "Correctly Resolved", for Sweden only 68%. I wonder if
>>>> this really could bias the result so much.
>>>>
>>>> If the numbers are correct, why would the Swedes read the Danish Wikipedia
>>>> so much? Bots? It does not apply the other way around: only 2% of the
>>>> traffic to Swedish Wikipedia comes from Denmark.
>>>>
>>>>
>>>>
>>>> best regards
>>>> Finn
>>>>
>>>>
>>>>
>>>> On 02/25/2015 10:06 PM, Oliver Keyes wrote:
>>>>>
>>>>> Hey all!
>>>>>
>>>>> We've released a highly-aggregated dataset of readership data -
>>>>> specifically, data about where, geographically, traffic to each of our
>>>>> projects (and all of our projects) comes from.
>>>>> The data can be found
>>>>> at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
>>>>> put together an exploration tool for it at
>>>>> https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
>>>>>
>>>>> Hope it's useful to people!
>>>>>
>>>>
>>>>
>>>> --
>>>> Finn Årup Nielsen
>>>> http://people.compute.dtu.dk/faan/
>>>>
>>>>
>>>> _______________________________________________
>>>> Wiki-research-l mailing list
>>>> Wiki-research-l@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation