Re: [Wiki-research-l] [Analytics] [Offline-l] Fwd: Reasons you use the XML dumps or want to, but can't?
Thanks for doing that, Andrew!
On Tue, Feb 24, 2015 at 1:41 PM, Andrew Otto ao...@wikimedia.org wrote: I also added some Hadoop-based use cases to that document. https://www.mediawiki.org/w/index.php?title=Wikimedia_MediaWiki_Core_Team%2FBacklog%2FImprove_dumps&diff=1422073&oldid=1421455
On Feb 21, 2015, at 05:03, Emmanuel Engelhart kel...@kiwix.org wrote: Hi, Thank you, Nemo, for advertising that interesting page about how to improve the Wikimedia dumping processes. This topic is of course a primary concern for the Kiwix developer team. Here is my contribution: https://www.mediawiki.org/w/index.php?title=Wikimedia_MediaWiki_Core_Team%2FBacklog%2FImprove_dumps&diff=1417187&oldid=1415717 I hope to see things move forward on this, and I will help as much as I can. Regards, Emmanuel
On 21.02.2015 08:44, Federico Leva (Nemo) wrote: FYI
Forwarded message Subject: [Xmldatadumps-l] Your comments needed (long term dumps rewrite?) Date: Thu, 19 Feb 2015 12:30:01 +0200 From: Ariel Glenn WMF ar...@wikimedia.org To: xmldatadump...@lists.wikimedia.org
The MediaWiki Core team has opened a discussion about getting more involved in and maybe redoing the dumps infrastructure. A good starting point is to understand how folks use the dumps already or want to use them but can't, and some questions about that are listed here: https://www.mediawiki.org/wiki/Wikimedia_MediaWiki_Core_Team/Backlog/Improve_dumps I've added some notes but please go weigh in. Don't be shy about what you do/what you need, this is the time to get it all on the table. Ariel
___ Offline-l mailing list offlin...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/offline-l
-- Kiwix - Wikipedia Offline & more * Web: http://www.kiwix.org * Twitter: https://twitter.com/KiwixOffline * more: http://www.kiwix.org/wiki/Communication
___ Analytics mailing list analyt...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
[Wiki-research-l] [Release]
Hey all! We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from. The data can be found at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've put together an exploration tool for it at https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ Hope it's useful to people! -- Oliver Keyes Research Analyst Wikimedia Foundation ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
[Wiki-research-l] ICWSM Workshop Announcement and Call for Papers
Hi,

Bob West, Jure Leskovec, and I are organizing a workshop at ICWSM focused on the challenges and opportunities of Wikipedia. You can find more information about the workshop and the call for papers below. Looking forward to seeing many of you in person at the workshop.

Best,
Leila

*Call for Workshop Papers*

Workshop on Wikipedia, a Social Pedia: Research Challenges and Opportunities
May 26, Oxford, England
Co-located with the 9th International Conference on Weblogs and Social Media (ICWSM 2015)
http://snap.stanford.edu/wiki-icwsm15/

Deadline for papers: Tuesday, March 24, 2015, 23:59 AoE

Wikipedia is one of the most popular sites on the Web, a main source of knowledge for a large fraction of Internet users, and, in the light of its collaborative nature, an inherently social medium. Therefore, and since not only all content but also many activity logs are available to the public, Wikipedia has become an important object of study for researchers across many subfields of the computational and social sciences, such as social-network analysis, social psychology, education, anthropology, political science, human-computer interaction, cognitive science, artificial intelligence, linguistics, and natural-language processing.

This workshop is a venue for all researchers exploring social aspects of Wikipedia. The workshop will feature high-profile speakers from academia and the Wikimedia Foundation and aims to create a forum where participants can connect both among each other and with researchers at the Wikimedia Foundation.

Topics of interest include, but are not limited to:
- Collaborative content creation
- Consensus-finding and conflict resolution on editorial issues
- Content consumption on Wikipedia
- Participation in discussions and their dynamics
- Collaborative task management
- Evolution of hierarchies
- Wikipedia as a sensor for real-world events, culture, etc.
- Demographics of Wikipedia readers and editors
- Engagement and incentivization of editors

We invite the submission of regular research papers (6–8 pages) as well as position papers (2–4 pages). Authors whose papers are accepted to the workshop will have the opportunity to participate in a poster session.

*Submission instructions*

Regular and position papers should be formatted according to AAAI formatting guidelines (http://www.aaai.org/Publications/Author/author.php). Please submit papers using EasyChair at https://easychair.org/conferences/?conf=wikiicwsm2015

*Review and the archival of papers*

Authors will be notified of acceptance or rejection on or before Tuesday, March 31, 2015. The accepted papers will be published on the workshop webpage (unless the authors object), and authors whose papers are accepted will have the opportunity to participate in a poster session.

*Organizing committee*

Robert West, Stanford University
Jure Leskovec, Stanford University
Leila Zia, Wikimedia Foundation

___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] [Analytics] [Release]
Very nice. Do you think that you could pick out a few of your favorite graphs and add them to this week's Recent Research report in a gallery? Thanks! Pine Hey all! We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from. The data can be found at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've put together an exploration tool for it at https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ Hope it's useful to people! -- Oliver Keyes Research Analyst Wikimedia Foundation ___ Analytics mailing list analyt...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] [Analytics] [Release]
Excellent! Pine On Feb 25, 2015 1:26 PM, Oliver Keyes oke...@wikimedia.org wrote: Totally! I'm also going to get together with some NEU hackers tomorrow and work on actually visualising the data on *drumroll* maps, which'd probably be more interesting eye candy than infinite bar plots :) On 25 February 2015 at 16:19, Pine W wiki.p...@gmail.com wrote: Very nice. Do you think that you could pick out a few of your favorite graphs and add them to this week's Recent Research report in a gallery? Thanks! Pine Hey all! We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from. The data can be found at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've put together an exploration tool for it at https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ Hope it's useful to people! -- Oliver Keyes Research Analyst Wikimedia Foundation ___ Analytics mailing list analyt...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] [Release]
Great job. Who knew Esperanto was big in Japan and China at #2 and #3? On Wed, Feb 25, 2015 at 4:06 PM, Oliver Keyes oke...@wikimedia.org wrote: Hey all! We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from. The data can be found at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've put together an exploration tool for it at https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ Hope it's useful to people! -- Oliver Keyes Research Analyst Wikimedia Foundation ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] [Release]
The one major caveat, I think, is that the danger of proportionate data is that it makes small projects very vulnerable to artificial traffic spikes. I'd go out on a limb and say that some of the massive bumps in popularity we see in particular combinations are likely due to either undetected automata or simply a project having so little traffic that a small number of people can sway the results outlandishly. On 25 February 2015 at 16:32, Andrew Lih andrew@gmail.com wrote: Great job. Who knew Esperanto was big in Japan and China at #2 and #3? On Wed, Feb 25, 2015 at 4:06 PM, Oliver Keyes oke...@wikimedia.org wrote: Hey all! We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from. The data can be found at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've put together an exploration tool for it at https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ Hope it's useful to people! -- Oliver Keyes Research Analyst Wikimedia Foundation ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
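The caveat about proportionate data comes down to simple arithmetic. As a minimal sketch, assuming made-up request counts (none of these figures come from the released dataset, and `share_with_extra_traffic` is a hypothetical helper, not part of any released tooling), the same burst of automated requests barely moves a country's share on a large project but dominates it on a small one:

```python
def share_with_extra_traffic(country_views, total_views, extra_views):
    """Return one country's share of a project's traffic after adding
    extra_views requests attributed to that country (for example, an
    undetected bot). All inputs are plain request counts."""
    return (country_views + extra_views) / (total_views + extra_views)


# Hypothetical figures: both projects start with a 5% share for the country.
EXTRA = 500  # the same artificial spike is applied to both projects

small_share = share_with_extra_traffic(country_views=50, total_views=1_000, extra_views=EXTRA)
large_share = share_with_extra_traffic(country_views=50_000, total_views=1_000_000, extra_views=EXTRA)

print(f"small project: {small_share:.1%}")
print(f"large project: {large_share:.1%}")
```

With these invented numbers the small project's share jumps from 5% to roughly 37%, while the large project's share stays at about 5%. The same effect could come from a handful of real readers rather than a bot.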
Re: [Wiki-research-l] [Analytics] [Release]
Totally! I'm also going to get together with some NEU hackers tomorrow and work on actually visualising the data on *drumroll* maps, which'd probably be more interesting eye candy than infinite bar plots :) On 25 February 2015 at 16:19, Pine W wiki.p...@gmail.com wrote: Very nice. Do you think that you could pick out a few of your favorite graphs and add them to this week's Recent Research report in a gallery? Thanks! Pine Hey all! We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from. The data can be found at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've put together an exploration tool for it at https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ Hope it's useful to people! -- Oliver Keyes Research Analyst Wikimedia Foundation ___ Analytics mailing list analyt...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
[Wiki-research-l] Signpost readership survey results
Hello all, I have uploaded the results from the *Signpost* readership survey to Wikimedia Commons in PDF format: https://commons.wikimedia.org/wiki/File:Signpost_February_2015_survey_results.pdf Thanks very much to the WMF Learning and Evaluation Team for letting us use Qualtrics. The *Signpost* management team recently agreed to cross-post selected content from the Wikimedia Blog into the *Signpost*. By doing this we can both increase the exposure of Blog content (many *Signpost* readers don't read the blog) and enhance the value of the *Signpost* to its current readers (some of whom would like to see more coverage of sister projects and other, diverse parts of the Wikimedia ecosystem). Your comments on the survey results would be appreciated. The *Signpost* management team will have more to say after we study these results in more detail, and we will publish our comments in a future *Signpost* issue. Cheers, Pine *Signpost* Publication and Newsroom Manager *This is an Encyclopedia* https://www.wikipedia.org/ *One gateway to the wide garden of knowledge, where lies The deep rock of our past, in which we must delve The well of our future, The clear water we must leave untainted for those who come after us, The fertile earth, in which truth may grow in bright places, tended by many hands, And the broad fall of sunshine, warming our first steps toward knowing how much we do not know.* *—Catherine Munro* ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] Signpost readership survey results
On Wed, Feb 25, 2015 at 2:03 PM, Pine W wiki.p...@gmail.com wrote: Hello all, I have uploaded the results from the Signpost readership survey to Wikimedia Commons in PDF format: https://commons.wikimedia.org/wiki/File:Signpost_February_2015_survey_results.pdf Thanks very much to the WMF Learning and Evaluation Team for letting us use Qualtrics. Thanks for doing this and sending it around, Pine. I just read through all the comments and it's fascinating -- some people love the op-eds and want more coverage of debates and disputes, but another large group of people want the Signpost to be neutral and stay away from drama! I was also a little disheartened by the lackluster response about what would motivate readers to contribute -- it seems everyone agrees the Signpost is useful, but few people want to put the time into making it that way. It's true that it's a lot of work -- I wrote News Notes for a couple of years and it was hugely time-consuming. But it was also a lot of fun! Regardless, congratulations on keeping up the 'Post and trying to make it better. best, Phoebe -- * I use this address for lists; send personal messages to phoebe.ayers at gmail.com * ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] [Release]
This is really, really cool, great job guys! G Giovanni Luca Ciampaglia ✎ 919 E 10th ∙ Bloomington 47408 IN ∙ USA ☞ http://www.glciampaglia.com/ ✆ +1 812 855-7261 ✉ gciam...@indiana.edu 2015-02-25 16:06 GMT-05:00 Oliver Keyes oke...@wikimedia.org: Hey all! We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from. The data can be found at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've put together an exploration tool for it at https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ Hope it's useful to people! -- Oliver Keyes Research Analyst Wikimedia Foundation ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] [Analytics] [Release]
Yours is looking at just December, while mine is looking at the entire year, for starters. Also, what's the apps/mobile web inclusion for that report? On 25 February 2015 at 17:34, Erik Zachte ezac...@wikimedia.org wrote: I am surprised that the new data, with crawlers excluded, show more wp:en traffic from US (43%) than the old data (36.4% for 2014), which contained much crawler traffic, presumably most of that from US. Compare https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ and http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm Any thoughts? Erik -Original Message- From: analytics-boun...@lists.wikimedia.org [mailto:analytics-boun...@lists.wikimedia.org] On Behalf Of Oliver Keyes Sent: Wednesday, February 25, 2015 22:37 To: Research into Wikimedia content and communities Cc: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Analytics] [Wiki-research-l] [Release] The one major caveat, I think, is that the danger of proportionate data is that it makes small projects very vulnerable to artificial traffic spikes. I'd go out on a limb and say that some of the massive bumps in popularity we see in particular combinations are likely due to either undetected automata or simply a project having so little traffic that a small number of people can sway the results outlandishly. On 25 February 2015 at 16:32, Andrew Lih andrew@gmail.com wrote: Great job. Who knew Esperanto was big in Japan and China at #2 and #3? On Wed, Feb 25, 2015 at 4:06 PM, Oliver Keyes oke...@wikimedia.org wrote: Hey all! We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from. The data can be found at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've put together an exploration tool for it at https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ Hope it's useful to people! -- Oliver Keyes Research Analyst Wikimedia Foundation ___ Analytics mailing list analyt...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
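Erik's surprise in this thread rests on a small piece of arithmetic: if crawler traffic is more US-heavy than human traffic, excluding crawlers should lower the US share, not raise it. The sketch below illustrates only that reasoning, with invented counts (they are not taken from either report, and `us_share_before_and_after` is a hypothetical helper); it does not explain the actual 36.4% versus 43% gap, which may also reflect the different time windows mentioned above.

```python
def us_share_before_and_after(human_us, human_total, crawler_us, crawler_total):
    """Return (share with crawlers included, share with crawlers excluded)
    for US traffic, given plain request counts."""
    before = (human_us + crawler_us) / (human_total + crawler_total)
    after = human_us / human_total
    return before, after


# Hypothetical case 1: crawlers are mostly from the US (80%), humans 40% US.
before_heavy, after_heavy = us_share_before_and_after(400, 1_000, 80, 100)

# Hypothetical case 2: crawlers are mostly from elsewhere (20% US).
before_light, after_light = us_share_before_and_after(400, 1_000, 20, 100)

print(f"US-heavy crawlers: {before_heavy:.1%} -> {after_heavy:.1%}")
print(f"US-light crawlers: {before_light:.1%} -> {after_light:.1%}")
```

In the first case the US share falls once crawlers are removed; in the second it rises. Excluding crawlers can therefore only increase the US share if the excluded traffic was less US-heavy than the remaining human traffic, or if something else differs between the two reports.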
Re: [Wiki-research-l] [Analytics] [Release]
Erik Zachte, 25/02/2015 23:34: Compare https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ and http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm Ironholds' data looks more vulnerable to bots, which is easier to see in small wikis (though, kudos! many more small wikis are included than in wikistats). For instance, the USA gets 20 more percentage points on the Breton and Bavarian Wikipedias, 30 on Welsh, 40 on Alemannic, and almost 50 on Kurdish. For Chinese bots the two look similar, though in some cases I'm not sure what's going on: for instance, als.wiki also sees CH and RO emerge. Will the new pageviews definition use the same bot filtering method? Nemo ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l