Re: [Wiki-research-l] [Analytics] [Offline-l] Fwd: Reasons you use the XML dumps or want to, but can't?

2015-02-25 Thread Toby Negrin
Thanks for doing that Andrew!

On Tue, Feb 24, 2015 at 1:41 PM, Andrew Otto ao...@wikimedia.org wrote:

 I also added some Hadoop-based use cases to that document.


 https://www.mediawiki.org/w/index.php?title=Wikimedia_MediaWiki_Core_Team%2FBacklog%2FImprove_dumps&diff=1422073&oldid=1421455


  On Feb 21, 2015, at 05:03, Emmanuel Engelhart kel...@kiwix.org wrote:
 
  Hi
 
  Thank you, Nemo, for advertising that interesting page about how to improve
 the Wikimedia dumping processes. This topic is of course a primary concern for
 the Kiwix developer team.
 
  Here is my contribution:
 
 https://www.mediawiki.org/w/index.php?title=Wikimedia_MediaWiki_Core_Team%2FBacklog%2FImprove_dumps&diff=1417187&oldid=1415717
 
  I hope to see things move forward on this; I will help as much as I can.
 
  Regards
  Emmanuel
 
  On 21.02.2015 08:44, Federico Leva (Nemo) wrote:
  FYI
 
 
   Forwarded message 
  Subject: [Xmldatadumps-l] Your comments needed (long term dumps
  rewrite?)
  Date: Thu, 19 Feb 2015 12:30:01 +0200
  From: Ariel Glenn WMF ar...@wikimedia.org
  To: xmldatadump...@lists.wikimedia.org
 
 
 
  The MediaWiki Core team has opened a discussion about getting more
  involved in and maybe redoing the dumps infrastructure.  A good starting
  point is to understand how folks use the dumps already or want to use
  them but can't, and some questions about that are listed here:
 
 https://www.mediawiki.org/wiki/Wikimedia_MediaWiki_Core_Team/Backlog/Improve_dumps
 
  I've added some notes but please go weigh in.  Don't be shy about what
  you do/what you need, this is the time to get it all on the table.
 
  Ariel
 
 
 
 
  ___
  Offline-l mailing list
  offlin...@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/offline-l
 
 
 
  --
  Kiwix - Wikipedia Offline & more
  * Web: http://www.kiwix.org
  * Twitter: https://twitter.com/KiwixOffline
  * more: http://www.kiwix.org/wiki/Communication
 
  ___
  Analytics mailing list
  analyt...@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/analytics


___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


[Wiki-research-l] [Release]

2015-02-25 Thread Oliver Keyes
Hey all!

We've released a highly-aggregated dataset of readership data -
specifically, data about where, geographically, traffic to each of our
projects (and all of our projects) comes from. The data can be found
at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
put together an exploration tool for it at
https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/

Hope it's useful to people!

-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



[Wiki-research-l] ICWSM Workshop Announcement and Call for Papers

2015-02-25 Thread Leila Zia
Hi,

   Bob West, Jure Leskovec, and I are organizing a workshop at ICWSM
focused on the challenges and opportunities of Wikipedia. You can find more
information about the workshop and the call for papers below.

   Looking forward to seeing many of you in person at the workshop.

Best,
Leila


*Call for Workshop Papers*
Workshop on Wikipedia, a Social Pedia: Research Challenges and Opportunities
May 26, Oxford, England
co-located with the 9th International Conference on Weblogs and Social
Media (ICWSM 2015)
http://snap.stanford.edu/wiki-icwsm15/
Deadline for papers: Tuesday, March 24, 2015, 23:59 AoE

Wikipedia is one of the most popular sites on the Web, a main source of
knowledge for a large fraction of Internet users, and, in the light of its
collaborative nature, an inherently social medium. Therefore, and since not
only all content but also many activity logs are available to the public,
Wikipedia has become an important object of study for researchers across
many subfields of the computational and social sciences, such as
social-network analysis, social psychology, education, anthropology,
political science, human-computer interaction, cognitive science,
artificial intelligence, linguistics, and natural-language processing.
This workshop is a venue for all researchers exploring social aspects of
Wikipedia. The workshop will feature high-profile speakers from academia
and the Wikimedia Foundation and aims to create a forum where participants
can connect both among each other and with researchers at the Wikimedia
Foundation.
Topics of interest include, but are not limited to:

   - Collaborative content creation
   - Consensus-finding and conflict resolution on editorial issues
   - Content consumption on Wikipedia
   - Participation in discussions and their dynamics
   - Collaborative task management
   - Evolution of hierarchies
   - Wikipedia as a sensor for real-world events, culture, etc.
   - Demographics of Wikipedia readers and editors
   - Engagement and incentivization of editors

We invite the submission of regular research papers (6–8 pages) as well as
position papers (2–4 pages). Authors whose papers are accepted to the
workshop will have the opportunity to participate in a poster session.

*Submission instructions*
Regular and position papers should be formatted according to AAAI
formatting guidelines (http://www.aaai.org/Publications/Author/author.php).
Please submit papers using EasyChair at
https://easychair.org/conferences/?conf=wikiicwsm2015

*Review and the archival of papers*
Authors will be notified of acceptance or rejection on or before Tuesday,
March 31, 2015.
The accepted papers will be published on the workshop webpage (unless the
authors object), and authors whose papers are accepted will have the
opportunity to participate in a poster session.

*Organizing committee*
Robert West, Stanford University
Jure Leskovec, Stanford University
Leila Zia, Wikimedia Foundation


Re: [Wiki-research-l] [Analytics] [Release]

2015-02-25 Thread Pine W
Very nice. Do you think that you could pick out a few of your favorite
graphs and add them to this week's Recent Research report in a gallery?

Thanks!
Pine


Re: [Wiki-research-l] [Analytics] [Release]

2015-02-25 Thread Pine W
Excellent!

Pine
On Feb 25, 2015 1:26 PM, Oliver Keyes oke...@wikimedia.org wrote:

 Totally! I'm also going to get together with some NEU hackers tomorrow
 and work on actually visualising the data on *drumroll* maps, which'd
 probably be more interesting eye candy than infinite bar plots :)



Re: [Wiki-research-l] [Release]

2015-02-25 Thread Andrew Lih
Great job.

Who knew Esperanto was big in Japan and China at #2 and #3?





Re: [Wiki-research-l] [Release]

2015-02-25 Thread Oliver Keyes
The one major caveat, I think, is that the danger of proportionate
data is that it makes small projects very vulnerable to artificial
traffic spikes. I'd go out on a limb and say that some of the massive
bumps in popularity we see in particular combinations are likely due
to either undetected automata or simply a project having so little
traffic that a small number of people can sway the results
outlandishly.
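
As a rough illustration of that caveat (a minimal sketch with made-up
numbers, not figures from the released dataset; the function name
share_after_spike is hypothetical), the same burst of artificial requests
barely moves a large project's country share but swamps a small project's:

```python
def share_after_spike(country_hits, total_hits, spike_hits):
    """Proportion of a project's traffic attributed to one country after
    `spike_hits` artificial requests from that country are added."""
    return (country_hits + spike_hits) / (total_hits + spike_hits)


# Hypothetical counts, chosen only for illustration: both projects start
# with 5% of their traffic from the country in question.
small = share_after_spike(country_hits=50, total_hits=1_000, spike_hits=500)
large = share_after_spike(country_hits=50_000, total_hits=1_000_000, spike_hits=500)

print(f"small project: 5.0% -> {small:.1%}")  # 36.7%
print(f"large project: 5.0% -> {large:.1%}")  # 5.0%
```

Under these assumed counts, 500 extra requests lift the small project's
share from 5% to roughly 37%, while the large project's share stays at
about 5%.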

-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] [Analytics] [Release]

2015-02-25 Thread Oliver Keyes
Totally! I'm also going to get together with some NEU hackers tomorrow
and work on actually visualising the data on *drumroll* maps, which'd
probably be more interesting eye candy than infinite bar plots :)

-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



[Wiki-research-l] Signpost readership survey results

2015-02-25 Thread Pine W
Hello all,

I have uploaded the results from the *Signpost* readership survey to
Wikimedia Commons in PDF format:
https://commons.wikimedia.org/wiki/File:Signpost_February_2015_survey_results.pdf

Thanks very much to the WMF Learning and Evaluation Team for letting us use
Qualtrics.

The *Signpost* management team recently agreed to cross-post selected
content from the Wikimedia Blog into the *Signpost*. By doing this we can
both increase the exposure of Blog content (many *Signpost* readers don't
read the blog) and enhance the value of the *Signpost* to its current
readers (some of whom would like to see more coverage of sister projects
and other, diverse parts of the Wikimedia ecosystem).

Your comments on the survey results would be appreciated. The *Signpost*
management team will have more to say after we study these results in more
detail, and we will publish our comments in a future *Signpost* issue.

Cheers,

Pine
*Signpost* Publication and Newsroom Manager

*This is an Encyclopedia* https://www.wikipedia.org/






*One gateway to the wide garden of knowledge, where lies
The deep rock of our past, in which we must delve
The well of our future,
The clear water we must leave untainted for those who come after us,
The fertile earth, in which truth may grow in bright places, tended by many hands,
And the broad fall of sunshine, warming our first steps toward knowing how much we do not know.*

*—Catherine Munro*


Re: [Wiki-research-l] Signpost readership survey results

2015-02-25 Thread phoebe ayers
On Wed, Feb 25, 2015 at 2:03 PM, Pine W wiki.p...@gmail.com wrote:
 Hello all,

 I have uploaded the results from the Signpost readership survey to Wikimedia
 Commons in PDF format:
 https://commons.wikimedia.org/wiki/File:Signpost_February_2015_survey_results.pdf

 Thanks very much to the WMF Learning and Evaluation Team for letting us use
 Qualtrics.


Thanks for doing this and sending it around, Pine. I just read through
all the comments and it's fascinating -- some people love the op-eds
and want more coverage of debates and disputes, but another large
group of people want the Signpost to be neutral and stay away from
drama!

I was also a little disheartened by the lackluster response about what
would motivate readers to contribute -- it seems everyone agrees the
Signpost is useful, but few people want to put the time into making it
that way. It's true that it's a lot of work -- I wrote News & Notes
for a couple of years and it was hugely time-consuming. But it was
also a lot of fun!

Regardless, congratulations on keeping up the 'Post and trying to make
it better.

best,
Phoebe


-- 
* I use this address for lists; send personal messages to phoebe.ayers
at gmail.com *



Re: [Wiki-research-l] [Release]

2015-02-25 Thread Giovanni Luca Ciampaglia
This is really, really cool, great job guys!

G


Giovanni Luca Ciampaglia

✎ 919 E 10th ∙ Bloomington 47408 IN ∙ USA
☞ http://www.glciampaglia.com/
✆ +1 812 855-7261
✉ gciam...@indiana.edu



Re: [Wiki-research-l] [Analytics] [Release]

2015-02-25 Thread Oliver Keyes
Yours is looking at just December, while mine is looking at the entire
year, for starters. Also, what's the apps/mobile web inclusion for
that report?

On 25 February 2015 at 17:34, Erik Zachte ezac...@wikimedia.org wrote:
 I am surprised that the new data, with crawlers excluded, show more wp:en 
 traffic from US (43%) than the old data (36.4% for 2014), which contained 
 much crawler traffic, presumably most of that from US.

 Compare https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ and
 http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm

 Any thoughts?

 Erik

-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation



Re: [Wiki-research-l] [Analytics] [Release]

2015-02-25 Thread Federico Leva (Nemo)

Erik Zachte, 25/02/2015 23:34:

Compare https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/  and
http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm


Ironholds' looks more vulnerable to bots; it's easier to see in small 
wikis (though, kudos! many more small wikis are included than in 
wikistats). For instance, 20 more percentage points for USA on Breton 
and Bavarian Wikipedias, 30 on Welsh, 40 on Alemannic, almost 50 on 
Kurdish. For Chinese bots they look similar, though in some cases I'm 
not sure what's going on: for instance als.wiki also sees CH and RO emerge.


Will the new pageviews definition use the same bot filtering method?

Nemo
