Re: [Wiki-research-l] Big data benefits and limitations (relevance: WMF editor engagement, fundraising, and HR practices)

2013-01-02 Thread Everton Zanella Alvarenga
Dear Pine,

thank you for sharing these links. I cannot read everything now, but one of
these articles was also recommended by a friend: "Sure, Big Data Is Great.
But So Is Intuition."
(http://www.nytimes.com/2012/12/30/technology/big-data-is-great-but-dont-forget-intuition.html?_r=1),
by Steve Lohr. It reminded me of a case in Brazil that avoided previous
mistakes: the collaborative process of hiring a consultant for the WMF
programs in Brazil
(http://blog.wikimedia.org/2012/01/11/brazil-recruiting-and-partnership-with-the-community-moves-forward/),
i.e., the community was listened to. (See the full discussion that resulted
in a better process here:
http://comments.gmane.org/gmane.org.wikimedia.brazil/161, although it
can still improve.)

It’s encouraging that thoughtful data scientists like Ms. Perlich and Ms.
Schutt recognize the limits and shortcomings of the Big Data technology
that they are building. Listening to the data is important, they say, but
so is experience and intuition. After all, what is intuition at its best
but large amounts of data of all kinds filtered through a human brain
rather than a math model?

As Alexandre Abdo pointed out
(http://permalink.gmane.org/gmane.org.wikimedia.brazil/358) in this
not-so-old discussion, we, the Brazilian community, were being
presented with faits accomplis, and the community's experience and
intuition were not being taken into account as they could have been,
although I must say a lot of effort was made in this direction. I hope a
lesson was /learned/ and that this can help the direction the organization
is taking with its grantmaking and learnings. :)

This also reminds me that there is currently no mathematical model (maybe
there never will be...) that explains the kind of system Wikimedia projects
deal with, and sometimes lovely graphics and data interpretations are taken
as scientific statements, regardless of their scientific underpinnings.

Have a good year,

Tom


On Sun, Dec 30, 2012 at 1:26 AM, ENWP Pine deyntest...@hotmail.com wrote:

  I'm sending this to Wikimedia-l, Wikitech-l, and Research-l in case
 other people in the Wikimedia movement or staff are interested in big
 data as it relates to Wikimedia. I hope that those who are interested in
 discussions about WMF editor engagement efforts, WMF fundraising, or WMF HR
 practices will also find that this email interests them. Feel free to skip
 straight to the links in the latter portion of this email if you're already
 familiar with big data and its analysis and if you just want to see what
 other people are writing about the subject.

 * Introductory comments / my personal opinion

 Big data refers to quantities of information so large that they are
 difficult to analyze and may not be internally related in an obvious way.
 See https://en.wikipedia.org/wiki/Big_data

 I think that most of us would agree that moving much of an organization's
 information into the Cloud, and/or directing people to analyze massive
 quantities of information, will not automatically result in better, or even
 good, decisions based on that information. Also, I think that most of us
 would agree that bigger and/or more accessible quantities of data do not
 necessarily imply that the data are more accurate or more relevant for a
 particular purpose. Another concern is the possibility of unwelcome
 intrusions into sensitive information, including the possibility of data
 breaches; imagine the possible consequences if a hacker broke into
 supposedly secure databases held by Facebook or the Securities and Exchange
 Commission.

 We have an enormous quantity of data on Wikimedia projects, and many ways
 that we can examine those data. As this Dilbert strip points out, context
 is important, and looking at statistics devoid of their larger contexts can
 be problematic. http://dilbert.com/strips/comic/1993-02-07/

 Since data analysis is also something that Wikipedia does in the areas I
 mentioned previously, I'm passing along a few links for those who may be
 interested in the benefits and limitations of big data.

 * Links:

 From the Harvard Business Review
 http://hbr.org/2012/04/good-data-wont-guarantee-good-decisions/ar/1


 From the New York Times

 https://www.nytimes.com/2012/12/30/technology/big-data-is-great-but-dont-forget-intuition.html
 and

 https://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html


 From the Wall Street Journal. This may be especially interesting to those
 who are participating in the discussions on Wikimedia-l regarding how
 Wikimedia selects, pays, and manages its staff.

 http://online.wsj.com/article/SB1872396390443890304578006252019616768.html


 And from English Wikipedia (:
 https://en.wikipedia.org/wiki/Big_data
 and
 https://en.wikipedia.org/wiki/Data_mining
 and
 https://en.wikipedia.org/wiki/Business_intelligence


 Cheers,

 Pine

 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 

Re: [Wiki-research-l] 2012 top pageview list

2013-01-02 Thread Kerry Raymond
The problem (as always) is that there is a difference between pages served
(by the web server) and pages actually wanted and read by the user. 

It would be interesting to have referrer statistics. I'm guessing that many
Wikipedia pages are being referred by Google (and other general search
engines). If so, people may just be clicking through a list of search
results, which causes them to download a WP page but then immediately move
on to the next search result because it isn't what they are looking for. I
rather suspect the prominence of Facebook in the English Wikipedia results
is due to this effect, as I often find myself on the Wikipedia page for
Facebook instead of Facebook itself following a Google search. I think the
use of mobile devices (with small screens) probably encourages this sort of
behaviour.

Kerry


-Original Message-
From: wiki-research-l-boun...@lists.wikimedia.org
[mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Andrew G.
West
Sent: Sunday, 30 December 2012 2:06 PM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] 2012 top pageview list

The WMF aggregates them as (page,views) pairs on an hourly basis:

http://dumps.wikimedia.org/other/pagecounts-raw/

I've been parsing these and storing them in a queryable DB format (for
en.wp exclusively, though I think the files are available for all
projects) for about two years. If you want to maintain such a fine
granularity, it can quickly become a terabyte-scale task that eats up a
lot of processing time.

If you're looking for coarser-granularity reports (like top views for a
day, week, or month), a lot of efficient aggregation can be done.
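
As a rough illustration of the coarse aggregation described above, here is a
minimal Python sketch. It assumes each line of an hourly pagecounts file has
the form "project page_title view_count bytes_transferred", separated by
single spaces; that layout is not stated in this thread, and the function
names are hypothetical, not taken from any WMF tool.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, str, int]]:
    """Parse one 'project title views bytes' line; return None if malformed."""
    parts = line.split(" ")
    if len(parts) != 4:
        return None
    project, title, views, _size = parts
    try:
        return project, title, int(views)
    except ValueError:
        return None


def aggregate(hourly_files: Iterable[Iterable[str]], project: str = "en") -> Counter:
    """Sum view counts per page title across many hourly files.

    Each element of hourly_files is any iterable of text lines
    (an open file, a list of strings, ...). Only lines for the
    requested project code are counted (assumed: "en" for en.wp).
    """
    totals: Counter = Counter()
    for lines in hourly_files:
        for line in lines:
            parsed = parse_line(line.rstrip("\n"))
            if parsed is not None and parsed[0] == project:
                totals[parsed[1]] += parsed[2]
    return totals


def top_pages(totals: Counter, n: int):
    """Return the n most-viewed (title, views) pairs, highest first."""
    return totals.most_common(n)
```

For a daily report one would pass the 24 hourly files for that day, for
example file objects from gzip.open(path, "rt") if the dumps are kept
compressed. Keeping only per-title totals in memory, rather than every
(page, hour) pair, is what keeps the coarse reports cheap.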

See also: http://en.wikipedia.org/wiki/Wikipedia:5000

Thanks, -AW


On 12/28/2012 07:28 PM, John Vandenberg wrote:
 There is a steady stream of blogs and 'news' about these lists


 https://encrypted.google.com/search?client=ubuntuchannel=fsq=%22Sean+hoyland%22ie=utf-8oe=utf-8#q=wikipedia+top+2012hl=ensafe=offclient=ubuntutbo=dchannel=fstbm=nwssource=lnttbs=qdr:wsa=Xpsj=1ei=GzjeUOPpAsfnrAeQk4DgCgved=0CB4QpwUoAwbav=on.2,or.r_gc.r_pw.r_cp.r_qf.bvm=bv.1355534169,d.aWMfp=4e60e761ee133369bpcl=40096503biw=1024bih=539

 How does a researcher go about obtaining access logs with useragents
 in order to answer some of these questions?


-- 
Andrew G. West, Doctoral Candidate
Dept. of Computer and Information Science
University of Pennsylvania, Philadelphia PA
Website: http://www.andrew-g-west.com



Re: [Wiki-research-l] looking for paper on the new editor experience

2013-01-02 Thread Aaron Halfaker
I think you're talking about a paper that I just finished editing for
American Behavioral Scientist: *The Rise and Decline of an Open
Collaboration Community: How Wikipedia's reaction to sudden popularity is
causing its decline*

Summary of findings (and free to download pre-print):
http://www-users.cs.umn.edu/~halfak/publications/The_Rise_and_Decline/

Official listing:
http://abs.sagepub.com/content/early/2012/12/26/0002764212469365

For quick reference, here's my *TL;DR:*

To deal with the massive influx of new editors between 2004 and 2007,
Wikipedians built automated quality control tools and solidified their
rules of governance. These reasonable and effective strategies for
maintaining the quality of the encyclopedia have come at the cost of
decreased retention of desirable newcomers.

   1. The decline represents a change in the rate of retention of
      desirable, *good-faith* newcomers.
      - The proportion of newcomers that edit in good faith has not changed
        since 2006.
      - These desirable newcomers are more likely to have their work
        rejected since 2007.
      - This increased rejection predicts the observed decline in retention.
   2. Semi-autonomous vandal fighting tools (like Huggle,
      http://en.wikipedia.org/wiki/Wikipedia:Huggle) are partially at fault.
      - An increasing proportion of desirable newcomers are having their
        work rejected by automated tools.
      - These automated reverts exacerbate the predicted negative effects
        of rejection on retention.
      - Users of Huggle tend to not engage in the best practices
        (http://en.wikipedia.org/wiki/WP:BRD) for discussing the reverts
        they perform.
   3. New users are being pushed out of policy articulation.
      - The formalized process for vetting new policies and changes to
        policies ensures that newcomers' edits do not survive.
      - Both newcomers and experienced editors are moving increasingly
        toward less formal spaces (http://en.wikipedia.org/wiki/WP:ESSAYS).


-Aaron


On Wed, Jan 2, 2013 at 8:47 PM, Everton Zanella Alvarenga 
ezalvare...@wikimedia.org wrote:

 Hi,

 I don't know the article, but check if searching here helps

 http://www.mail-archive.com/wiki-research-l@lists.wikimedia.org/

 http://wikimedia.7.n6.nabble.com/WikiMedia-Research-f1477409.html

 I don't know why I cannot use google.com with the parameter site:
 for this mailing list archive
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l.

 Tom


 On Wed, Jan 2, 2013 at 10:48 PM, Kerry Raymond kerry.raym...@gmail.com
 wrote:
  In the past few months, I read a paper (or a draft paper?) that I think
  was shared on this mailing list. Unfortunately I seem to have lost both
  the paper and the email (job change), so I would be grateful if anyone
  could send me the paper or a link or whatever.
 
  IIRC, the paper was looking at editor retention, particularly the
  retention of new editors. I think there were about 8 hypotheses given and
  some experiments conducted to test these. The one I remember most clearly
  was the finding that new good-faith editors were highly likely to see
  their contributions deleted, by either bots or more experienced editors,
  and this was likely to be de-motivating for them.
 
  If anyone can help with this, it would be much appreciated.
 
  Kerry



 --
 Everton Zanella Alvarenga (also Tom)
 A life spent making mistakes is not only more honorable, but more
 useful than a life spent doing nothing.



[Wiki-research-l] CFP: TAEECE2013 - Turkey -IEEE

2013-01-02 Thread The International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE2013)
The International Conference on Technological Advances in Electrical,
Electronics and Computer Engineering (TAEECE2013)

Konya, Turkey - May 9-11, 2013 | Mevlana University
All papers will be submitted to IEEE for potential inclusion in IEEE Xplore.

http://sdiwc.net/conferences/2013/taeece2013/

The conference welcomes papers on the following (but not limited to)
research topics:
- 3D Semiconductor Device Technology
- Advanced Electromagnetics
- Component Technology of MEMS
- Computer Engineering
- Electronics System-Level Based Design
- Electronics-Medical Electronics
- Fiber Optics and Fiber Devices
- Intelligent Transportation Systems
- Software Assurance
- Assembly and Packaging
- Antenna and Propagation
- Circuits and Electronics
- Electric Energy Processing
- Electro-optical Phenomena of Semiconductors
- Microwave Theory and Techniques
- Algorithms
- Automated Software Engineering
- Computer-aided Design
- Computer Architecture
- Computer Modeling
- Computer Security
- Database and Data Mining

Researchers are encouraged to submit their work electronically. All papers
will be fully refereed by a minimum of two specialized referees. Before
final acceptance, all referees' comments must be considered. Accepted
papers will be published in the conference proceedings. Best paper awards
will be presented during the conference.

If you have any problems with the above web submission form, you can send
the manuscript by email to t...@sdiwc.net (subject: TAEECE2013 Paper
Submission).

Submission Deadline: March 20, 2013
Notification of Acceptance: April 20, 2013 or Before
Camera Ready Submission: April 30, 2013
Registration: April 30, 2013




Re: [Wiki-research-l] looking for paper on the new editor experience

2013-01-02 Thread Kerry Raymond

Found it via Google Scholar

http://www-users.cs.umn.edu/~halfak/publications/The_Rise_and_Decline/

Sent from my iPad



Re: [Wiki-research-l] looking for paper on the new editor experience

2013-01-02 Thread Kerry Raymond
Thanks to all of you who replied to me on and off-list. This is indeed the
paper I was looking for!

 

Aaron, your research looked at new editors. Have you thought about
undertaking a similar study about the loss of seasoned editors (the
beyond-newbie phase)? I note that there are a number of hypotheses on this
topic

http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Editor_Retention#Reasons_editors_leave

but not much evidence to provide any guidance. There was a survey sent to
formerly active editors (can't find the URL) which gathered some data
indicating that, apart from personal reasons, seasoned editors primarily
left because of community issues, e.g. the behaviour of other editors. So I
was wondering whether edit logs, user contributions, etc. could be used
(quantitatively or qualitatively) to provide insights into patterns of
behaviour (of the editor or of others interacting with that editor via
contributions or talk pages etc.) that might provide clues to the departure
of seasoned editors and/or early warning signs of someone about to walk.
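
To make the quantitative side of this concrete, here is a minimal Python
sketch of one possible warning sign: an editor's recent monthly edit rate
falling far below their own earlier baseline. Everything in it is an
assumption for illustration (the window lengths, the 0.25 threshold, the
function names, and the idea that a list of edit dates per editor is
available); it is not a validated method and not taken from any existing
study.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional


def month_index(d: date) -> int:
    """Map a date to a running month number so months can be subtracted."""
    return d.year * 12 + (d.month - 1)


def activity_ratio(edit_dates: Iterable[date], as_of: date,
                   recent_months: int = 3,
                   baseline_months: int = 12) -> Optional[float]:
    """Recent edits-per-month divided by baseline edits-per-month.

    The recent window is the current month plus the months just before
    it; the baseline window is the baseline_months before that. Returns
    None when the editor has no baseline activity to compare against.
    """
    now = month_index(as_of)
    recent = 0
    baseline = 0
    for d in edit_dates:
        age = now - month_index(d)  # 0 means the current month
        if 0 <= age < recent_months:
            recent += 1
        elif recent_months <= age < recent_months + baseline_months:
            baseline += 1
    if baseline == 0:
        return None
    return (recent / recent_months) / (baseline / baseline_months)


def flag_at_risk(editors: Dict[str, List[date]], as_of: date,
                 threshold: float = 0.25) -> List[str]:
    """Names of editors whose activity ratio is below the threshold."""
    flagged = []
    for name, dates in editors.items():
        ratio = activity_ratio(dates, as_of)
        if ratio is not None and ratio < threshold:
            flagged.append(name)
    return sorted(flagged)
```

A drop in edit counts alone says nothing about why someone is leaving, so a
flag like this would only be a starting point for the qualitative look at
talk pages and interactions with other editors.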

 

Kerry

 

 
