Pine, that's a good point. The entire purpose of my research is to propose a 
new, better metric for editors, because the ones we have at the moment are 
incomplete, as you pointed out. The reason I need these imperfect ones is to 
calibrate my model against the best metrics we have at the moment.


This economic model tries to capture hidden capabilities. The intuition is that 
edits to articles that are less heavily edited by others show that you are 
doing good background reading, or have some obscure knowledge. You know, it's 
only the Swiss that make Swiss watches, but every country and their mother can 
export apples. Similarly, in this case I would say your edits to Obama's page 
are less important than your edits to "Non-euclidean geometry". How much should 
the number of other editors on the articles you edit count? Well, I "calibrate" 
that variable such that my model most closely correlates with the best 
available metrics we have so far.
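
A minimal sketch of what such a calibration could look like, under loud 
assumptions: the power-law weight n ** -alpha, the grid search, and all 
function names are hypothetical illustrations, not the actual model. Each 
article counts for less the more distinct editors it has, and alpha is chosen 
so that the scores correlate best with a reference metric.

```python
import math


def obscurity_score(editor_counts, alpha):
    """Score one user: each edited article contributes n ** -alpha,
    where n is the number of distinct editors of that article.
    (Hypothetical weighting, chosen only for illustration.)"""
    return sum(n ** -alpha for n in editor_counts)


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def calibrate(users, reference, alphas):
    """Pick the alpha whose scores correlate best with the reference metric.

    users:     {username: [editor count of each article the user edited]}
    reference: {username: value of an existing metric, e.g. labor hours}
    """
    names = sorted(reference)
    ref_values = [reference[u] for u in names]
    best_alpha, best_r = None, -2.0
    for alpha in alphas:
        scores = [obscurity_score(users[u], alpha) for u in names]
        r = pearson(scores, ref_values)
        if r > best_r:
            best_alpha, best_r = alpha, r
    return best_alpha, best_r
```

With alpha = 0 every article counts equally (a plain article count); larger 
values of alpha discount heavily edited articles more strongly.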


Side note: It is still very network heavy to compute labour-hours via the 
API. I did not even manage 600 users in 8 hours. So it's germane to use the 
SQL replicas. I have been trying to do this over an ssh tunnel locally, or on 
wmflabs directly.



Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023


________________________________
From: wiki-research-l-boun...@lists.wikimedia.org 
<wiki-research-l-boun...@lists.wikimedia.org> on behalf of Aaron Halfaker 
<aaron.halfa...@gmail.com>
Sent: Saturday, February 08, 2014 8:24 AM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Preexsiting Researchers on Metrics for Users?

> However, measuring productivity by the difference of the times of first and 
> last edits won't do much for those of us who work on pages for hours before 
> pressing the save button and only save once.

Agreed.  This is a limitation.  However, if you're doing other work while 
writing the article or making intermittent saves as you go, then it will be 
captured.  Ethnographic work suggests that what you describe is uncommon, but 
present.  For this reason and others, it's important to see this "labor hours" 
estimate as a lower bound.  There's a lot of off-wiki work that isn't accounted 
for in any candidate measures using Wikipedia data.  For example, you'd have 
the exact same issue with edit counts and content persistence.


On Fri, Feb 7, 2014 at 9:29 PM, ENWP Pine 
<deyntest...@hotmail.com> wrote:
However, measuring productivity by the difference of the times of first and 
last edits won't do much for those of us who work on pages for hours before 
pressing the save button and only save once. (: It also doesn't measure time 
spent on private wikis or discussions on email and IRC, which also are not 
countable as productivity if you look only at public edit counts and logged 
actions.

I'm assuming that login and logout times on all wikis are not available for 
research use. If they were there would be privacy issues although mitigation is 
possible.

Pine

________________________________
From: aaron.halfa...@gmail.com
Date: Fri, 7 Feb 2014 17:15:36 -0600
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] Preexsiting Researchers on Metrics for Users?

I talked to Max on IRC, but I'm posting here for the lurkers :)

I think that measuring labor hours via edit sessions is a great idea, and I 
have a Python library to help extract sessions from edit histories.  See 
https://bitbucket.org/halfak/mediawiki-utilities.

Assuming that you have a list of a user's revisions from the API, using the 
session extractor to build a set of session start and end timestamps for a user 
would look like this:

----------------------------
from mwutil.lib import sessions

# Get your revisions ordered by timestamp
# revisions = <some API call result>

# Build (user, timestamp, revision) event tuples lazily
events = ((rev['user'], rev['timestamp'], rev) for rev in revisions)

for user, session in sessions.sessions(events):

    # Write out one TSV row per session:
    # user, edit count, first timestamp, last timestamp
    print("\t".join(
        str(v) for v in
        [user, len(session), session[0]['timestamp'], session[-1]['timestamp']]
    ))
---------------------------
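
For readers without the library at hand, the idea can be sketched in plain 
Python. This is a rough illustration and not the library's implementation: the 
one-hour inactivity cutoff and the function names are assumptions made for the 
example. Edits separated by no more than the cutoff fall into the same session, 
and labor hours are the summed session durations.

```python
from datetime import datetime, timedelta


def extract_sessions(timestamps, cutoff=timedelta(hours=1)):
    """Group edit timestamps into sessions.

    A gap longer than `cutoff` between consecutive edits starts a new
    session. The one-hour default is an assumption for this sketch.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= cutoff:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def labor_hours(timestamps, cutoff=timedelta(hours=1)):
    """Sum of (last edit - first edit) over all sessions, in hours.

    A session with a single edit contributes zero, which is one reason
    to read the result as a lower bound.
    """
    total = sum(
        (s[-1] - s[0] for s in extract_sessions(timestamps, cutoff)),
        timedelta(),
    )
    return total.total_seconds() / 3600.0
```

For example, edits at 10:00, 10:30, 10:45 and 14:00 on the same day form two 
sessions: one of 45 minutes and one single-edit session that adds nothing.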


On Fri, Feb 7, 2014 at 12:25 PM, Klein,Max 
<kle...@oclc.org> wrote:
Thanks Nemo, I'll re-read that discussion. I think that conversation is where I 
became hesitant about using bytes or edit counts.

Aaron, in my own search I also noticed the paper you wrote with Geiger about 
counting edit hours and edit sessions. [1]  Calculating content persistence is 
a bit too heavyweight for me right now since I am trying to submit to ACM Web 
Science in 2 weeks (whose CFP was just on this list). The technique looks great 
though, and I would like to help support making a WMFlabs tool that can return 
this measure.

It seems like I could calculate approximate edit-hours from just looking at 
Special:Contributions timestamps. Is that correct? Would you suggest this route?
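
A rough sketch of that route, assuming the timestamps come from the MediaWiki 
API's list=usercontribs module (the API counterpart of Special:Contributions). 
The helper names are hypothetical, and the HTTP request and result 
continuation are deliberately left out.

```python
from datetime import datetime


def usercontribs_params(username, limit=500):
    """Query parameters for one page of a user's contribution timestamps."""
    return {
        "action": "query",
        "list": "usercontribs",
        "ucuser": username,
        "ucprop": "timestamp",  # only timestamps are needed for edit-hours
        "uclimit": limit,
        "ucdir": "newer",       # oldest first, so results arrive in order
        "format": "json",
    }


def parse_timestamps(response):
    """Extract edit timestamps from a decoded usercontribs JSON response."""
    contribs = response.get("query", {}).get("usercontribs", [])
    return [
        datetime.strptime(c["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        for c in contribs
    ]
```

The resulting list of datetimes is all that a session-based edit-hours 
estimate needs as input.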


[1] 
http://www-users.cs.umn.edu/~halfak/publications/Using_Edit_Sessions_to_Measure_Participation_in_Wikipedia/geiger13using-preprint.pdf





Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023


________________________________
From: wiki-research-l-boun...@lists.wikimedia.org 
<wiki-research-l-boun...@lists.wikimedia.org> on behalf of Aaron Halfaker 
<aaron.halfa...@gmail.com>
Sent: Friday, February 07, 2014 7:12 AM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Preexsiting Researchers on Metrics for Users?

Hey Max,

There's a class of metrics that might be relevant to your purposes.  I refer to 
them as "content persistence" metrics and wrote up some docs about how they 
work including an example.  See 
https://meta.wikimedia.org/wiki/Research:Content_persistence.

I gathered a list of papers below to provide a starting point.  I've included 
links to open access versions where I could.  These metrics are a little bit 
painful to compute due to the computational complexity of diffs, but I have 
some hardware to throw at the problem and another project that's bringing me in 
this direction, so I'd be interested in collaborating.

Priedhorsky, Reid, et al. "Creating, destroying, and restoring value in 
Wikipedia." Proceedings of the 2007 international ACM conference on Supporting 
group work. ACM, 2007. http://reidster.net/pubs/group282-priedhorsky.pdf

  *   Describes "Persistent word views" which is a measure of value added per 
editor.  (IMO, value actualized)

B. Thomas Adler, Krishnendu Chatterjee, Luca de Alfaro, Marco Faella, Ian Pye, 
and Vishwanath Raman. 2008. Assigning trust to Wikipedia content. In 
Proceedings of the 4th International Symposium on Wikis (WikiSym '08). ACM, New 
York, NY, USA, Article 26, 12 pages. 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.141.2047&rep=rep1&type=pdf

  *   Describes a complex strategy for assigning trustworthiness to content 
based on implicit review.  See http://wikitrust.soe.ucsc.edu/

Halfaker, A., Kittur, A., Kraut, R., & Riedl, J. (2009, October). A jury of 
your peers: quality, experience and ownership in Wikipedia. In Proceedings of 
the 5th International Symposium on Wikis and Open Collaboration (p. 15). ACM. 
http://www-users.cs.umn.edu/~halfak/publications/A_Jury_of_Your_Peers/halfaker09jury-personal.pdf

  *   Describes the use of "Persistent word revisions per word" as a measure of 
article contribution quality.

Halfaker, A., Kittur, A., & Riedl, J. (2011, October). Don't bite the newbies: 
how reverts affect the quantity and quality of Wikipedia work. In Proceedings 
of the 7th International Symposium on Wikis and Open Collaboration (pp. 
163-172). ACM. 
http://www-users.cs.umn.edu/~halfak/publications/Don't_Bite_the_Newbies/halfaker11bite-personal.pdf

  *   Describes the use of raw "Persistent word revisions" as a measure of 
editor productivity
  *   Looking back on the study, I think I'd rather use log(# of revisions a 
word persists) * words.
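
One possible reading of that last formula, as a small sketch: take the log of 
the number of revisions each added word persisted and sum over the words. The 
function names and the handling of words that persisted zero revisions 
(skipped here) are assumptions of this example, not something the papers 
specify.

```python
import math


def raw_persistence(revisions_persisted):
    """Raw persistent word revisions: total revisions survived, over all words."""
    return sum(revisions_persisted)


def log_persistence(revisions_persisted):
    """Sum of log(revisions persisted) over the words an editor added.

    revisions_persisted holds one integer per added word. Words that
    persisted for zero revisions are skipped (assumption of this sketch),
    because log(0) is undefined.
    """
    return sum(math.log(r) for r in revisions_persisted if r > 0)
```

The log damps the influence of a few very long-lived words: three words that 
persist 1, 10 and 100 revisions have a raw score of 111 but a log score of 
only 3 * log(10).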

-Aaron


On Fri, Feb 7, 2014 at 1:48 AM, Federico Leva (Nemo) 
<nemow...@gmail.com> wrote:
Sort of related, an ongoing education@ discussion "student evaluation 
criteria". http://thread.gmane.org/gmane.org.wikimedia.education/854

Nemo

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

