Re: [Wiki-research-l] Preexsiting Researchers on Metrics for Users?

2014-02-08 Thread Aaron Halfaker
 However, measuring productivity by the difference of the times of first
and last edits won't do much for those of us who work on pages for hours
before pressing the save button and only save once.

Agreed.  This is a limitation.  However, if you're doing other work while
writing the article or making intermittent saves as you go, then it will be
captured.  Ethnographic work suggests that what you describe is uncommon,
but present.  For this reason and others, it's important to see this
labor-hours estimate as a lower bound.  There's a lot of off-wiki work that
isn't accounted for in any candidate measure based on Wikipedia data.  For
example, you'd have the exact same issue with edit counts and content
persistence.


On Fri, Feb 7, 2014 at 9:29 PM, ENWP Pine deyntest...@hotmail.com wrote:

 However, measuring productivity by the difference of the times of first
 and last edits won't do much for those of us who work on pages for hours
 before pressing the save button and only save once. (: It also doesn't
 measure time spent on private wikis or discussions on email and IRC, which
 also are not countable as productivity if you look only at public edit
 counts and logged actions.

 I'm assuming that login and logout times on all wikis are not available
 for research use. If they were, there would be privacy issues, although
 mitigation is possible.

 Pine

 --
 From: aaron.halfa...@gmail.com
 Date: Fri, 7 Feb 2014 17:15:36 -0600
 To: wiki-research-l@lists.wikimedia.org

 Subject: Re: [Wiki-research-l] Preexsiting Researchers on Metrics for
 Users?

 I talked to Max on IRC, but I'm pointing here for the lurkers :)

 I think that measuring labor hours via edit sessions is a great idea, and I
 have a Python library to help extract sessions from edit histories.  See
 https://bitbucket.org/halfak/mediawiki-utilities.

 Assuming that you have a list of a user's revisions from the API, using
 the session extractor to build a set of session start and end timestamps
 for a user would look like this:

 
 from mwutil.lib import sessions

 # Get your revisions ordered by timestamp
 # revisions = some API call result

 # Build (user, timestamp, revision) triples lazily
 events = ((rev['user'], rev['timestamp'], rev) for rev in revisions)

 for user, session in sessions.sessions(events):
     # write out one TSV row per session:
     # user, number of edits, first timestamp, last timestamp
     print("\t".join(
         str(v) for v in
         [user, len(session), session[0]['timestamp'],
          session[-1]['timestamp']]
     ))
 ---
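
For readers without the library at hand, here is a minimal, self-contained sketch of the session idea. It is not the mwutil implementation: the one-hour inactivity cutoff, the function names, and the use of already-sorted `datetime` objects are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def extract_sessions(timestamps, cutoff=timedelta(hours=1)):
    """Group ascending timestamps into sessions.

    A new session starts whenever the gap since the previous edit
    is longer than `cutoff` (an assumed, hypothetical parameter).
    """
    sessions = []
    current = []
    for ts in timestamps:
        if current and ts - current[-1] > cutoff:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions


def labor_hours(sessions):
    """Sum (last edit - first edit) over all sessions, in hours."""
    total = sum((s[-1] - s[0] for s in sessions), timedelta())
    return total.total_seconds() / 3600


edits = [
    datetime(2014, 2, 7, 9, 0),
    datetime(2014, 2, 7, 9, 20),
    datetime(2014, 2, 7, 9, 45),
    datetime(2014, 2, 7, 15, 0),  # a session with a single save
]
sessions = extract_sessions(edits)
for s in sessions:
    # one TSV row per session: edit count, start, end
    print("\t".join(str(v) for v in [len(s), s[0].isoformat(), s[-1].isoformat()]))
print(labor_hours(sessions))
```

The session with a single save contributes zero time under this scheme, which is one reason to read the resulting figure as a lower bound.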



Re: [Wiki-research-l] Preexsiting Researchers on Metrics for Users?

2014-02-07 Thread Aaron Halfaker
Hey Max,

There's a class of metrics that might be relevant to your purposes.  I
refer to them as content persistence metrics and wrote up some docs about
how they work including an example.  See
https://meta.wikimedia.org/wiki/Research:Content_persistence.

I gathered a list of papers below to provide a starting point.  I've
included links to open access versions where I could.  These metrics are a
little bit painful to compute due to the computational complexity of diffs,
but I have some hardware to throw at the problem and another project that's
bringing me in this direction, so I'd be interested in collaborating.

Priedhorsky, Reid, et al. Creating, destroying, and restoring value in
Wikipedia. *Proceedings of the 2007 international ACM conference on
Supporting group work*. ACM, 2007.
http://reidster.net/pubs/group282-priedhorsky.pdf:

   - Describes persistent word views, a measure of value added per
   editor.  (IMO, value *actualized*)

B. Thomas Adler, Krishnendu Chatterjee, Luca de Alfaro, Marco Faella, Ian
Pye, and Vishwanath Raman. 2008. Assigning trust to Wikipedia content. In
Proceedings of the 4th International Symposium on Wikis (WikiSym '08). ACM,
New York, NY, USA, Article 26, 12 pages.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.141.2047&rep=rep1&type=pdf

   - Describes a complex strategy for assigning trustworthiness to content
   based on implicit review.  See http://wikitrust.soe.ucsc.edu/

Halfaker, A., Kittur, A., Kraut, R., & Riedl, J. (2009, October). A jury of
your peers: quality, experience and ownership in Wikipedia. In *Proceedings
of the 5th International Symposium on Wikis and Open Collaboration* (p.
15). ACM.
http://www-users.cs.umn.edu/~halfak/publications/A_Jury_of_Your_Peers/halfaker09jury-personal.pdf

   - Describes the use of Persistent word revisions per word as a measure
   of article contribution quality.

Halfaker, A., Kittur, A., & Riedl, J. (2011, October). Don't bite the
newbies: how reverts affect the quantity and quality of Wikipedia work. In
*Proceedings of the 7th International Symposium on Wikis and Open
Collaboration* (pp. 163-172). ACM.
http://www-users.cs.umn.edu/~halfak/publications/Don't_Bite_the_Newbies/halfaker11bite-personal.pdf

   - Describes the use of raw Persistent word revisions as a measure of
   editor productivity
   - Looking back on the study, I think I'd rather use log(# of revisions a
   word persists) * words.
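
One possible reading of that log-scaled measure, as a sketch: sum log(number of revisions persisted) over the words an editor added. The function name and the input layout (one persistence count per added word, each count at least 1) are assumptions for illustration and are not taken from the paper.

```python
import math


def log_persistence_score(revisions_persisted):
    """Sum log(n) over the words an editor added.

    revisions_persisted: iterable with one integer per added word,
    giving how many revisions that word survived (assumed >= 1).
    """
    return sum(math.log(n) for n in revisions_persisted)


# Three added words: one survives a single revision, two survive 20.
counts = [1, 20, 20]
score = log_persistence_score(counts)
raw = sum(counts)  # the raw persistent-word-revision total, for comparison
print(score, raw)
```

The log compresses large persistence counts, so a handful of very long-lived words dominates the total less than it does in the raw sum.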

-Aaron


On Fri, Feb 7, 2014 at 1:48 AM, Federico Leva (Nemo) nemow...@gmail.com wrote:

 Sort of related, an ongoing education@ discussion on student evaluation
 criteria. http://thread.gmane.org/gmane.org.wikimedia.education/854

 Nemo

 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



Re: [Wiki-research-l] Preexsiting Researchers on Metrics for Users?

2014-02-07 Thread Klein,Max
Thanks Nemo, I'll re-read that discussion. I think that conversation is where I 
became hesitant about using bytes or edit counts.

Aaron, in my own search I also noticed that you wrote with Geiger about counting 
edit hours and edit sessions [1].  Calculating content persistence is a bit too 
heavyweight for me right now since I am trying to submit to ACM Web Science in 
2 weeks (whose CFP was just on this list). The technique looks great though, and 
I would like to help support making a WMFlabs tool that can return this measure.

It seems like I could calculate approximate edit-hours from just looking at 
Special:Contributions timestamps. Is that correct? Would you suggest this route?
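
A rough sketch of what that calculation could look like, assuming the timestamps have already been collected as ISO 8601 strings in the form the MediaWiki API returns. The function name and the one-hour cutoff are assumptions for illustration. Gaps between consecutive edits are summed only when they are short enough to belong to the same sitting.

```python
from datetime import datetime, timedelta


def approx_edit_hours(timestamps, cutoff=timedelta(hours=1)):
    """Sum the gaps between consecutive edits that are no longer than cutoff."""
    times = sorted(
        datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ") for t in timestamps
    )
    # Pair each edit with the next one and measure the gap between them
    gaps = (later - earlier for earlier, later in zip(times, times[1:]))
    total = sum((g for g in gaps if g <= cutoff), timedelta())
    return total.total_seconds() / 3600


contribs = [
    "2014-02-07T15:00:00Z",
    "2014-02-07T09:00:00Z",
    "2014-02-07T09:30:00Z",
    "2014-02-07T10:15:00Z",
]
hours = approx_edit_hours(contribs)
print(hours)
```

Here the 30-minute and 45-minute gaps are counted and the gap of nearly five hours before the last edit is not, so an isolated edit adds nothing to the total.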


[1] 
http://www-users.cs.umn.edu/~halfak/publications/Using_Edit_Sessions_to_Measure_Participation_in_Wikipedia/geiger13using-preprint.pdf



Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023





Re: [Wiki-research-l] Preexsiting Researchers on Metrics for Users?

2014-02-07 Thread ENWP Pine
However, measuring productivity by the difference of the times of first and 
last edits won't do much for those of us who work on pages for hours before 
pressing the save button and only save once. (: It also doesn't measure time 
spent on private wikis or discussions on email and IRC, which also are not 
countable as productivity if you look only at public edit counts and logged 
actions.

I'm assuming that login and logout times on all wikis are not available for 
research use. If they were, there would be privacy issues, although mitigation 
is possible.

Pine


Re: [Wiki-research-l] Preexsiting Researchers on Metrics for Users?

2014-02-06 Thread Federico Leva (Nemo)
Sort of related, an ongoing education@ discussion on student evaluation 
criteria. http://thread.gmane.org/gmane.org.wikimedia.education/854


Nemo
