[Wiki-research-l] Re: Wikimedia natural experiments

2022-04-26 Thread Nate TeBlunthuis
Thanks for compiling these, Isaac! Public Domain Day seems like a
particularly interesting one to look into more deeply.


--

Nate

On 4/26/22 11:14, Isaac Johnson wrote:

As part of some discussions with Jim Maddock (see his recent thread
<https://lists.wikimedia.org/hyperkitty/list/wiki-research-l@lists.wikimedia.org/thread/INT2D2XGJ4YO2NVYBNKOOQMVI6OXKG32/>),
I started a meta page to catalog some of the many natural experiments
<https://en.wikipedia.org/wiki/Natural_experiment> that impact the
Wikimedia projects. I've seeded it with some examples (censorship,
Wikipedia Zero, lockdowns, Public Domain Day, new data centers, etc.) but
am certain that there are many more out there. If folks on this list know
of other examples, please edit the page below boldly or respond on thread
and I'll migrate examples onto the page.

https://meta.wikimedia.org/wiki/Research:Natural_experiments

Best,
Isaac


--
Nate TeBlunthuis PhD Candidate Department of Communication University of 
Washington https://teblunthuis.cc

___
Wiki-research-l mailing list -- wiki-research-l@lists.wikimedia.org
To unsubscribe send an email to wiki-research-l-le...@lists.wikimedia.org


Re: [Wiki-research-l] How to quantify "effort" or "time spent" put into articles?

2020-10-20 Thread Nate TeBlunthuis
Thanks everyone for sharing your ideas. I really appreciate you taking 
the time! This discussion raised a lot of useful points.


As a first approximation, an edit-sessions approach seems okay. For the
kind of purposes I have in mind, I think it is reasonable to ignore time
that is incidental to an edit but not directly spent editing, like
walking to a building or reading an article before beginning to edit.
Cases like adding references or images from Commons seem potentially
important. Something like what Isaac or Aaron suggested, using a model
to better estimate the amount of time that different kinds of edits take
(potentially using task categories, links, images, text diff metrics,
and references as features), would be a good but higher-effort
measurement approach. Ignoring obvious vandalism is clearly an important
step.
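
A minimal sketch of the edit-sessions idea, in case it helps make the
discussion concrete. It groups one editor's edit timestamps into sessions
whenever the gap between consecutive edits exceeds a cutoff, then sums the
session lengths. The one-hour cutoff, the optional per-session padding, and
the function names are assumptions for illustration, not something settled in
this thread.

```python
from datetime import datetime, timedelta

# Assumed inactivity cutoff: a longer pause starts a new session.
SESSION_GAP = timedelta(hours=1)


def session_durations(timestamps, gap=SESSION_GAP):
    """Split one editor's edit timestamps into sessions; return each duration."""
    ordered = sorted(timestamps)
    if not ordered:
        return []
    durations = []
    start = prev = ordered[0]
    for ts in ordered[1:]:
        if ts - prev > gap:
            # Pause too long: close the current session and open a new one.
            durations.append(prev - start)
            start = ts
        prev = ts
    durations.append(prev - start)
    return durations


def estimated_effort(timestamps, gap=SESSION_GAP, padding=timedelta(0)):
    """Total session time. `padding` (hypothetical) credits time before the
    first edit of each session, so single-edit sessions need not count as zero."""
    sessions = session_durations(timestamps, gap)
    return sum((d + padding for d in sessions), timedelta(0))


edits = [
    datetime(2020, 10, 20, 10, 0),
    datetime(2020, 10, 20, 10, 10),
    datetime(2020, 10, 20, 10, 40),
    datetime(2020, 10, 20, 13, 0),
    datetime(2020, 10, 20, 13, 5),
]
print(estimated_effort(edits))  # two sessions: 40 minutes + 5 minutes
```

Note that a session containing a single edit has zero length unless some
padding is added, so the choice of padding matters a lot for editors who
mostly make isolated edits.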


--

Nate


On 10/20/20 3:28 PM, Isaac Johnson wrote:

Thanks for raising this question, Nate! Really interested in this
discussion. Another option to throw into the mix, though it would require a
fair bit of work:

The Growth team put together a taxonomy of tasks that editors do and their
perceived difficulty level:
https://www.mediawiki.org/wiki/Growth/Personalized_first_day/Newcomer_tasks#/media/File:Newcomer_tasks_-_Difficulty_filters.png
I'm not sure how complete the taxonomy is, but you could:

- come up with a complete-ish taxonomy of edit types (another option to
consider: https://github.com/diyiy/Wiki_Semantic_Intention)
- assign each edit type a general difficulty level and time estimate
(hopefully backed up with some empirical data from edit session data or
which user groups engage in a given type of edit though for all the reasons
mentioned by you and others, that can be really hard to calculate)
- build detectors for each type of edit (unfortunately this is going to
require parsing a lot of wikitext, but hopefully you can simply do things
like compare the number of links, images, templates, etc. in the
previous revision and current revision with mwparserfromhell)
- classify each edit, based on the changes it made, by difficulty
level and therefore estimated time/expertise involved.
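
The detector and classification steps above could be sketched roughly as
follows. To stay dependency-free, this uses simplified regular expressions
as a stand-in for a real wikitext parser such as mwparserfromhell, so it will
miscount in edge cases (nested templates, nowiki blocks, etc.). The pattern
names and the difficulty levels are hypothetical placeholders, not the Growth
team's taxonomy.

```python
import re

# Simplified detectors (assumption: regexes instead of a real wikitext parser).
PATTERNS = {
    "links": re.compile(r"\[\[(?!(?:File|Image|Category):)", re.IGNORECASE),
    "images": re.compile(r"\[\[(?:File|Image):", re.IGNORECASE),
    "templates": re.compile(r"\{\{"),
    "references": re.compile(r"<ref[\s>]", re.IGNORECASE),
}

# Hypothetical difficulty levels per element type; placeholders only.
DIFFICULTY = {"links": "easy", "templates": "medium",
              "references": "medium", "images": "hard"}
RANK = {"easy": 0, "medium": 1, "hard": 2}


def count_elements(wikitext):
    """Count occurrences of each element type in one revision."""
    return {name: len(p.findall(wikitext)) for name, p in PATTERNS.items()}


def element_delta(previous, current):
    """Change in element counts between the previous and current revision."""
    before, after = count_elements(previous), count_elements(current)
    return {name: after[name] - before[name] for name in PATTERNS}


def classify_edit(delta):
    """Label an edit by the hardest element type it added, else 'text-only'."""
    added = [DIFFICULTY[name] for name, d in delta.items() if d > 0]
    if not added:
        return "text-only"
    return max(added, key=RANK.__getitem__)


old = "Paris is the capital of [[France]]."
new = (old + "<ref>{{cite web|url=http://example.org}}</ref> "
       "[[File:Paris.jpg|thumb|View of [[Paris]]]]")
delta = element_delta(old, new)
print(delta, classify_edit(delta))
```

Comparing counts only sees net change, so an edit that swaps one link for
another looks like no change at all; a diff of the actual elements would be
needed to catch that.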

Alternatively, you could just count up # of links etc. for the current
version of the page and multiply each link etc. by estimated time to add.
This would be highly conservative though because it would miss all the
collaboration / updating / adjusting / etc. so it would be more of an
estimate of minimum time to build a page.
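
As a rough illustration of that simpler alternative: multiply the element
counts of the current revision by an assumed time per element and add them
up. The per-element minutes below are invented placeholders with no
empirical basis, and the function name is hypothetical.

```python
# Hypothetical minutes needed to add one element of each type (placeholders).
MINUTES_PER_ELEMENT = {
    "links": 0.5,
    "images": 5.0,
    "references": 3.0,
    "templates": 2.0,
}


def minimum_build_minutes(counts, minutes_per_element=MINUTES_PER_ELEMENT):
    """Lower-bound estimate of the time needed to build a page from its
    current element counts; ignores all revising and collaboration."""
    return sum(counts.get(name, 0) * minutes
               for name, minutes in minutes_per_element.items())


current_counts = {"links": 40, "images": 2, "references": 10}
print(minimum_build_minutes(current_counts))  # 20 + 10 + 30 minutes
```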

On Tue, Oct 20, 2020 at 5:43 PM Ziko van Dijk wrote:


Hello Nate,

Thank you for your interesting question, and thank you for your paper
with Shaw and Mako Hill 2018 on the rise and decline of populations.

Your endeavour seems very difficult, perhaps hardly possible. My
thinking would be the following: there are certain patterns behind an
edit, or rather behind editing activity. For example, imagine someone
who reads an article and corrects some minor typos and linguistic issues
along the way. How long is the article, and how long may it take to read
it? How long may it take to make those edits (or one big edit)?

On the one hand, you may ask editors or observe them to find out how
much time they need for this kind of activity. On the other hand, you
may try to recognize this pattern in certain characteristics of the
edit (an edit of the whole page; small changes of letters at several
locations in the text).

It would be a philosophical question what is exactly part of the
editing activity. If I read a whole article for my own purposes, as a
reader, without intention to edit, and then I find a small error and
quickly correct it - does that make my whole reading of the article a
part of my editing activity? I would have read the article anyway.

There would be many other patterns. E.g., someone adds a picture. How
much time this takes depends on whether the editor searched for it on
Commons or took the one he found in a different language version. So,
if the picture does not appear in other language versions, you assume
that the editor needed 10 minutes to find it, and otherwise, that he
needed only two minutes to take the picture from a different language
version?

A last example: At a meeting of administrators, I remember an admin
explaining that dealing with one vandalism report on the list of
incidents costs him about half an hour. Maybe a useful starting
point for further considerations?

Good luck, and kind regards
Ziko










On Tue, Oct 20, 2020 at 11:18 PM Su-Laine Brodsky wrote:

Further to Joan’s comment, there are some other ways to stratify edits:

- Whether an edit is vandalism, a vandalism revert, or an "actual" change.

Vandal edits and reverts are both quick compared to good-faith additions
and changes. Heavily vandalized articles will have long edit histories,
even though sometimes not much effort was put into them.

- Whether the edit was made by a human or bot.

- Whether a