Re: [Wiki-research-l] Power law and contributions:

2020-01-23 Thread Kerry Raymond
If anyone was wondering, I was not confusing power law with power dynamics; Jan's original question talked about community health, redistribution of power, and things like training, user friendliness and documentation as strategies, so I assumed power dynamics were in play in the conversation.

The power in Wikipedia is held by administrators and other functionaries, by the loudest and most persistent (and most willing to canvass openly or off-wiki) in consensus building and, as I have already argued, in the latitude given to the *mass* of very occasional contributors (whether good faith or bad faith) to make bad or low-quality edits which others have to deal with.

If you redefine power in other ways, such as impact (or influence) on readers 
(those who we serve), then of course the more active users do have the 
*potential* for that power provided their contributions are made to 
creating/expanding content rather than fiddling with existing content (most 
edits are "fiddling"). An occasional contributor has much more limited power to 
impact or influence the reader (probably disproportionately lower than their 
number of edits would suggest, as they are more likely to be reverted).

But equally not all very active editors get to shape the reader view. If you 
look at the activities of the top editors by edit count, they tend to do a lot 
of very repetitive and arguably more administratively-focused edits, with the 
reorganisation of the category system being a major activity. Studies of readers 
show they don't look much beyond the References and hence aren't looking at the 
categories so if power is about reader impact/influence, then this group have 
very little power relative to their number of contributions.

If we talk about community health in Wikipedia (specifically English 
Wikipedia), we all know it's a massive problem and somewhat independent of 
power (by any definition).  It's an abrasive environment with far more 
criticism than praise/appreciation across the board. Active contributors 
regularly burn out, and dealing with "the community" is often given as a reason. 
While there is always one final issue that is the straw that breaks the camel's 
back, it's rarely just about that issue but a level of frustration that 
develops over a long time. Good faith newcomers get turned off by bad initial 
experiences. Unfortunately, this group means well but often makes bad edits. I do 
outreach to new good-faith contributors in my topic space via a WikiProject 
Australia message delivered by Twinkle (nobody has time to write personal 
messages to new contributors each day). I try to ask them where they got their 
information from (failure to cite being a big problem with this group), but 
rarely do they reply or make further edits.

Jan mentioned training. I also do outreach which is mostly face-to-face edit 
training and supporting editathons (generally working with a library or 
university as the partner organisation) so I do a lot of work with new users in 
face-to-face situations, but for all my efforts (for which the feedback is 
always very positive), these new users rarely contribute again after these 
sessions and this experience is common to most people doing outreach work, 
leading to the belief that "Wikipedians are born not made". There are efforts 
already taking place to provide online training or on-boarding systems 
(currently being trialled in some other language Wikipedias) but, even if shown 
to be effective, I don't think there is much likelihood of mandating any such 
thing on English Wikipedia, with its strong libertarian ideology. Most new 
users are "on a mission" to make a particular change or addition to an article, 
and I don't think most of them will voluntarily do some kind of training or 
on-boarding first (people on a mission are not easy to deflect in general). 
I like to believe training in some forms helps individuals but at that end of 
the power graph, the effort/return on individuals is poor. To work with that 
mass of folk, any intervention must be scalable, and that tends to rule out 
anything personal (like a buddy system).

As to user friendliness, there isn't a lot of it on Wikipedia. I recollect 
someone did a study to see if welcome messages helped improve newcomer 
retention and found they didn't. Indeed our watchlist/welcome system can easily 
be perceived by new users as stalking. While a welcome message is intended to 
be encouraging, it does at the same time send the message "I am watching you" 
which has been described by some new users who receive welcome messages during 
my training sessions as "creepy". As someone who sends such messages (via 
Twinkle; there aren't the hours in the day to welcome new users in a more 
personal way), I find some of the responses are themselves "creepy" (clearly 
the sender looked at my user account, knew I was a woman, and felt it was OK to 
make some inappropriate remark). There are friendships of course between some 
users, but
you don't come

Re: [Wiki-research-l] Power law and contributions:

2020-01-22 Thread Kerry Raymond
As someone who would qualify as a "very active editor"

https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits

I can honestly say that power and activity are definitely not the same thing on 
Wikipedia.  

Do I have power? I don't think so. I am not an administrator or other 
functionary that has power over anyone else.

As a person who is principally a content writer, I get my time wasted every 
day: by vandals; by content cited to reliable sources being removed by someone 
who simply doesn't agree with it but provides no sources to the contrary and 
simply writes "Fact!" as an edit summary; by having to explain to yet another 
American that we spell things differently in Australia and that is why there is 
a {{Use Australian English}} template on the top of that article; and by 
explaining that "City of Brisbane" cannot be changed to "Brisbane City" as they 
are NOT the same thing (one is a local government area, the other a suburb, one 
about 100 times the area of the other), even if they do happen to "look like 
the same thing" or someone "thinks it reads better that way". I wish I did have 
the power to just "whack a mole" and NOT have to have these *same* 
conversations over and over and over again, with me being WP:CIVIL and them 
often being not civil (some even track me down in real life and send me abusive 
e-mail off-wiki, including sexual remarks, because I'm a self-identified female 
contributor). But in Wikipedia, that's OK
because ArbCom decided that calling a female contributor "a cunt" isn't that 
bad. It's Wikipedia not Wokepedia! If I share the contents of that email 
on-wiki, I'm the one in trouble (their right to privacy), so I just delete 
them. If I spot a user name whitewashing a politician's article that just 
happens to be very similar indeed to the real-life name of their media advisor, 
I cannot say that on-wiki, because that's WP:OUTING.

My "community health" is pretty damn poor precisely because we give the same 
power to every first time anonymous editor as we do to very active editors and 
we give it effectively to the most persistent and the most unpleasant. BRD is 
all very well if all involved are seriously trying to get the content right and 
well-cited. It fails completely when the other party is not engaging with it, 
being unpleasant, or just returning time and time again to re-do a problematic 
edit based on "I know this". We have problems with acts of vandalism that get 
repeated time and time again by a series of different IP addresses. This is 
impossible to block; we have no solution for it. If you want to see the scale 
of it, there's a series of IP addresses that collectively exhibit similar 
patterns of thousands of problematic edits in my topic space, going back to at 
least 2013 and still active in 2019:

https://en.wikipedia.org/wiki/User:IamNotU/History_cleanup

Do we have the power to "whack a mole" the first time we see any of these 
behaviours YET AGAIN? No, we don't. We have a lot of tedious process of having 
to find the right admin noticeboard, submit a request with the right templates, 
provide endless diffs, and then have nothing happen. We make it easy for people 
to create problems, but extremely difficult to get them stopped and incredibly 
tedious to clean up after them (you often can't "undo" because of intervening 
edits etc and these folk can do 100s of edits in a day). Here's one:

https://xtools.wmflabs.org/ec/en.wikipedia.org/Shelati

This editor made a mass change over every suburb of Sydney over a couple of 
days. I immediately suspected them of being a sockpuppet (the behaviour was 
characteristic of one), but unless you can identify the sockmaster, you can't 
report it. So instead, the changes being made were discussed on the appropriate 
topic noticeboards and disagreed with, until eventually the editor was blocked 
by someone who figured out who the sockmaster was (a sockmaster dating back to 
2009). The account was blocked, but the problematic edits have never been 
cleaned up.

Most active contributors who retire do so because the behaviour of other 
"contributors" wears them down.

In summary, power in Wikipedia is not where you think it is on the curve. It is 
the power we give to the many people to do the same vandalism, the same "meant 
well but I'm stupid" edits, the same "I don't know any policies and they don't 
apply to me anyway" edits, and the sockpuppets and conflict-of-interest editors 
who carefully hide themselves among them.

I wish I had just a little power to exercise in topic spaces where I am 
knowledgeable and have a long history of positive contribution. I don't want it 
for baseball players or Icelandic musicians or Pokemon characters, just for 
Queensland history and geography. That's all I ask.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Jan Dittrich
Sent: Wednesday, 22 January 2020 8:31 PM
To: Wiki Research-l 
Subject: [Wiki-research-

Re: [Wiki-research-l] New dataset of articles tagged by WikiProjects

2020-01-15 Thread Kerry Raymond
Out of idle curiosity ...

Are there significant numbers of articles NOT tagged by any WikiProject? In my 
experience on-wiki, any article (apart from recently created ones) is tagged by 
one or more WikiProjects. 

I guess the converse question is what articles are the most tagged by 
WikiProjects? I am often surprised at how many WikiProjects jump in to tag some 
article I have created (I am more likely to notice the tagging of articles I 
create because they automatically go on my watchlist).

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Isaac Johnson
Sent: Thursday, 16 January 2020 6:54 AM
To: Research into Wikimedia content and communities 

Subject: [Wiki-research-l] New dataset of articles tagged by WikiProjects

Hey Research Community,
TL;DR New dataset:
https://figshare.com/articles/Wikipedia_Articles_and_Associated_WikiProject_Templates/10248344

More details:

I wanted to notify everyone that we have published a dataset of the articles on 
English Wikipedia that have been tagged by WikiProjects [1] through templates 
on their associated talk pages. We are not planning to make this an ongoing 
release, but I have provided the script that I used to generate it in the 
Figshare item so that others might update / adjust to meet their needs.

As anyone who has done research on WikiProjects knows, it can be complicated to 
determine what articles fit under a particular WikiProject's purview. The 
motivation for generating this dataset was to support our work in developing 
topic models for Wikipedia (see [2] for an overview), but we imagine that there 
are many other ways in which this dataset might be useful:

* Previous work has examined how active WikiProjects are based on edits to 
their pages in the Wikipedia namespace. This dataset makes it much easier to 
identify which WikiProjects are managing the most valuable articles on 
Wikipedia (in terms of quality or pageviews).

* Many topic-level analyses of Wikipedia rely on the category network.
Categories can be very messy and difficult to work with, but WikiProjects 
represent an alternative that often is simpler and still quite rich. For 
instance, this could be used for temporal analyses of article quality, demand, 
or distribution by topic.

* While WikiProjects are English-only and therefore limited in their utility to 
other languages, we also provide the Wikidata ID and sitelinks -- i.e., titles 
for corresponding articles in other languages -- to allow for multilingual 
analyses. This could be used to compare gaps in coverage -- e.g., akin to past 
work that has used categories [3].

The main challenge, besides processing time, is how to 1) effectively extract 
the WikiProject templates from talk pages, and, 2) consistently link them to a 
canonical WikiProject name and topic. For example, the canonical template for 
WikiProject Medicine is 
https://en.wikipedia.org/wiki/Template:WikiProject_Medicine but another one 
used is https://en.wikipedia.org/w/index.php?title=Template:WPMED&redirect=no 
(and there are 13 more). To capture articles tagged with any of these many 
templates and link them all to the same canonical WikiProject (and eventually a 
higher-level topic), we built a near-complete list of WikiProjects based on the 
WikiProject Directory [4] and gathered all of their associated templates. We purposefully 
excluded WikiProjects under the assistance/maintenance category [5]. Then, when 
parsing talk pages from the dump files, we check for any of these templates and 
list them under their canonical name. As a backup, we also employ 
case-insensitive string matching with "WP" and "WikiProject", which helps to 
guarantee that we did not miss any WikiProjects but introduces a number of 
false positives as well. If you wish to map the WikiProjects listed in the 
dataset to their higher-level topics, the mapping is in the figshare item and 
code that allows you to do that can be found here:
https://github.com/wikimedia/drafttopic/blob/master/drafttopic/utilities/taxo_label.py
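The matching approach described above (canonical template list plus a case-insensitive "WP"/"WikiProject" fallback) can be sketched roughly as follows. This is a hypothetical illustration only: the template names and the tiny mapping dictionary are invented, whereas the real pipeline builds its list from the WikiProject Directory and the script in the Figshare item.

```python
import re

# Invented mapping for illustration: many template redirects resolve to one
# canonical WikiProject name (the real list has thousands of entries).
CANONICAL = {
    "wikiproject medicine": "WikiProject Medicine",
    "wpmed": "WikiProject Medicine",
}

# Grab the name token of each {{Template|...}} occurrence in the wikitext.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+)")

def wikiprojects_on_talk_page(wikitext):
    """Return canonical WikiProject names found on one talk page's wikitext."""
    found = set()
    for name in TEMPLATE_RE.findall(wikitext):
        key = name.strip().lower().replace("_", " ")
        if key in CANONICAL:
            found.add(CANONICAL[key])
        elif key.startswith(("wp", "wikiproject")):
            # Backup string match from the message above: avoids missing
            # unknown templates but introduces some false positives.
            found.add(name.strip())
    return found

# Both the redirect and the canonical template resolve to the same project.
print(wikiprojects_on_talk_page("{{WPMED|class=B}} {{WikiProject Medicine}}"))
```

Note the design trade-off the message describes: the dictionary lookup is precise, while the prefix fallback trades precision for recall.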


[1] https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Council

[2] https://dl.acm.org/doi/10.1145/3274290

[3]
https://meta.wikimedia.org/wiki/Research:Newsletter/2019/September#Wikipedia_Topic_Assessment


[4] https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Council/Directory
[5]
https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Council/Directory/Wikipedia

Best,
Isaac

--
Isaac Johnson (he/him/his) -- Research Scientist -- Wikimedia Foundation 
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




Re: [Wiki-research-l] [Analytics] Active meta users v active wikimedia users

2020-01-06 Thread Kerry Raymond
I think we need to step back from the "how" for a moment in order to better 
understand the "why".

RhinosF1, what is the original question that you are really asking that 
resulted in your original request? It may be that there are other ways to 
answer your question than to obtain those particular statistics.

For example, my guess is that your original question might be along the lines 
of "is Wikimedia driven top-down or bottom-up"? Which might be answered by 
looking at who is active on meta and who is active on the projects themselves 
and seeing how they intersect. If so, that assumption of intersection may be 
flawed if users are operating with different user accounts on meta to other 
platforms. This definitely occurs with WMF staff who often have two separate 
accounts for their work role vs their personal contributions and also for 
people active in the chapters/user groups/outreach where they need to operate 
with a real world name (real world engagement tends to need real names), but 
prefer a pseudonym for their personal contributions.

If you can tell us more about what you are trying to achieve, we may be able to 
better assist you.

Kerry



-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Jonathan Morgan
Sent: Tuesday, 7 January 2020 6:58 AM
To: Wiki Research-l 
Cc: A mailing list for the Analytics Team at WMF and everybody who has an 
interest in Wikipedia and analytics. 
Subject: Re: [Wiki-research-l] [Analytics] Active meta users v active wikimedia 
users

(Last reply to both lists; sorry for the spam)

This sounds like it'd be a bit of work to build, and I don't think there are 
curated datasets to help out. I think you would need to...

1. get the count of active editors on Meta for [PERIOD OF TIME]. Easy.
2. perform a query or parse dumps to get the *list *of active editors from 
every individual Wikimedia project for the same [PERIOD OF TIME]. Hard.
3. de-duplicate that list (since many people edit multiple wikis in a given 
month, say, and you don't want to overcount). Pretty easy.
4. compare the resulting all-projects count with the Meta-only count. Easy.
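The four steps above can be sketched as follows, assuming the per-wiki lists of active-editor usernames from steps 1–2 have already been obtained (the wiki names, usernames, and counts here are invented placeholders):

```python
# Placeholder data standing in for steps 1-2 (querying each wiki or parsing
# dumps for its active editors over the same period).
active_by_wiki = {
    "enwiki": ["Alice", "Bob", "Carol"],
    "dewiki": ["Bob", "Dana"],
    "metawiki": ["Alice", "Dana"],
}

# Step 3: de-duplicate across projects so cross-wiki editors count once.
all_projects = set()
for editors in active_by_wiki.values():
    all_projects.update(editors)

# Step 4: compare the Meta-only count against the all-projects count.
meta_count = len(set(active_by_wiki["metawiki"]))
print(f"{meta_count} of {len(all_projects)} active editors are active on Meta")
```

The hard part in practice is step 2 (one query or dump pass per wiki); the set union here is the easy part, exactly as the message says.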

This sounds like a lot of work to me! Again, there might be tools or resources 
for this that already exist, but I'm not aware of them.

It seems like having topline/platform-level counts for active editors could be 
useful, as a dashboard or a public dataset. You might try requesting this as a 
feature for WikiStats. The worst they can say is "no", or "not yet" :)

- J


On Mon, Jan 6, 2020 at 12:34 PM RhinosF1 -  wrote:

> Hi,
>
> I’ve just seen the replies; thanks to everyone who's replied.
>
> I was looking to try and work out what percent of the active Wikimedia 
> community is participating on meta, and comparing that to another wiki 
> farm. Any thoughts on that?
>
> RhinosF1
>
> On Mon, 6 Jan 2020 at 20:31, Aaron Halfaker 
> wrote:
>
> > It doesn't look like Active Editors works for all wikis.  I think 
> > you'd have to merge activity across all wikis to get a stat like 
> > that. I'm not sure I know of a good data strategy to get that.
> >
> > If you were to query it with quarry, you'd need to write a query for
> every
> > wiki and then write some code to merge the results.  Oof.
> >
> > If you were to extract it from the XML dumps, you'd need to process each 
> > wiki separately and then merge the results.  Oof.
> >
> > The best solution to this is to have a common table/relation across 
> > all Wikis and to aggregate from there.  I don't think there's any 
> > such cross-wiki table/relation available.
> >
> > On Mon, Jan 6, 2020 at 1:38 PM Jonathan Morgan 
> > 
> > wrote:
> >
> > > Same dashboard, but for "All wikis":
> > > https://stats.wikimedia.org/v2/#/all-projects
> > >
> > > That work?
> > >
> > > - J
> > >
> > > On Mon, Jan 6, 2020 at 11:32 AM RhinosF1 -  wrote:
> > >
> > > > Hi,
> > > >
> > > > That provides active users for meta but not globally. Anything 
> > > > for
> > > global?
> > > >
> > > > RhinosF1
> > > >
> > > > On Mon, 6 Jan 2020 at 18:10, Jonathan Morgan 
> > > > 
> > > > wrote:
> > > >
> > > > > RhinosF1,
> > > > >
> > > > > Are you looking for information like this 
> > > > > , or
> something
> > > > > different?
> > > > >
> > > > > - J
> > > > >
> > > > > On Mon, Jan 6, 2020 at 8:51 AM RhinosF1 - 
> > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Does anyone know a way to find out how many  wikimedia users 
> > > > > > are
> > > active
> > > > > > globally compared to active on metawiki?
> > > > > >
> > > > > > This mean they've made more than 5 edits in the last 30 days 
> > > > > > for
> > > this.
> > > > > >
> > > > > > Thanks,
> > > > > > RhinosF1
> > > > > > ___
> > > > > > Analytics mailing 

Re: [Wiki-research-l] Does wikipedians feel like commoners ?

2019-12-09 Thread Kerry Raymond
I think Wikipedians are Wikipedians for a variety of reasons. Some are more 
altruistic (free knowledge), others are more personal (but not necessarily 
negative), and other reasons are negative (pushing a point of view, 
advertising, vandalism, stroking their own ego, etc).

But if we look at the group that are more-or-less altruistic, I sincerely doubt 
that we can attribute their motivation to a particular author, open source 
software or whatever. I've been well-aware of Richard Stallman, open source 
software, open research data, etc,  from before being a Wikipedian but those 
things didn't cause me to become a Wikipedian. I think most Wikipedians are 
simply people who can see that knowledge empowers people and enables them to 
live better lives, build a better society etc, and think that Wikipedia is 
therefore beneficial to the world and something they feel able to contribute 
to. I don't think most of them would think of themselves as "commoners"; 
indeed, I think most of them would assume you were talking about contributors 
to Wikimedia Commons rather than the meaning you intend.

It simply makes good sense to contribute to Wikipedia.

Kerry
  




Re: [Wiki-research-l] Why is Wikipedia still so often frowned upon in academic circles?

2019-12-03 Thread Kerry Raymond
Thanks for initiating this interesting conversation with your paper, Darius.

As a retired professor and researcher and now active Wikipedian, I have a foot 
in both camps.

Wearing my academic hat, the concerns I have are the ease of vandalism, the 
risk of subtle vandalism (I agree obvious vandalism will be recognised as such 
by the reader), and how quickly a Wikipedia article can change from good to 
bad, neutral to biased, etc. As an insider to Wikipedia, I know about ClueBot, 
Recent Changes Patrol, watchlists, etc., but to the outside world there does 
not appear to be any system of review, and I would have to admit that our 
methods of detecting vandalism are far from perfect. When I go away on 
holidays, particularly if I don't take my laptop, I stop watching my watchlist. 
Then when I get home and try to catch up on my watchlist (an enormous task), I 
am stunned to find vandalism some weeks old in articles. Am I the only active 
user watching that article? It would seem so. We have a tool (left-hand tool 
bar when you are looking at any article in desktop mode) that reports how many 
users (but not which users) are watching an article but for privacy no value is 
reported if there are fewer than 30 watchers (it says "less than 30"). Yet what 
difference does it make whether there are 51 or 61 watchers or "less than 30" 
if those users are inactive, or active but not checking their watchlists? Since 
none of us (except developers) can access the list of users watching any page, 
we have no way of measuring how many articles are being checked by others 
following changes, how quickly they are checked, or whether they are checked at 
all. So I think we need a better "reviewing" system, and one more visible to 
the reader, if we want to gain respectability in academic circles. We also need 
to prevent as much vandalism as we can (why do we have a "5 strikes until you 
are blocked" policy? Let's have zero tolerance: one obvious act of vandalism 
and you're blocked).

My 2nd point of difference is this. When I publish an academic paper, I put my 
real name and my institution name on it, and with that I am risking my real 
world reputation and also that of my institution. That's a powerful motivator 
to get it right. What risk does User:Blogwort432 take to their real world 
reputation? Generally none. The user name is not their real name. Even if 
blocked or banned, we know they can pop up again with a new user name or be one 
of the myriad IP addresses who contribute. One of the reasons I edit with my 
real name is precisely because I put my real world reputation on the line 
(assuming you believe my user name is my real name of course) and that's a 
powerful motivator for me to write good content AND to be civil in discussions. 
It's easy to be the opposite when you hide behind the cloak of a 
randomly-chosen user name or IP address. Also real world identities are more 
able to be checked for conflict of interest or paid editing ("so you work for 
XYZ Corp and you've just added some lavish praise to the XYZ article, hmm"). I 
think we would have a lot more credibility if we moved to having real world 
user names (optionally verified) and were encouraged to add a short CV (which 
is currently discouraged) so your credibility as a contributor could be 
assessed by readers.

3rd point. Many academics have attempted to edit Wikipedia articles and got 
their edits reverted with the usual unfriendly warnings on their User Talk 
page. When they reply (often stating that they are an expert in this field or 
whatever claim they make), they usually get a very unfriendly reaction to such 
statements. I can't imagine that academics who have tried to contribute to 
Wikipedia and experienced hostility or seen their edits reverted for reasons 
they did not understand or did not agree with are likely to run around saying 
that Wikipedia is as good as the academic literature.

I think if we want to turn around academic perception, we need to:

1. make academics welcome on Wikipedia (apart from the usual conflict of 
interests)
2. as many contributors as possible should be real-world verified and invited 
to upload their CV or link to one on another site (if we don't want them on 
Wikipedia User pages)
3. demonstrate we have a comprehensive, fast and effective review of 
changed/new content -- wouldn't it be good if we could point to an edit in the 
article history and see who reviewed it and how quickly that happened (and have 
gross statistics on how many edits were reviewed and how quickly, and tools 
that tell us which articles aren't being properly reviewed, etc),
4. eliminate vandalism (well, reduce it substantially)

Or at least demonstrate we are moving towards these goals.

Personally I think some of the "norms" of Wikipedia may have served us well in 
the early 2000s but don't serve us so well today.  To my mind moving towards 
real-world named accounts and then real-world verified accounts as a "norm" 
will make us better contribut

Re: [Wiki-research-l] [Data Release] Active Editors by country

2019-11-07 Thread Kerry Raymond
What is the definition of active editor in this data? 5 per month?

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Dan Andreescu
Sent: Friday, 8 November 2019 5:17 AM
To: Research into Wikimedia content and communities 

Subject: [Wiki-research-l] [Data Release] Active Editors by country

Today we are releasing a new dataset meant to help us understand the impact of 
grants and programs on editing.  This data was requested several years ago, and 
we took a long time to bring in the privacy and security experts whose help we 
needed to release it.  With that work done, you can download the data here: 
https://dumps.wikimedia.org/other/geoeditors/ and read about it here:
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Geoeditors/Public

You can send questions or comments on this thread or on the discussion page.




Re: [Wiki-research-l] Generalizability of research across different language versions

2019-10-08 Thread Kerry Raymond
Perhaps off-topic here, but when all you have is a hammer, everything looks 
like a nail ...

In the case of Wikipedia, we use templates and tracking categories as a poor 
man's substitute for any actual support for workflows and dashboards to manage 
processes. While Phabricator is not great, it's still a step in the right 
direction.

When I run large projects like our heritage register article rollout, I use 
spreadsheets held on Google Drive as it is easier to collaborate that way than 
on-wiki for a couple of really simple reasons:

Wikipedia tables can't be manipulated like spreadsheets (e.g. queries like 
"which heritage entries are currently without an infobox photo and in the City 
of Sydney"), and you can't store article drafts on Wikipedia in any namespace 
because of the categories in them.

Oh and we use email to collaborate on these projects because Talk is useless 
and frankly you don't need the peanut gallery looking on and wasting everyone's 
time. There are plenty of people who love to demand how others should implement 
a project despite having no intention to actually contribute to the work of the 
project. I think we should have some sort of rule on Wikipedia that you can't 
write more bytes on Talk pages than you've written in article content :-)

So I think the small Wikipedias should be careful what they wish for when it 
comes to templates ... I got told off the other day for not having used the 
right presentation for an edit-war report (I was a bystander, not involved in a 
set of edit wars occurring across a large group of articles). My reaction was 
"fine, I won't bother to report one again". Therein lies the danger of using 
templates for business processes. 

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Amir E. Aharoni
Sent: Friday, 4 October 2019 12:23 AM
To: Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Generalizability of research across different 
language versions

Thanks a lot for bringing this up.

Sorry for not offering a solution, but I do want to mention a frequently-missed 
aspect of the problem: wikis in different languages have some differences that 
are understandable because they reflect objective cultural characteristics of 
the people who speak those languages. But some differences are artificial and 
exist because in the early days of Wikimedia (mid-2000s) there were no 
convenient ways for wikis to communicate and share info. There were no global 
accounts and no convenient translation tools.

Templates are still not global, even though there is huge demand for it,[1] and 
a lot of community processes are implemented using templates: requests for 
deletion, requests for unblocking, article sorting for WikiProjects, stub 
sorting. Many of these things could be unified, at least partially, by making 
templates global, and among many benefits, it would make research easier, too.

[1] It came at #3 in the Community Wishlist vote in 2015, and at #1 in 2016. 
Despite this demand, it was not implemented :(

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com 
‪“We're living in pieces, I want to live in peace.” – T. Moore‬


‫בתאריך יום ד׳, 2 באוק׳ 2019 ב-14:37 מאת ‪Jan Dittrich‬‏ <‪ 
jan.dittr...@wikimedia.de‬‏>:‬

> Hello  researchers,
>
>  A lot of research on Wikipedia is published in English and also uses 
> the English Wikipedia as source of data or researchers get their 
> participants via English Wikipedia [0].
>
> A frequent criticism I meet when discussing such research with 
> non-en.wp community members is that their Wikipedia is different and 
> the results of en.wp base research are problematic/incomparable/totally 
> useless.
>
> So I want to ask:
> - Do you know of research comparing different Wikis, preferably across 
> language versions? [1]
> - How would you deal with such criticism, particularly of the "if it 
> is not about 'my' wp it is useless"-kind [2]?
>
> Kind Regards,
>  Jan
>
> 
> [0] Plausible due to academic fields, particularly Computer Science, 
> publishing mainly in English, size, and the WMF as an actor being US-based.
> [1] I know of »revisiting "The Rise and Decline" in a Population of 
> Peer Production Projects« 
> (https://dl.acm.org/citation.cfm?id=3173929),
> comparing different Wikia-Wikis; Research like "limits of 
> self-organization" (https://firstmonday.org/article/view/1405/1323) 
> that refer to general principles of peer production. Comparisons of 
> Wikipedias across languages and the impact of their different 
> contexts, languages and regulations would be very interesting to me.
> [2] I'm aware that making heterogeneous things comparable is seen as a 
> core academic/scientific activity in STS research (Law, S. L. Star, 
> Turnbull…), so I do not want to say that transfer to a different setting 
> is not a problem – but it is certainly not "totally useless" either.
>
> --
> Jan Dittrich
> UX Design

Re: [Wiki-research-l] gender balance of Wikipedia citations

2019-08-31 Thread Kerry Raymond
Does it expand an existing citation that someone else has created with "et 
al", which is the scenario here? My experience is that I can use it to expand 
a pre-existing naked URL citation (in some cases, exceptions being PDFs), but 
I've never seen a way to use it to expand a partial citation into a fuller 
one.

Kerry 

Sent from my iPad

> On 29 Aug 2019, at 10:35 pm, Federico Leva (Nemo)  wrote:
> 
> Kerry Raymond, 29/08/19 01:26:
> > So I think a specific tag to encourage the expansion of "Bloggs et al"
> > citations to full author listings might work.
> 
> But it's easier to fix it yourself, using the citation bot:
> https://en.wikipedia.org/wiki/WP:UCB
> 
> Greg, 30/08/19 07:48:
>> If the Wikipedia
>> community is not studying its biases and designing tools and strategies for
>> addressing them, it is not reflecting the world, but lagging behind it.
> 
> However, going back to Kerry:
> 
> > In some ways, I think a better solution might be to try to get Google
> > scholar interested in the issue of gender.
> 
> I'm not aware of studies of gender bias in Google Scholar search results 
> themselves, yet we'd really need such basic information before going into 
> specifics of how the research is consumed and redistributed. There is a 
> mention of gender in https://oadoi.org/10.1017/S104909651800094 which states
> 
> > Moreover, because a GS profile is a public signal, it can have a
> > disproportionate effect on opinions because a person seeing it knows
> > that others also see it (Chwe 2016).
> 
> Which seems to me an argument very similar to yours on Wikipedia.
> 
> Federico
> 
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



Re: [Wiki-research-l] gender balance of Wikipedia citations

2019-08-28 Thread Kerry Raymond
 that have been floated in this thread to this point? Or are you 
> > thinking
> of
> > something else?
> >
> > Greg
> >
> > On Mon, Aug 26, 2019 at 5:00 AM <
> > wiki-research-l-requ...@lists.wikimedia.org>
> > wrote:
> >
> > > Send Wiki-research-l mailing list submissions to
> > > wiki-research-l@lists.wikimedia.org
> > >
> > > To subscribe or unsubscribe via the World Wide Web, visit
> > > 
> > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > > or, via email, send a message with subject or body 'help' to
> > > wiki-research-l-requ...@lists.wikimedia.org
> > >
> > > You can reach the person managing the list at
> > > wiki-research-l-ow...@lists.wikimedia.org
> > >
> > > When replying, please edit your Subject line so it is more 
> > > specific than "Re: Contents of Wiki-research-l digest..."
> > >
> > >
> > > Today's Topics:
> > >
> > >1. Re: gender balance of Wikipedia citations (WereSpielChequers)
> > >2. Re: gender balance of Wikipedia citations (Greg)
> > >3. Re: sockpuppets and how to find them sooner (Federico Leva
> (Nemo))
> > >4. Re: gender balance of Wikipedia citations (Jane Darnell)
> > >5. Re: gender balance of wikipedia citations (Federico Leva 
> > > (Nemo))
> > >
> > >
> > > --
> > > 
> > >
> > > Message: 1
> > > Date: Sun, 25 Aug 2019 14:28:25 +0100
> > > From: WereSpielChequers 
> > > To: Research into Wikimedia content and communities
> > > 
> > > Subject: Re: [Wiki-research-l] gender balance of Wikipedia 
> > > citations
> > > Message-ID:
> > >  > > shnonh...@mail.gmail.com>
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > Hi Greg,
> > >
> > > One of the major step changes in the early growth of the English
> > Wikipedia
> > > was when a bot called RamBot created stub articles on US places. I
> think
> > > they were cited to the census. Others have created articles on 
> > > rivers
> in
> > > countries and various other topics by similar programmatic means.
> > Nowadays
> > > such article creation is unlikely to get consensus on the English 
> > > Wikipedia, but there are some languages which are very open to 
> > > such creations and have them by the million.
> > >
> > > I'm not sure if the fastest updating of existing articles is 
> > > automated
> or
> > > just semiautomated. But looking at the bot requests page, it 
> > > certainly looks like some people are running such maintenance bots 
> > > "updating GDP
> by
> > > country" is a current bot request.
> > > https://en.wikipedia.org/wiki/Wikipedia:Bot_requests.
> > >
> > > I'm not sure how "the ease of a source for purposes of converting 
> > > into
> a
> > > table and generating a separate article for each row" relates to
> gender.
> > > But I suspect "number of times cited in Wikipedia" deserves less 
> > > kudos
> > than
> > > "number of times cited in academia".
> > >
> > > WSC
> > >
> > > On Sun, 25 Aug 2019 at 05:22, Greg  wrote:
> > >
> > > > Thanks again, Kerry. I am hoping that someone with access to 
> > > > more
> > > resources
> > > > (knowledge, support, etc) than I have will look into this.
> > > >
> > > > A few more thoughts/questions:
> > > >
> > > > 1. The link to the citation dataset from the Medium article 
> > > > ("What
> are
> > > the
> > > > ten most cited sources on Wikipedia? Let’s ask the data.") is broken.
> > > > 2. As far as I can tell, every named author in the top ten most 
> > > > cited sources on Wikipedia is male. One piece is by a working group.
> > > > 3. This line from the Medium piece struck me: "Many of 
> > > > these
> > publications
> > > > have been cited by Wikipedians across large series of articles 
> > > > using powerful bots and automated tools."
> > > >
> > > > Are citations being added by bots? I'm not sure that I 
> > > > understand
> that
> > &g

Re: [Wiki-research-l] gender balance of Wikipedia citations

2019-08-24 Thread Kerry Raymond
I am inclined to think that political science has more Point of View in it than 
say chemistry. I also suspect it has fewer authors per book/paper. So I can 
imagine that people citing political science literature may be more inclined to 
cherry pick the sources that support their own POV which may involve some 
gender bias in some way. I would think it less likely in chemistry to cherry 
pick sources (which is not to say there are no divided schools of thought in 
chemistry but it is a more experimental discipline with strong commitment to 
factual data and less to opinion).

But having said all that, whether and in what circumstances that the selection 
of sources in Wikipedia might be sex/gender biased, I honestly don't know. But 
if it manifests outside of Wikipedia as you suggest, then I would be very 
surprised if it wasn't replicated in Wikipedia to some extent. But I guess your 
question is whether Wikipedia merely reflects the society it lives in 
(similar levels of gender bias) or whether there is something about Wikipedia 
which exacerbates or ameliorates the situation. I am genuinely curious what a 
small study would discover and agree that replicating (as much as possible) the 
existing study outside of Wikipedia provides a good starting point. You might 
approach the authors of that study to see if they are willing to collaborate on 
such a project, either in design, data sharing or more fully. I look forward to 
seeing the results.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Greg
Sent: Friday, 23 August 2019 5:01 PM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] gender balance of wikipedia citations

Wow, Kerry! Thank you for taking the time to write all these thoughts out.

I'm asking the question because I'm concerned that the gender balance of the 
authors being cited on wikipedia is different from the already quite bad 
patterns in academia. My fear is that the citation gender imbalance on 
Wikipedia is more pronounced. If so, it is not just perpetuating the problem, 
but making it worse by surfacing certain authors and ideas even more 
frequently, or hardly at all. I would like to know if this is the case, and if 
so, how big the effect is.

In my last message, I mention a study about a set of award-winning political 
science books (the researchers study the citation gender imbalance for that 
set). I just saw this study today, but I began to think that it/the set of 
works--or some similar set of titles--could possibly be a good place to begin, 
especially if the original researchers were willing to share the list of 
titles/authors/gender/etc that they put together/worked with. Then it seems it 
would mostly be a matter of figuring out how to understand how those titles are 
cited on Wikipedia--through either the citation dataset or wikicite--to see 
if/how the citation patterns differ (i.e., if the works by women/men are cited 
more frequently/at the same rate/less frequently on Wikipedia than what the 
researchers found in the original study).

This seems like it would be easier to do than what you propose, but perhaps the 
idea is not sound. Until very recently, I thought I could find the answer in an 
existing paper! I honestly don't know the best way to get the answer, but I 
would like to know the answer and think it's important to look at.

All of the things you bring up--from the gender of the editor, to the type of 
editing being done, to the issues around multiple authors/paywalls/year of 
publication/field--complicate the inquiry, particularly a larger one. I 
agree with what you say about doing something small first to see what's there.

Thanks again for all your thoughts.
Greg



On Thu, Aug 22, 2019 at 9:41 PM 
wrote:

> Send Wiki-research-l mailing list submissions to
> wiki-research-l@lists.wikimedia.org
>
>
> Today's Topics:
>
>1. Re: gender balance of wikipedia citations (Greg)
>2. Re: gender balance of wikipedia citations (Kerry Raymond)
>
>
> --
>
> Message: 1
> Date: Thu, 22 Aug 2019 18:47:48 -0700
> From: Greg 
> To: wiki-research-l@lists.wikimedia.org
> Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> Message-ID:
> <
>

Re: [Wiki-research-l] gender balance of wikipedia citations

2019-08-23 Thread Kerry Raymond
Hmm. I get the error that the mailing list doesn't exist. But if you write here

https://meta.wikimedia.org/wiki/Talk:WikiCite

someone should be able to point you in the right direction. There has been 
activity there within the last month.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Greg
Sent: Saturday, 24 August 2019 7:13 AM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] gender balance of wikipedia citations

Hi all-

One more thing: on twitter, I was advised that this list and the wikicite
google group were the best places to discuss research around citations.
Although I would like to post about this line of inquiry to the wikicite
group, it appears to be a private group. As an outsider, I have not been
able to access/view the group content/or even see who owns the group and is
the correct person to contact. I have mentioned that I can not access the
group to the wikicite twitter handle, and received a 'like' (?) but no
response, and nothing has changed.

If you are a member of both lists, would you be willing to point to this
thread from the wikicite group? If the group is not open to the public, at
least the ideas will be there.

Many thanks,

Greg

On Fri, Aug 23, 2019 at 12:01 AM <
wiki-research-l-requ...@lists.wikimedia.org> wrote:

> Send Wiki-research-l mailing list submissions to
> wiki-research-l@lists.wikimedia.org
>
>
> Today's Topics:
>
>1. sockpuppets and how to find them sooner (Kerry Raymond)
>2. Re: sockpuppets and how to find them sooner (Kerry Raymond)
>3. Re: sockpuppets and how to find them sooner (RhinosF1)
>4. Re: gender balance of wikipedia citations (Greg)
>
>
> ------
>
> Message: 1
> Date: Fri, 23 Aug 2019 15:57:15 +1000
> From: "Kerry Raymond" 
> To: "'Research into Wikimedia content and communities'"
> 
> Subject: [Wiki-research-l] sockpuppets and how to find them sooner
> Message-ID: <001b01d55977$9f51dbe0$ddf593a0$@gmail.com>
> Content-Type: text/plain;   charset="us-ascii"
>
> Currently, to open a sockpuppet investigation, you must name the two (or
> more) accounts that you believe to be sockpuppets with "clear, behavioural
> evidence of sock puppetry" which is typically in the form of pairs of edits
> that demonstrate similar edit behaviours that are unlikely to naturally
> occur. Now if you spend enough time on-wiki, you develop an intuition about
> behaviours you see on your watchlist and in article edit histories. Often I
> am highly suspicious that an account is a sockpuppet, but I cannot report
> them because I don't know which other account is involved.
>
>
>
> As an example, I recently encountered User:Shelati, an account about 1 day old
> at that time with nearly 100 edits in that day all about 1-2 minutes apart,
> mostly making a similar change to a large number of Australian place
> infoboxes.
>
>
>
> https://en.wikipedia.org/w/index.php?title=Special:Contributions/Shelati&offset=20190728053057&limit=100&target=Shelati
>
>
>
> Genuine new users do not edit that quickly, do not use templates and do not
> mess structurally with infoboxes (at most they try to change the values).
> It
> "smelled" like a sockpuppet. However, as I did not recognise that pattern
> of
> edit behaviour as being that of any other user I was familiar with, it
> wasn't something I could report for sockpuppet investigation. Anyhow after
> about 2 weeks, the user was blocked as a sockpuppet. Someone must have
> noticed and figured out the other account:
>
>
>
>
> https://en.wikipedia.org/wiki/Wikipedia:Sockpuppet_investigations/Meganesia/Archive
>
>
>
> Two weeks and 1,279 edits later … that's over 1000 possibly problematic
> edits after I first suspected t

Re: [Wiki-research-l] sockpuppets and how to find them sooner

2019-08-23 Thread Kerry Raymond
That's why I think we need "signatures" which is my shorthand for things like a 
hash function or a bounding box, a means by which many non-matching accounts 
can be eliminated at low cost, reserving the high cost comparisons (machine or 
human) only for high probability candidates. It is machine-computed and 
*stored* on the banning/blocking of a user. When a suspect user is presented, 
it calculates their signature and then compares them against the pre-calculated 
signatures of the bad users. I don't think it is too expensive if we can find 
the right "signature". CPU cycles are pretty fast. I only have an average 
laptop CPU-wise but I burn through loads of comparisons of geographic 
boundaries (complex polygons with many points) thanks to bounding boxes which 
reduce the complex shape to the smallest rectangle that contains it. Testing 
intersection of polygons is expensive, testing the intersection of rectangles 
is trivial. 
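The bounding-box pre-filter described above can be sketched in a few lines. This is an illustrative example only, not anything from the thread: the `BBox` type and the sample rectangles are assumptions. The point is simply that a cheap rectangle test eliminates most non-matches before any expensive polygon (or edit-history) comparison runs.

```python
# Sketch of the cheap pre-filter idea: compare bounding boxes first, and
# only pass the survivors on to the expensive comparison step.
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def boxes_intersect(a: BBox, b: BBox) -> bool:
    # Two axis-aligned rectangles overlap unless one lies entirely to the
    # left/right of the other, or entirely above/below it.
    return not (a.max_x < b.min_x or b.max_x < a.min_x or
                a.max_y < b.min_y or b.max_y < a.min_y)


def candidates(suspect: BBox, stored: list[BBox]) -> list[int]:
    # Indices of stored boxes that survive the cheap test; only these
    # would go on to the costly exact-intersection check.
    return [i for i, box in enumerate(stored) if boxes_intersect(suspect, box)]
```

The same shape of pre-computation (store a cheap summary per blocked account, compare summaries first) is what the "signature" proposal amounts to.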

I think we can probably ignore the myriad of trivial bad guys for the purposes 
of signature collecting, eg blocked for vandalism after their first few edits. 
Sock puppets or their masters don't immediately appear as bad guys on 
individual edits. It's often more about long-term behaviours like POV pushing, 
refusal to engage in consensus building, slow-burning edit wars, etc., which do 
not show up in individual edits.

Kerry

Sent from my iPad

> On 23 Aug 2019, at 11:42 pm, Timothy Wood  wrote:
> 
> You are correct that in all but the most obvious cases, filing an SPI can be 
> exceptionally time consuming. I'm afraid there is no obvious technical 
> solution there that would not involve a complicated AI that is probably 
> beyond the ability of the foundation to produce. 
> 
> There is quite a bit of data available in the form of years of SPIs, but it 
> seems like you're talking about Facebook or Google levels of machine 
> learning, and even years of SPIs is tiny compared to the amount of data they 
> work with.
> 
> On a separate note, frequently changing IP addresses is most often an 
> indicator of nothing more than someone who is editing on a mobile connection. 
> This can usually be easily verified with an online IP lookup. 
> 
> V/r
> TJW/GMG
> 
> 
> 
>> On Fri, Aug 23, 2019, 02:44 RhinosF1  wrote:
>> Just a note that you can still go through warnings for vandalism etc. and
>> report to AIV.
>> 
>> Or at that edit speed, you may have a chance at AN at reporting for
>> bot-like edits which will draw attention to the account.
>> 
>> If you ever need help, things like #wikipedia-en-help on Freenode IRC exist
>> so you can ask other users.
>> 
>> RhinosF1
>> Miraheze Volunteer
>> 
>> On Fri, 23 Aug 2019 at 06:57, Kerry Raymond  wrote:
>> 
>> > Currently, to open a sockpuppet investigation, you must name the two (or
>> > more) accounts that you believe to be sockpuppets with "clear, behavioural
>> > evidence of sock puppetry" which is typically in the form of pairs of edits
>> > that demonstrate similar edit behaviours that are unlikely to naturally
>> > occur. Now if you spend enough time on-wiki, you develop an intuition about
>> > behaviours you see on your watchlist and in article edit histories. Often I
>> > am highly suspicious that an account is a sockpuppet, but I cannot report
>> > them because I don't know which other account is involved.
>> >
>> >
>> >
>> > As an example, I recently encountered User:Shelati, an account about 1 day old
>> > at that time with nearly 100 edits in that day all about 1-2 minutes apart,
>> > mostly making a similar change to a large number of Australian place
>> > infoboxes.
>> >
>> >
>> >
>> > https://en.wikipedia.org/w/index.php?title=Special:Contributions/Shelati&offset=20190728053057&limit=100&target=Shelati
>> >
>> >
>> >
>> > Genuine new users do not edit that quickly, do not use templates and do not
>> > mess structurally with infoboxes (at most they try to change the values).
>> > It
>> > "smelled" like a sockpuppet. However, as I did not recognise that pattern
>> > of
>> > edit behaviour as being that of any other user I was familiar with, it
>> > wasn't something I cou

Re: [Wiki-research-l] sockpuppets and how to find them sooner

2019-08-22 Thread Kerry Raymond
To reply to my own question …

 

Can we find a way to create a "signature" of an account's pattern of
editing? Perhaps it might be a set of signatures, maybe one for the
categories that the account appears to be active in, another for the type of
edit, etc. Then if these signatures were calculated for all banned accounts
or currently blocked accounts (or at least ones with a long enough
contribution history to make it worthwhile - we're not interested in
one-edit vandals), then we could have a tool that could be run to quickly
compare one account against the signatures of banned/blocked accounts as
well as the cumulative edits of a set of known sockpuppets (i.e. treat them
as a single account) to determine if this may be a sockpuppet case meriting
further investigation. I imagine that it would be too expensive
computationally to actually run comparisons of the contribution histories of
all "bad guy" accounts against the suspicious account which is why I propose
a "signature" approach (but I'm happy to be told otherwise).

 

If we had such a tool and it proves reasonably reliable in identifying
likely sockpuppets (not asking for guarantees but close enough not to be a
waste of time to investigate), then we could routinely use it on new
accounts or reactivating accounts (i.e. possible sleeper accounts) once they
have a long enough editing history to enable the tool to operate effectively
to provide automated early warning of new/reactivating accounts appearing
suspiciously similar to "bad guy" accounts.
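As a rough illustration of what such a "signature" comparison might look like, here is a minimal sketch that summarises an account's editing pattern as a normalised frequency vector over the categories it touches, then screens candidates by cosine similarity. Everything here is a hypothetical assumption for illustration (the representation, the category labels, and whatever threshold one would apply), not a design proposed in the thread.

```python
# Hypothetical edit-pattern "signature": normalised category frequencies,
# compared with cosine similarity as a cheap screen before any human review.
import math
from collections import Counter


def signature(edited_categories: list[str]) -> dict[str, float]:
    # Fraction of the account's edits falling in each category.
    counts = Counter(edited_categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Standard cosine similarity over sparse vectors keyed by category.
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Illustrative data: a stored signature for a blocked account vs a suspect.
blocked = signature(["Australian places", "Australian places", "Infoboxes"])
suspect = signature(["Australian places", "Infoboxes", "Infoboxes"])
print(round(cosine(blocked, suspect), 2))  # prints 0.8
```

Pre-computing and storing the vectors at block time, as the email suggests, keeps the per-suspect cost to one pass over the stored signatures.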

 

But this is a hard problem, both technically and socially (Assume Good
Faith, Privacy, etc), so I welcome the thoughts of others.

 

Kerry





 

 

 



[Wiki-research-l] sockpuppets and how to find them sooner

2019-08-22 Thread Kerry Raymond
Currently, to open a sockpuppet investigation, you must name the two (or
more) accounts that you believe to be sockpuppets with "clear, behavioural
evidence of sock puppetry" which is typically in the form of pairs of edits
that demonstrate similar edit behaviours that are unlikely to naturally
occur. Now if you spend enough time on-wiki, you develop an intuition about
behaviours you see on your watchlist and in article edit histories. Often I
am highly suspicious that an account is a sockpuppet, but I cannot report
them because I don't know which other account is involved.

 

As an example, I recently encountered User:Shelati, an account about 1 day old
at that time with nearly 100 edits in that day all about 1-2 minutes apart,
mostly making a similar change to a large number of Australian place
infoboxes.

 

https://en.wikipedia.org/w/index.php?title=Special:Contributions/Shelati&offset=20190728053057&limit=100&target=Shelati

 

Genuine new users do not edit that quickly, do not use templates and do not
mess structurally with infoboxes (at most they try to change the values). It
"smelled" like a sockpuppet. However, as I did not recognise that pattern of
edit behaviour as being that of any other user I was familiar with, it
wasn't something I could report for sockpuppet investigation. Anyhow after
about 2 weeks, the user was blocked as a sockpuppet. Someone must have
noticed and figured out the other account:

 

https://en.wikipedia.org/wiki/Wikipedia:Sockpuppet_investigations/Meganesia/Archive

 

Two weeks and 1,279 edits later … that's over 1000 possibly problematic
edits after I first suspected them. But that's nothing compared with another
ongoing situation in which a very large number of different IPs are engaged
in a pattern of problem edits on mostly Australian articles (a few different
types of edits but an obvious "quack like a duck" situation). The IP number
changes frequently (and one assumes deliberately). The edits potentially go
back to 2013 but appear to have intensified in 2018/2019. Here's one user's
summary of all the IP addresses involved, and the extent to which they have
been cleaned up, given many thousands of edits are involved, see:

 

https://en.wikipedia.org/wiki/User:IamNotU/History_cleanup

 

As well as the damage done to the content (which harms the readers), these
IP sockpuppets are consuming enormous amounts of effort to track them down
and revert them, which could be more productively used to improve the
content. We need better tools to foil these pests. So I want to put that
challenge out to this list.

 

Kerry

 

 

 



Re: [Wiki-research-l] gender balance of wikipedia citations

2019-08-22 Thread Kerry Raymond
, e.g. government websites. Remember that 
Wikipedia prefers open citations over paywalled citations and a lot of the 
publications behind paywalls are individually authored.

Your proposed research has a lot of interesting challenges and a number of 
limitations. I'm not saying don't do it, but I am saying start very small and 
see if you can find any evidence to support your hypothesis before embarking on 
a larger study. Because contributor behaviour is what you are trying to study, 
you probably need to do both quantitative and qualitative experiments. E.g. I 
have described the two modes of citation I do, but I cannot say how typical my 
behaviour is.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Leila Zia
Sent: Friday, 23 August 2019 3:44 AM
To: Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] gender balance of wikipedia citations

Hi Greg,

A few comments if you're going to go with "proportion of male vs female authors 
of the source material used as citations in arbitrary
articles":

* Please differentiate between sex (female, male, ...) and gender (woman, man, 
...). My understanding from your initial email is that you want to stay focused 
on gender, not sex.

* Unless you have reliable sources about the gender of an author, I would not 
recommend trying to predict what the gender is. (As you may know, this is not 
uncommon in social media studies, for example, to predict the gender of the 
author based on their image or their name.
These approaches introduce biases and social challenges.)

* Re your question about whether WMF has resources to look into this question 
in-house: I can't speak for the whole of WMF, however, I can share more about 
the Research team's direction. As part of our future work, we would like to 
"help contributors monitor violations of core content policies and assess 
information reliability and bias both granularly and at scale". [1] The 
question you proposed can fall under assessing bias in content (considering 
citations as part of the content). I expect us to focus first on the piece 
about violations of core content policies and information reliability and come 
back to the bias question later. As a result, we won't have bandwidth to do 
your proposal in-house at the moment. Sorry about that.

I hope this helps.

Best,
Leila

[1] Section 2 of our Knowledge Integrity whitepaper:
https://upload.wikimedia.org/wikipedia/commons/9/9a/Knowledge_Integrity_-_Wikimedia_Research_2030.pdf


On Thu, Aug 22, 2019 at 9:57 AM Greg  wrote:
>
> Hi Kerry,
> Those are all very interesting ways to look at this. I was thinking 
> mostly along the lines of your first bullet point, but I'd be 
> interested in research in any of those areas.
>
> Thanks,
> Greg
>
> On Thu, Aug 22, 2019 at 5:00 AM 
> 
> wrote:
>
> > Send Wiki-research-l mailing list submissions to
> > wiki-research-l@lists.wikimedia.org
> >
> >
> > Today's Topics:
> >
> >1. gender balance of wikipedia citations (Greg)
> >2. Re: gender balance of wikipedia citations (Kerry Raymond)
> >
> >
> > 
> > --
> >
> > Message: 1
> > Date: Wed, 21 Aug 2019 20:19:18 -0700
> > From: Greg 
> > To: wiki-research-l@lists.wikimedia.org
> > Subject: [Wiki-research-l] gender balance of wikipedia citations
> > Message-ID:
> > <
> > caoo9dnty+odo5oqrmzeg1nze-kynylwntd6acheytbyegk8...@mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Greetings!
> >
> > I was looking for information about the gender balance of Wikipedia 
> > citations and no one I've asked knows of any work on this topic. Do you?
> >
> > I think this is an important question.
> >
> > Here's what I've learned so far:
> >
> > Wikipedia citations are currently in the form of text strings. There 
> > is also an initiative to place citations in an annotated structured 
> > repository (wikicite). I do not know the current status of wikicite 
> > or if/wh

Re: [Wiki-research-l] gender balance of wikipedia citations

2019-08-21 Thread Kerry Raymond
Could you elaborate a bit more on what you mean by the gender balance of 
citations? 

Are you talking about:

* proportion of male vs female authors of the source material used as citations 
in arbitrary articles?
* the quality/quantity of citations in biography articles of men vs women?
* the quality/quantity of citations in articles that are gendered by some other 
criteria (e.g. reader interest, romantic comedy vs action film)?

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Greg
Sent: Thursday, 22 August 2019 1:19 PM
To: wiki-research-l@lists.wikimedia.org
Subject: [Wiki-research-l] gender balance of wikipedia citations

Greetings!

I was looking for information about the gender balance of Wikipedia citations 
and no one I've asked knows of any work on this topic. Do you?

I think this is an important question.

Here's what I've learned so far:

Wikipedia citations are currently in the form of text strings. There is also an 
initiative to place citations in an annotated structured repository (wikicite). 
I do not know the current status of wikicite or if/when this could be used for 
this inquiry--either to examine all, or a sensible subset of the citations.

My perspective is that understanding the gender balance is necessary and 
urgent. The balance could be better, the same, or worse than the citation 
balances we already know, and the scale of the effect is quite large.

Is this a line of inquiry that the wikimedia/wikicite community is interested 
in pursuing? If so, what is the best way to get started? Does the WMF have the 
resources and interest to look into this matter inhouse?

Thanks for your thoughts.

Greg




Re: [Wiki-research-l] Question on article creation policy

2019-08-09 Thread Kerry Raymond
Registered means has created a user account.

Confirmation can be done manually or automatically. From memory, it occurs 
automatically after 10 acceptable edits (I presume this means not-reverted 
edits) and/or 4 days after creating the user account. I think these values were 
chosen because past experience shows us that vandals have very little patience 
and tend to vandalise within their first few edits or first few days so anyone 
who gets past these milestones probably has good intentions. Confirmation is 
done manually if the user needs the right to create articles sooner than the 
automatic process would otherwise assign it; this situation usually arises at 
edit-a-thons where new participants always have good intentions (I've never 
seen a vandal come to an event). Administrators and those users who are 
designated event coordinators (e.g. me) have the ability to confirm an account 
earlier if there is some good reason.
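To make the automatic rule concrete, a rough sketch (the thresholds are from 
memory as stated above, and whether they combine with AND or OR should be 
checked against current policy, so treat the values as assumptions):

```python
from datetime import datetime, timedelta

# Assumed en.WP autoconfirmation thresholds (from memory: roughly an
# account age of 4 days and 10 edits; verify against current policy).
MIN_ACCOUNT_AGE = timedelta(days=4)
MIN_EDIT_COUNT = 10

def is_autoconfirmed(created: datetime, edit_count: int,
                     now: datetime) -> bool:
    """True if the account would meet the automatic confirmation bar."""
    return (now - created) >= MIN_ACCOUNT_AGE and edit_count >= MIN_EDIT_COUNT

def can_create_articles(autoconfirmed: bool, manually_confirmed: bool) -> bool:
    """Manual confirmation (e.g. by an event coordinator) bypasses the wait."""
    return autoconfirmed or manually_confirmed
```

So an account created yesterday with 20 edits would not yet be autoconfirmed, 
which is exactly the edit-a-thon case where manual confirmation is used.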

Where is more information about policy changes in Wikipedia? Good question. 
After 14 years on-wiki, policies change without my knowledge all the time. It 
is a very common complaint that policy change takes place without adequate 
notification to affected WikiProjects, etc., so people can be aware of the 
proposal and participate in the discussion. And I suspect this situation suits 
a lot of people who know they are pushing through an unpopular change. There 
are lots of places where policy changes can be discussed and consensus reached, 
so it's hard to monitor all of them. Even using a watchlist, some discussion 
pages are so active, you just cannot keep watching them (when many of the page 
changes are not of any significance). We lack an effective notification system 
of issues at a higher level than "page change". There is an expectation that 
significant discussions should be drawn to the attention of others, but somehow 
this doesn't always happen.

Does the policy work? Yes and no. Very few articles survive the Articles for 
Creation process. If you review there (as I have done), you realise pretty 
quickly that new users almost always create articles about living people and 
current organisations, many of which appear to be run-of-the-mill, e.g. a 
local dental practice group or a financial planner. Although you usually have no 
concrete evidence (unless their user name is the same or a variant of the 
article title), nonetheless you tend to suspect this is people writing about 
themselves or their organisation and that there is conflict of interest and 
promotional intent. Some of them are well-written according to our Manual of 
Style, which tends to make you suspect undisclosed paid editing could be 
involved. So faced with the deluge of such articles, the reviewer quickly 
becomes conditioned to click "not accepted - insufficient citations to reliable 
sources" (which is code for an option you don’t get: "I really doubt this is an 
encyclopaedic topic"). While the user can revise and resubmit the draft, it 
generally gets rejected again and again. Eventually the user gives up and the 
draft is deleted after six months of inactivity. Because reviewing at AfC is 
rather soul-destroying (or so I find), reviewers decide to do something else 
with their time than monitor this stream of CoI dross, and so AfC is 
perpetually short of reviewers and there is usually a huge backlog of AfC 
reviews pending (which further discourages the contributing user). So, back to 
the question of "does this policy work?": Yes, it works in that it protects 
Wikipedia from content we do not want. No, it doesn't work in that genuine 
good-faith users attempting to create an article on a topic that probably is 
encyclopaedic go unnoticed and receive no genuine assistance, due to the 
conditioning of the reviewers, and we lose those well-intentioned people as 
contributors because of this bad experience. It also burns out a lot of AfC 
reviewers along the way. What's the way to fix it? I wish I knew.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Haifeng Zhang
Sent: Saturday, 10 August 2019 4:48 AM
To: wiki-research-l@lists.wikimedia.org
Subject: [Wiki-research-l] Question on article creation policy

Dear folks,

I'm checking the Article Creation page 
(https://en.wikipedia.org/wiki/Wikipedia:Article_creation), and it says:


The ability to create articles directly in mainspace is 
restricted to autoconfirmed 
users, though non-confirmed users and non-registered users can submit a 
proposed article through the Articles for 
Creation 
process, where it will be reviewed and considered for publication.


Does anyone know when the restriction (e.g., registered and auto-confirmed) 
became effective? I tracked the past revisions of the page but found no clue. A more 
general question is: where to find i

Re: [Wiki-research-l] New paper - Indigenous knowledge on Wikipedia

2019-07-05 Thread Kerry Raymond
The current mechanisms would allow anyone to alter the Indigenous content in an 
article. That is unlikely to be acceptable to Indigenous Australians. This is 
why I propose a sister project with different rules to create such content and 
then import it into en.WP etc. as an unalterable unit (from where it can be 
deleted but not changed in content). The WP gets to decide if the content is 
welcome but does not control the content. This appears to me to strike an 
acceptable balance between the two cultures. 

If our current mechanisms and policies were working, I don't think we'd be 
having this conversation.

Sent from my iPad

> On 5 Jul 2019, at 11:18 pm, Samuel Klein  wrote:
> 
> I think we have all the mechanics needed for this.
> 
> - Individual revisions aren't editable, once posted, and stay around
> forever (unless revdeleted).
> - Each wiki can have its own guidelines for how accounts can be shared.
> - Rather than limiting who can edit, you could have a whitelist of
> contributors considered by the local community to represent their
> knowledge; and have a lens that only looks at those contributions.  (like
> flagged revs)
> 
> (@stuart - tertiary sourcing can apply to any source; it does not privilege
> print culture.  only particular standards of notability and verifiability
> start to limit which sources are preferred.)
> 
> On Thu, Jul 4, 2019 at 7:39 PM Kerry Raymond 
> wrote:
> 
>> On en.WP we prohibit shared accounts and accounts that appear to represent
>> an organisation so that's a barrier. But assuming there was some special
>> case to allow a username to represent a community of knowledge, we would
>> still have a practical problem of whether the individual creating such an
>> account or doing the edit was authorised to do so by that community, which
>> would require some kind of real-world validation. But, let's say local
>> chapters or local users could undertake that process using local knowledge
>> of how such communities identify and operate.
>> 
>> The problem it still doesn't solve is that whatever information is added
>> by that account could then be changed by anyone. We would have to have a
>> way to prevent that happening, which would be a technical problem. Also
>> could that information ever be deleted by anyone (even for purely innocent
>> purposes, e.g. splitting a large article might delete the content from one
>> article to re-insert into other article). Or is the positioning of the
>> content within a particular article a decision only that group might be
>> allowed to take?
>> 
>> A possible technical/social solution is to have traditional knowledge of
>> this nature in a sister project, where rules on user names would be
>> entirely different and obviously oral sourced material allowed.  The group
>> could then produce named units of information as a single unit (similar to
>> a File on Commons). These units could then be added to en.WP or others
>> (obviously the language the units are written would have be identified, as
>> Commons does with descriptions already) so only English content is added to
>> en.WP and so on. The content would be presented in en.WP in a way (in a
>> "traditional language" box with a link to something explaining that what
>> means) so the reader understands what this info is and is free to trust it
>> or not. The information itself cannot be modified on en.WP only on the
>> sister project (requests on talk pages of the sister project would need to
>> be allowed for anyone to make requests eg report misspelling). En.WP would
>> remain in control of whether the content was included but could not change
>> the content themselves.
>> 
>> It seems to be a sister project similar to the current Commons would be
>> what we need to make this work.
>> 
>> Sent from my iPad
>> 
>> On 4 Jul 2019, at 6:03 pm, Jan Dittrich  wrote:
>> 
>>>> Maybe not "signed" in the sense of a signature of a Talk page, but each
>>> contribution is attributed automatically to its user as seen in the
>>> history. As someone who edits under my real name, I absolutely put my
>> name
>>> to my contributions.
>>> 
>>> That is what I assumed, too, since it was coherent with some of the
>>> problems described in:
>>> 
>> https://upload.wikimedia.org/wikipedia/commons/6/6c/PG-Slides-Wikimania18.pdf
>>> in this interpretation, Mediawiki (and lots of other software) code-ify
>>> knowledge production as done by single people  [1]– a person can edit,
>> but
>>> not a group (which was one of

Re: [Wiki-research-l] New paper - Indigenous knowledge on Wikipedia

2019-07-04 Thread Kerry Raymond
I don't think it's impossible. I think the presentation of the material in a 
box that clearly indicates the nature of the material and its provenance to 
allow the reader to decide for themselves whether they wish to read it and how 
much they wish to believe it. We already have the same problem with images; a 
random person with a pseudonym uploads a photo of a mountain to Commons and 
says "that's Mt Whatsit". It gets added to related WP articles as Mt Whatsit. 
So I don't see that what I am proposing is any more risky; indeed it's 
considerably less risky if the sister project does some real-world validation.

Whether or not what I am proposing will be what indigenous communities want is 
a separate question. I suspect what they want will never be acceptable to 
en.WP. But is there a compromise?

Kerry

Sent from my iPad

> On 5 Jul 2019, at 10:19 am, Stuart A. Yeates  wrote:
> 
> At the end of the day, wikipedia is by definition a tertiary source
> source and built on concepts of Western print culture. Traditional
> knowledge is immiscible with this model.
> 
> This is exactly why I stopped promoting mi.wiki locally here --- as I
> understand the needs of mi speakers and activists wikipedias are
> incapable of meeting them.
> 
> cheers
> stuart
> --
> ...let us be heard from red core to black sky
> 
>> On Fri, 5 Jul 2019 at 11:39, Kerry Raymond  wrote:
>> 
>> On en.WP we prohibit shared accounts and accounts that appear to represent 
>> an organisation so that's a barrier. But assuming there was some special 
>> case to allow a username to represent a community of knowledge, we would 
>> still have a practical problem of whether the individual creating such an 
>> account or doing the edit was authorised to do so by that community, which 
>> would require some kind of real-world validation. But, let's say local 
>> chapters or local users could undertake that process using local knowledge 
>> of how such communities identify and operate.
>> 
>> The problem it still doesn't solve is that whatever information is added by 
>> that account could then be changed by anyone. We would have to have a way to 
>> prevent that happening, which would be a technical problem. Also could that 
>> information ever be deleted by anyone (even for purely innocent purposes, 
>> e.g. splitting a large article might delete the content from one article to 
>> re-insert into other article). Or is the positioning of the content within a 
>> particular article a decision only that group might be allowed to take?
>> 
>> A possible technical/social solution is to have traditional knowledge of 
>> this nature in a sister project, where rules on user names would be entirely 
>> different and obviously oral sourced material allowed.  The group could then 
>> produce named units of information as a single unit (similar to a File on 
>> Commons). These units could then be added to en.WP or others (obviously the 
>> language the units are written would have be identified, as Commons does 
>> with descriptions already) so only English content is added to en.WP and so 
>> on. The content would be presented in en.WP in a way (in a "traditional 
>> language" box with a link to something explaining that what means) so the 
>> reader understands what this info is and is free to trust it or not. The 
>> information itself cannot be modified on en.WP only on the sister project 
>> (requests on talk pages of the sister project would need to be allowed for 
>> anyone to make requests eg report misspelling). En.WP would remain in 
>> control of whether the content was included but could not change the content 
>> themselves.
>> 
>> It seems to be a sister project similar to the current Commons would be what 
>> we need to make this work.
>> 
>> Sent from my iPad
>> 
>> On 4 Jul 2019, at 6:03 pm, Jan Dittrich  wrote:
>> 
>>>> Maybe not "signed" in the sense of a signature of a Talk page, but each
>>> contribution is attributed automatically to its user as seen in the
>>> history. As someone who edits under my real name, I absolutely put my name
>>> to my contributions.
>>> 
>>> That is what I assumed, too, since it was coherent with some of the
>>> problems described in:
>>> https://upload.wikimedia.org/wikipedia/commons/6/6c/PG-Slides-Wikimania18.pdf
>>> in this interpretation, Mediawiki (and lots of other software) code-ify
>>> knowledge production as done by single people  [1]– a person can edit, but
>>> not a group (which was one of the challenges in the project described in
>&

Re: [Wiki-research-l] New paper - Indigenous knowledge on Wikipedia

2019-07-04 Thread Kerry Raymond
Sorry I meant to say "traditional knowledge" box not "traditional language" box.

Kerry

Sent from my iPad

> On 5 Jul 2019, at 9:38 am, Kerry Raymond  wrote:
> 
> On en.WP we prohibit shared accounts and accounts that appear to represent an 
> organisation so that's a barrier. But assuming there was some special case to 
> allow a username to represent a community of knowledge, we would still have a 
> practical problem of whether the individual creating such an account or doing 
> the edit was authorised to do so by that community, which would require some 
> kind of real-world validation. But, let's say local chapters or local users 
> could undertake that process using local knowledge of how such communities 
> identify and operate.
> 
> The problem it still doesn't solve is that whatever information is added by 
> that account could then be changed by anyone. We would have to have a way to 
> prevent that happening, which would be a technical problem. Also could that 
> information ever be deleted by anyone (even for purely innocent purposes, 
> e.g. splitting a large article might delete the content from one article to 
> re-insert into other article). Or is the positioning of the content within a 
> particular article a decision only that group might be allowed to take?
> 
> A possible technical/social solution is to have traditional knowledge of this 
> nature in a sister project, where rules on user names would be entirely 
> different and obviously oral sourced material allowed.  The group could then 
> produce named units of information as a single unit (similar to a File on 
> Commons). These units could then be added to en.WP or others (obviously the 
> language the units are written would have be identified, as Commons does with 
> descriptions already) so only English content is added to en.WP and so on. 
> The content would be presented in en.WP in a way (in a "traditional language" 
> box with a link to something explaining that what means) so the reader 
> understands what this info is and is free to trust it or not. The information 
> itself cannot be modified on en.WP only on the sister project (requests on 
> talk pages of the sister project would need to be allowed for anyone to make 
> requests eg report misspelling). En.WP would remain in control of whether the 
> content was included but could not change the content themselves.
> 
> It seems to be a sister project similar to the current Commons would be what 
> we need to make this work.
> 
> Sent from my iPad
> 
> On 4 Jul 2019, at 6:03 pm, Jan Dittrich  wrote:
> 
>>> Maybe not "signed" in the sense of a signature of a Talk page, but each
>> contribution is attributed automatically to its user as seen in the
>> history. As someone who edits under my real name, I absolutely put my name
>> to my contributions.
>> 
>> That is what I assumed, too, since it was coherent with some of the
>> problems described in:
>> https://upload.wikimedia.org/wikipedia/commons/6/6c/PG-Slides-Wikimania18.pdf
>> in this interpretation, Mediawiki (and lots of other software) code-ify
>> knowledge production as done by single people  [1]– a person can edit, but
>> not a group (which was one of the challenges in the project described in
>> the slides, if I remember correctly)
>> 
>> I would be much interested in more research on what values are "build in"
>> our software (Some Research by Heather Ford and Stuart Geiger goes in this
>> direction).
>> 
>> Best,
>> Jan
>> 
>> [1] An interesting read on the concept of "transmitting knowledge" (e.g. in
>> articles and via the web) and knowledge as inherently social would be
>> Ingold’s "From the Transmission of Representation to the Education of
>> Attention" (http://lchc.ucsd.edu/MCA/Paper/ingold/ingold1.htm).
>> 
>> Am Do., 4. Juli 2019 um 02:20 Uhr schrieb Kerry Raymond <
>> kerry.raym...@gmail.com>:
>> 
>>> Maybe not "signed" in the sense of a signature of a Talk page, but each
>>> contribution is attributed automatically to its user as seen in the
>>> history. As someone who edits under my real name, I absolutely put my name
>>> to my contributions.
>>> 
>>> Or the other possible interpretation of "signed" here may be referring to
>>> the citations which are usually sources with one or small number of
>>> individual authors, as opposed to a community of shared knowledge
>>> custodians which is the case with Aboriginal Australians.
>>> 
>>> Kerry
>>> 
>>> Sent from my iPad
>>> 
>&

Re: [Wiki-research-l] New paper - Indigenous knowledge on Wikipedia

2019-07-04 Thread Kerry Raymond
On en.WP we prohibit shared accounts and accounts that appear to represent an 
organisation so that's a barrier. But assuming there was some special case to 
allow a username to represent a community of knowledge, we would still have a 
practical problem of whether the individual creating such an account or doing 
the edit was authorised to do so by that community, which would require some 
kind of real-world validation. But, let's say local chapters or local users 
could undertake that process using local knowledge of how such communities 
identify and operate.

The problem it still doesn't solve is that whatever information is added by 
that account could then be changed by anyone. We would have to have a way to 
prevent that happening, which would be a technical problem. Also could that 
information ever be deleted by anyone (even for purely innocent purposes, e.g. 
splitting a large article might delete the content from one article to 
re-insert into other article). Or is the positioning of the content within a 
particular article a decision only that group might be allowed to take?

A possible technical/social solution is to have traditional knowledge of this 
nature in a sister project, where rules on user names would be entirely 
different and obviously oral sourced material allowed.  The group could then 
produce named units of information as a single unit (similar to a File on 
Commons). These units could then be added to en.WP or others (obviously the 
language the units are written in would have to be identified, as Commons does 
with descriptions already) so only English content is added to en.WP and so on. 
The content would be presented in en.WP in a way (in a "traditional language" 
box with a link to something explaining what that means) so the reader understands 
what this info is and is free to trust it or not. The information itself cannot 
be modified on en.WP, only on the sister project (talk pages of the sister 
project would need to allow anyone to make requests, e.g. to report a 
misspelling). En.WP would remain in control of whether the content was included 
but could not change the content themselves.

It seems to me a sister project similar to the current Commons would be what we 
need to make this work.

Sent from my iPad

On 4 Jul 2019, at 6:03 pm, Jan Dittrich  wrote:

>> Maybe not "signed" in the sense of a signature of a Talk page, but each
> contribution is attributed automatically to its user as seen in the
> history. As someone who edits under my real name, I absolutely put my name
> to my contributions.
> 
> That is what I assumed, too, since it was coherent with some of the
> problems described in:
> https://upload.wikimedia.org/wikipedia/commons/6/6c/PG-Slides-Wikimania18.pdf
> in this interpretation, Mediawiki (and lots of other software) code-ify
> knowledge production as done by single people  [1]– a person can edit, but
> not a group (which was one of the challenges in the project described in
> the slides, if I remember correctly)
> 
> I would be much interested in more research on what values are "build in"
> our software (Some Research by Heather Ford and Stuart Geiger goes in this
> direction).
> 
> Best,
> Jan
> 
> [1] An interesting read on the concept of "transmitting knowledge" (e.g. in
> articles and via the web) and knowledge as inherently social would be
> Ingold’s "From the Transmission of Representation to the Education of
> Attention" (http://lchc.ucsd.edu/MCA/Paper/ingold/ingold1.htm).
> 
> Am Do., 4. Juli 2019 um 02:20 Uhr schrieb Kerry Raymond <
> kerry.raym...@gmail.com>:
> 
>> Maybe not "signed" in the sense of a signature of a Talk page, but each
>> contribution is attributed automatically to its user as seen in the
>> history. As someone who edits under my real name, I absolutely put my name
>> to my contributions.
>> 
>> Or the other possible interpretation of "signed" here may be referring to
>> the citations which are usually sources with one or small number of
>> individual authors, as opposed to a community of shared knowledge
>> custodians which is the case with Aboriginal Australians.
>> 
>> Kerry
>> 
>> Sent from my iPad
>> 
>>> On 4 Jul 2019, at 10:28 am, Todd Allen  wrote:
>>> 
>>> I found one error:
>>> 
>>> "Even the idea that contributions to the wiki should be signed by
>>> individuals is at odds with many traditional societies where knowledge
>>> expression is mainly collective, not individualised..."
>>> 
>>> That's already how it works. Only discussion posts and the like are
>> signed.
>>> I don't know of any language Wikipedia in which contributi

Re: [Wiki-research-l] New paper - Indigenous knowledge on Wikipedia

2019-07-03 Thread Kerry Raymond
Maybe not "signed" in the sense of a signature of a Talk page, but each 
contribution is attributed automatically to its user as seen in the history. As 
someone who edits under my real name, I absolutely put my name to my 
contributions.

Or the other possible interpretation of "signed" here may be referring to the 
citations which are usually sources with one or small number of individual 
authors, as opposed to a community of shared knowledge custodians which is the 
case with Aboriginal Australians.

Kerry

Sent from my iPad

> On 4 Jul 2019, at 10:28 am, Todd Allen  wrote:
> 
> I found one error:
> 
> "Even the idea that contributions to the wiki should be signed by
> individuals is at odds with many traditional societies where knowledge
> expression is mainly collective, not individualised..."
> 
> That's already how it works. Only discussion posts and the like are signed.
> I don't know of any language Wikipedia in which contributions to the actual
> encyclopedia articles are signed, and I know several of the largest
> (German, Spanish, and English) do not have such a practice. (If there is a
> project where individual contributions are signed, please let me know, I'd
> be interested to see how they make that work. What if it gets edited?)
> 
> Aside from that, the article seems to state that such a project is
> incompatible with both NPOV and copyleft, so I'm not sure that Wikimedia
> hosting it would be the best fit as those are fundamental requirements.
> (That's not to say it's not worth doing at all, of course.)
> 
> Todd
> 
> On Wed, Jul 3, 2019 at 5:52 PM Nathalie Casemajor 
> wrote:
> 
>> Hello,
>> 
>> For those of you who are interested in "small" Wikipedias and Indigenous
>> languages, here's a new academic paper co-signed by yours truly.
>> 
>> Published in an open access journal :)
>> 
>> Nathalie Casemajor (Seeris)
>> 
>> -
>> 
>> *Openness, Inclusion and Self-Affirmation: Indigenous knowledge in Open
>> Knowledge Projects
>> <
>> http://peerproduction.net/editsuite/issues/issue-13-open/peer-reviewed-papers/openness-inclusion-and-self-affirmation/?fbclid=IwAR3YQA3eXXZ7Z3ou6lz38_zxXsU_XZ0fu8AJVHE5EVGDil0SBa2U2q0gCKc
>>> *
>> 
>> This paper is based on an action research project (Greenwood and Levin,
>> 1998) conducted in 2016-2017 in partnership with the Atikamekw Nehirowisiw
>> Nation and Wikimedia Canada. Built into the educational curriculum of a
>> secondary school on the Manawan reserve, the project led to the launch of a
>> Wikipedia encyclopaedia in the Atikamekw Nehirowisiw language. We discuss
>> the results of the project by examining the challenges and opportunities
>> raised in the collaborative process of creating Wikimedia content in the
>> Atikamekw Nehirowisiw language. What are the conditions of inclusion of
>> Indigenous and traditional knowledge in open projects? What are the
>> cultural and political dimensions of empowerment in this relationship
>> between openness and inclusion? How do the processes of inclusion and
>> negotiation of openness affect Indigenous skills and worlding processes?
>> Drawing from media studies, indigenous studies and science and technology
>> studies, we adopt an ecological perspective (Star, 2010) to analyse the
>> complex relationships and interactions between knowledge practices,
>> ecosystems and infrastructures. The material presented in this paper is the
>> result of the group of participants’ collective reflection digested by one
>> Atikamekw Nehirowisiw and two settlers. Each co-writer then brings his/her
>> own expertise and speaks from what he or she knows and has been trained
>> for.
>> 
>> Casemajor N., Gentelet K., Coocoo C. (2019), « Openness, Inclusion and
>> Self-Affirmation: Indigenous knowledge in Open Knowledge Projects »,
>> *Journal
>> of Peer Production*, no13, pp. 1-20.
>> 
>> 
>> More info about the Atikamekw Wikipetcia project and the involvement
>> of Wikimedia Canada:
>> 
>> https://ca.wikimedia.org/…/Atikamekw_knowledge,_culture_and…
>> <
>> https://ca.wikimedia.org/wiki/Atikamekw_knowledge,_culture_and_language_in_Wikimedia_projects?fbclid=IwAR1PynlNUrZcRSIIu9WwcKhp0QjE_UqPz2O8_KNZxnsrTGQYKoLyOMuvh10
>>> 


Re: [Wiki-research-l] Questions about SuggestBot

2019-06-26 Thread Kerry Raymond
I am familiar with the Citation Hunt tool, but am not much of a fan of it. 
Basically there is a tracking category built into {{citation needed}}, and 
Citation Hunt returns you a random article in that category, showing you a few 
lines of text preceding (I think) the first needed citation in the article; you 
can choose to accept it or skip it (whereupon it offers you another one). It is 
intended as a lightweight way to engage librarians during 1Lib1Ref, sending 
them scurrying into their collections to find a citation and add it. It sounds 
superficially like a great idea, unless you do outreach with librarians as I 
do, and then you see how flawed it is. It fails in at least three ways.

The first is that just because there is a {{citation needed}} template present, 
it doesn’t follow that a citation exists (the information may be untrue), so it 
is often a waste of time searching. An inherent problem with that template is 
that many people (librarians or others) may see it and try to find a citation. 
When they fail to find one, what do they do? Answer: do nothing and move on. 
Very few people feel confident to say “there is no citation, I will remove all 
the text associated with the template” as firstly they realise there may be a 
source that they failed to find and secondly they are uncertain how much 
preceding text to remove in any case. It’s hard to make that call even as an 
experienced Wikipedian; it’s not an entry-level task. The template does not 
have a field to record how many attempts have been made which might be 
accompanied by a (say) “three strikes and it’s out” policy or a time-based 
deletion criterion. Once added, {{citation needed}} tends to linger for years, wasting 
people’s time trying to resolve it.

Problem 2. A librarian will be asked to find a citation for content unlikely to 
be held in their library; there are not a lot of books on baseball players in 
an Australian library. You can skip through an awful lot of suggestions before 
hitting one that might be in your collection. Although my personal observation 
is that most librarians don’t look in their collection, they look for a quick 
win with a simple google search which tends to fail (if it was that easy, it 
would already be cited).

Problem 3. It is fundamentally sexist and some librarians notice this and 
comment on it to others. Who writes Wikipedia? Well, we know it’s predominantly 
men. Who are librarians? Predominantly women. So, ... the idea of using 
Citation Hunt for 1Lib1Ref is that a woman is being asked to do a lot of work 
(scouring their collection) to clean up after a lazy man who didn’t bother to 
do their job right in the first place. Why does this scenario seem familiar to 
many women? When you put it like that, would you ever suggest it again? I 
wouldn’t, but the WMF does. Instead I have come up with 1Lib1Ref tasks that add 
content with citation in topics that are relevant to the librarians I work 
with. The librarians see this as a very positive activity. The task is almost 
always do-able and adds content to a topic they perceive as relevant to their 
library’s focus (no baseball tasks).

You can fix problem 2 though. If you use Petscan to compile some 
whole-of-category trees, you can run Citation Hunt over that smaller list of 
articles and keep the topics relevant to a particular group of librarians. I 
did this manually one year; now it’s built in to the tool, I think.
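The narrowing step can be sketched offline (the category names and the 
article-to-category mapping below are hypothetical stand-ins; in practice 
Petscan expands the category tree against the live database and you feed the 
resulting article list to Citation Hunt):

```python
# Hypothetical stand-in for a Petscan category-tree expansion: which
# categories we care about, and which categories each article sits in.
relevant_categories = {"Category:Queensland history", "Category:Brisbane"}

article_categories = {
    "Story Bridge": {"Category:Brisbane", "Category:Bridges"},
    "Babe Ruth": {"Category:Baseball players"},
    "Separation of Queensland": {"Category:Queensland history"},
}

def topical_candidates(articles_needing_citations, article_categories,
                       relevant_categories):
    """Keep only citation-needed articles inside the chosen category tree."""
    return [a for a in articles_needing_citations
            if article_categories.get(a, set()) & relevant_categories]

# Articles carrying {{citation needed}}, before topical filtering:
needing = ["Story Bridge", "Babe Ruth", "Separation of Queensland"]
```

With these stand-in values the baseball article is filtered out, which is the 
point: an Australian librarian never sees it.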

So in terms of suggesting tasks to users, as much as resolving {{citation 
needed}} is a high value result if it succeeds, the success rate of actually 
doing so is low and the task can be frustrating. The other risk with new people 
doing it is that they don’t understand what a reliable citation is, so they may 
link to a Facebook post or a tweet or a webpage that on close inspection is a 
mirror of Wikipedia. Again, even as an experienced Wikipedian, I struggle to 
determine if a random webpage out there which is more or less identical to a 
Wikipedia article is a copy from Wikipedia (and hence not a reliable source) or 
whether the Wikipedia article is a copyvio of that webpage.

I’d have to say that putting a time limit on {{citation needed}} would be a 
very good thing as it would limit the time questionable content exists without 
citation and we could use imminent deadlines as the basis for a Suggest A Task 
tool, on a “cite it or delete it” basis. This would empower people to delete. 
I’d go further and suggest that {{citation needed}} should have short default 
time limits on edits made by new users, and that the higher the importance or 
readership of the article, the shorter the expiry time should be. I think this 
could be done with a bot rather than being set manually by the person who adds 
the tag. It would be really great if Twinkle allowed you to add the citation 
needed tag and automatically set the expiry time according to whatever policies 
exist.
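The expiry scheme sketched above could be computed mechanically. Here is a minimal sketch, assuming a hypothetical importance-tier policy; the tiers, durations, and function name are all illustrative, not existing Wikipedia policy or bot behaviour:

```python
from datetime import date, timedelta

# Hypothetical policy table: higher-importance articles get shorter grace
# periods before uncited content may be removed. Purely illustrative values.
EXPIRY_DAYS = {"top": 14, "high": 30, "mid": 90, "low": 180}

def citation_needed_expiry(importance: str, tagged_on: date,
                           new_editor: bool = False) -> date:
    """Return the date a {{citation needed}} tag would 'expire' under
    the assumed policy ('cite it or delete it' deadline)."""
    days = EXPIRY_DAYS.get(importance, 180)
    if new_editor:
        # Shorter default limit for content added by new users
        days = min(days, 14)
    return tagged_on + timedelta(days=days)

print(citation_needed_expiry("top", date(2019, 6, 1)))  # 2019-06-15
```

A bot could apply this when the tag is added, and a Suggest A Task tool could then surface articles whose deadlines are imminent.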

Sent from my iPad

> On 27 Jun 2019, at 1:49 am, Morten Wang  wrote:
> 
> As St

Re: [Wiki-research-l] Monthly/Yearly Edits per country

2019-06-25 Thread Kerry Raymond
Maybe I am missing something but, while you can geolocate anonymous 
contributors (well to the extent you can reliably geolocate any IP address), 
you cannot geolocate logged-in users. Of course the WMF servers do know the IP 
addresses used by the logged-in users but this is suppressed for privacy 
reasons. So either you are seeing data based only on anonymous users (which is 
unlikely to be a representative sample) or the WMF have chosen to compile and 
release this data set. There is no way anyone else could compile the 
geolocations of all contributions.

So I am curious what the 2014 data you are pointing us to really represents.

Kerry

Sent from my iPad

> On 25 Jun 2019, at 5:29 pm, Adam Ferris  
> wrote:
> 
> Hello all,
> 
> I am new to this mailing list, and a new researcher so apologies if this is 
> not the right mailing list for this question, but I hope you might be able to 
> help me.
> 
> I am trying to recreate the map included below, with Wikipedia edits per 
> 10,000 internet users but with newer data, I was hoping year 2018 data that I 
> can then average per month, although it doesn’t have to be a Calendar year I 
> suppose, it could be Feb-2018 to Feb-2019.
> The key is that I would like newer data. I have looked at the sources of 
> these maps, and they all seem to end in 2013 or 2014. This source got me 
> close, but has no Edit data, only Viewing data.
> https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryBreakdown.htm
> 
> Could anyone help me in finding the right dataset to recreate this map? 
> (Image too heavy to be sent in email)
> Like this image:
> https://i0.wp.com/geonet.oii.ox.ac.uk/wp-content/uploads/sites/46/2016/09/Wikipedia_EditsPerMonth-1-1.png
> 
> From this article:
> https://www.oii.ox.ac.uk/blog/the-geography-of-wikipedia-edits/
> 
> I’m very grateful for any help you can offer me.
> 
> Very best,
> Adam
> 
> 
> 
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



Re: [Wiki-research-l] Research on Edit Size

2019-06-07 Thread Kerry Raymond
Not answering your question about studies, but I think your assumption that an 
editor has some kind of "normal" edit size dictated solely by tenure/experience 
might not be valid.

I would note that even for the same contributor, there are different kinds of 
contribution, and these will have different patterns and hence sizes. For 
example, I think of myself principally as a content writer, but I also manage a 
large watchlist. I would be very surprised if my edit size didn't vary 
depending on the task. When content writing, I am likely to make large 
positive-size edits (as I am adding content), but I'm human and make mistakes, 
so a large edit might be followed by some smaller copyedits. But when I am 
managing my watchlist, my edits will most often be deleting material 
(vandalism, spam, uncited dubious claims or opinions), so I would imagine that 
I would mostly make negative-size edits. When I am doing some task in 
AutoWikiBrowser, usually to do maintenance across a set of articles (e.g. 
replace a changed domain name in citation URLs or rename links because of a 
page move), it will probably show a long run of same/similar-sized edits, which 
might be positive or negative in size depending on the relative length of the 
old/new text.

You may need to consider a couple more variables that come from the tags on 
the edits, such as the use of the visual editor or mobile editors, as the tool 
you use to edit does alter the way you edit. For example, if I click a section 
edit in the source editor, I only get to edit that section, so I may do a 
number of section edits to complete an overall task. If I am using the visual 
editor, it always opens the whole article, so I may do the complete task in a 
single edit. If I am on a mobile device, I will usually do the minimal edit 
necessary (because it is so hard to edit that way) and come back later on my 
laptop to finish the task properly; so I might remove some incorrect 
information with the mobile edit (as leaving it in place misleads the reader) 
but wait until later to add the correct information, as adding the citations 
for it on a mobile device is just too hard for me.

Finally if I am on a poor Internet connection, I will tend to publish 
frequently for fear of losing my work. If I am on a good Internet connection, I 
become complacent and publish less frequently. If a person is only a Visual 
Editor user, then they probably rely on its ability to recover a partial edit 
if the session terminates unexpectedly and may be less inclined to publish 
frequently.

I also do training for new users. And new users exhibit a range of behaviours. 
Some publish very frequently. Add one sentence, publish, add the citation, 
publish, replace a word, publish. Others forget to publish at all.

And finally if you have an editor with edit-count-itis, expect them to do a lot 
of small edits using tools to implement lots of minor changes of little net 
value, because their goal is simply to increase their edit count (and hence 
their ego) in the guise of contributing. I often think it might be a good idea 
to hide the edit count statistic; while we might lose a lot of edits as a 
result, we probably wouldn't miss them, and the rest of us would waste less 
time as our watchlists would not get inflated by this massive number of trivial 
changes.

Finally, I note that it is easier to know the number of bytes changed with 
each edit (the change in the size of the article wikitext) than the number of 
words changed, as the latter requires comparing the texts. That comparison is 
easy enough for straight text ("how now brown cow" is 4 words), but how many 
words change when templates, citations, etc. are involved: is it the number of 
words in the wikitext or the number of words rendered to the reader? If I 
change a template definition, I can alter the number of words in thousands of 
Wikipedia articles that transclude it.
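To illustrate the difference: the byte delta is trivially available from consecutive revision sizes, while counting changed words requires an actual text comparison. A sketch using Python's standard difflib, over toy revision data (not the MediaWiki API):

```python
import difflib

def byte_deltas(revision_sizes):
    """Per-edit change in article size (bytes), as page histories show it."""
    return [b - a for a, b in zip(revision_sizes, revision_sizes[1:])]

def words_changed(old_text, new_text):
    """Rough count of (inserted, deleted) words between two revisions,
    treating the wikitext as a flat word sequence."""
    sm = difflib.SequenceMatcher(None, old_text.split(), new_text.split())
    added = removed = 0
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op in ("replace", "delete"):
            removed += i2 - i1
        if op in ("replace", "insert"):
            added += j2 - j1
    return added, removed

print(byte_deltas([100, 150, 140]))                            # [50, -10]
print(words_changed("how now brown cow", "how now red cow"))   # (1, 1)
```

Note this still counts words of wikitext, not words as rendered to the reader; the template-transclusion problem raised above has no per-edit answer at all.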

Kerry


-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Haifeng Zhang
Sent: Saturday, 8 June 2019 7:44 AM
To: Research into Wikimedia content and communities 

Subject: [Wiki-research-l] Research on Edit Size

Dear folks,

Are there studies that have examined what might affect edit size (e.g., # of 
words add/delete/modify in each revision). I am especially interested in the 
impact of editor's tenure/experience.

Thanks,
Haifeng Zhang


Re: [Wiki-research-l] Content similarity between two Wikipedia articles

2019-05-07 Thread Kerry Raymond
Indeed, the purpose does matter. Is the end goal the content similarity of 
the articles themselves (perhaps, say, to detect articles that might be 
merged), or is the end goal the relatedness of the topics represented by those 
articles? If the latter, then the Wikipedia category system relates articles 
with some commonality of topic, and distance between articles via the category 
hierarchy is an indicator of the level of relatedness. Similarly, navboxes 
relate articles that have something in common, as do list articles. All three 
of these things are manually curated, and may be a much cheaper way to 
determine relatedness of topics than messing about with bags of words, etc. But 
it all really depends on the end goal.
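Distance via the category hierarchy can be estimated with a breadth-first search over a graph linking articles to their categories. A minimal sketch over a toy, made-up graph (real category data would come from the categorylinks table or the API):

```python
from collections import deque

def category_distance(start, goal, neighbours):
    """Shortest hop count between two pages via the category graph (BFS).
    `neighbours` maps each node (article or category) to connected nodes."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # not related within this graph

# Toy graph: articles link to their categories and vice versa (made up)
graph = {
    "Brisbane": ["Cat:Cities in Queensland"],
    "Cairns": ["Cat:Cities in Queensland"],
    "Cat:Cities in Queensland": ["Brisbane", "Cairns"],
}
print(category_distance("Brisbane", "Cairns", graph))  # 2
```

Two articles sharing a category are 2 hops apart; smaller distances suggest more closely related topics.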

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Isaac Johnson
Sent: Wednesday, 8 May 2019 1:35 AM
To: Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Content similarity between two Wikipedia articles

Hey Haifeng,
On top of all the excellent answers provided, I'd also add that the answer to 
your question depends on what you want to use the similarity scores for.
For some insight into what it might mean to choose one approach over 
another, see this recent publication:
https://dl.acm.org/citation.cfm?id=3213769

At a high level, I'd say that there are three ways you might approach article 
similarity on Wikipedia:
* Reader similarity: two articles are similar if the same people who read one 
also frequently read the other. Navigation embeddings that implement this 
definition based on page views were last generated in 2017, so newer articles 
will not be represented, but here is the dataset [
https://figshare.com/articles/Wikipedia_Vectors/3146878 ] and meta page [ 
https://meta.wikimedia.org/wiki/Research:Wikipedia_Navigation_Vectors ].
The clickstream dataset [
https://dumps.wikimedia.org/other/clickstream/readme.html ], which is more 
recent, might be used in a similar way.
* Content similarity: two articles are similar if they contain similar content 
-- i.e. in most cases, similar text. This covers most of the suggestions 
provided to you in this email chain. Some are simpler but are language-specific 
unless you make substantial modifications (e.g., ESA, the LDA model described 
here:
https://cs.stanford.edu/people/jure/pubs/wikipedia-www17.pdf) while others are 
more complicated but work across multiple languages (e.g., recent WSDM
paper: https://twitter.com/cervisiarius/status/1115510356976242688).
* Link similarity: two articles are similar if they link to similar articles. 
Generally, this approach involves creating a graph of Wikipedia's link 
structure and then using an approach such as node2vec to reduce the graph to 
article embeddings. I know less about the current approaches in this space, but 
some searching should turn up a variety of approaches -- e.g., Milne and 
Witten's 2008 approach [ 
http://www.aaai.org/Papers/Workshops/2008/WS-08-15/WS08-15-005.pdf ], which is 
implemented in WikiBrain as Morten mentioned.

There are also other, more structured approaches like ORES drafttopic, which 
predicts which topics (based on WikiProjects) are most likely to apply to a 
given English Wikipedia article:
https://www.mediawiki.org/wiki/Talk:ORES/Draft_topic
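For the simplest version of content similarity, a plain bag-of-words cosine over two texts looks like this (toy sentences; real systems would add TF-IDF weighting, stemming, and so on):

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two documents, in [0, 1]."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

print(cosine_similarity("brisbane is a city", "cairns is a city"))  # 0.75
```

Approaches like ESA or embeddings replace the raw word counts with richer vector representations, but the final similarity computation is often this same cosine.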

On Tue, May 7, 2019 at 9:54 AM  wrote:

> Dear Haifeng,
>
>
> Would you not be able to use ordinary information retrieval techniques 
> such as bag-of-words/phrases and tfidf? Explicit semantic analysis 
> (ESA) uses this approach (though its primary focus is word semantic 
> similarity).
>
> There are a few papers for ESA:
> https://tools.wmflabs.org/scholia/topic/Q5421270
>
> I have also used it in "Open semantic analysis: The case of word level 
> semantics in Danish"
>
> http://www2.compute.dtu.dk/pubdb/views/edoc_download.php/7029/pdf/imm7
> 029.pdf
>
>
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/
>
>
>
> On 04/05/2019 13:47, Haifeng Zhang wrote:
> > Dear folks,
> >
> > Is there a way to compute content similarity between two Wikipedia
> articles?
> >
> > For example, I can think of representing each article as a vector of
> likelihoods over possible topics.
> >
> > But, I wonder there are other work people have already explored in 
> > the
> past.
> >
> >
> > Thanks,
> >
> > Haifeng
> >
>
>


--
Isaac Johnson -- Research Scientist -- Wikimedia Foundation 



Re: [Wiki-research-l] Ways thru which articles could attract editors

2019-04-27 Thread Kerry Raymond
"Article quality" is quite a wide topic. I would imagine most good faith 
contributors believe they are improving the quality of an article with every 
edit. Do you have some specific type of quality improvement in mind? E.g. more 
citations, more content, fewer spelling errors?

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Haifeng Zhang
Sent: Sunday, 28 April 2019 7:53 AM
To: Research into Wikimedia content and communities 

Subject: [Wiki-research-l] Ways thru which articles could attract editors

Dear folks,

I wonder what are those mechanisms/events (in Wikipedia or WikiProjects) which 
may attract editors to improve article quality.

One example is Today's articles for improvement. Within WikipProjects, GA/FA 
nominations seem useful too.


Thanks,

Haifeng Zhang


Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-18 Thread Kerry Raymond
The thing about sockpuppets is that we only know about the ones that have been 
detected (and some of them have been large groups of 100s of accounts). The 
problem is that we don’t know about the undetected ones. I am sure many of us 
have had suspicions about the behaviour of certain accounts but to request a 
sockpuppet investigation requires a level of evidence above suspicious 
behaviour (specifically, identifying another account). New users with 
sophisticated editing skills who write about living individuals, businesses, or 
products in a positive way often seem to me to be the kind of account likely to 
be doing undisclosed paid editing, and therefore almost certainly a sockpuppet 
of a paid PR person; but if each account writes about a different topic, it is 
difficult to work out what the other accounts might be in order to look for 
evidence of sockpuppeting.

 

How far underwater does the iceberg go?

 

Kerry 

 

From: Giovanni Luca Ciampaglia [mailto:glciamp...@gmail.com] 
Sent: Tuesday, 19 March 2019 11:37 AM
To: Research into Wikimedia content and communities 

Cc: Kerry Raymond 
Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia

 

Does anybody know how prevalent are sockpuppets? Has anybody tried estimating 
the percentage of editors that have created at least one additional account? 
(Legitimate or otherwise.) 

 

Giovanni 

 

On Mon, Mar 18, 2019, 20:20 Stuart A. Yeates <syea...@gmail.com> wrote:

In addition to Kerry's excellent examples there are users editing
wikipedia though TOR, the anonymity and censorship circumvention
network. These users face extra scrutiny.

cheers
stuart


--
...let us be heard from red core to black sky

On Tue, 19 Mar 2019 at 13:04, Kerry Raymond <kerry.raym...@gmail.com> wrote:
>
> Apart from the legitimate alternate accounts and the illegitimate sockpuppet 
> accounts, there are other ways that alternate accounts exist.
>
> Occasional contributors often forget their username and/or password. Password 
> recovery isn't possible unless you provide an email address at sign-up (it's 
> optional, but you can add it later). So what such people then  do is just 
> create a new user account (I'm not sure there is anything else they can do). 
> I see this sort of behaviour a lot at events. The other variation of the 
> problem is that they did provide an email address but it is one not easily 
> accessible to them at the event (i.e. a librarian who signed up with a work 
> email address that cannot be accessed outside of the organisation).
>
> The other group of people with multiple accounts are those who edit 
> anonymously as serial IPs. The same person can use a number of IP numbers 
> over time. Often you don't realise it is the same person unless you see a lot 
> of their work and can see a pattern in it. For example, at the moment, there 
> is a person with a series of IP accounts that is  changing a common section 
> of a Queensland place article to be a subsection of another, who I notice on 
> my watchlist . This person appears to acquire a new IP address every week or 
> so, but the pattern of editing makes it obvious it's the same person behind 
> it. Whether or not an IP address can be considered "an account" depends on 
> your purposes. The one IP address can also be used by multiple people (e.g. 
> coming through a shared organisational network in a library or school). It is 
> claimed by some people that many new users do their first edits anonymously, 
> so if you are serious about studying "new contributors", then maybe you have 
> to look at anonymous editing. And also even regular contributors may 
> sometimes choose to edit anonymously, e.g. being in an unsecure IT 
> environment and reluctant to use their username/password in that situation 
> (particularly people with administrator or other significant access rights).
>
> Because I do outreach, I look for new accounts that turn up on my watchlist 
> and send them welcome messages etc. Because I also do training, I see a lot 
> of genuinely new people in action where I can observe their edits. So when I 
> see new accounts or IPs doing far more "sophisticated" edits than I see new 
> users do, I tend to suspect they are not genuinely new contributors.
>
> I think the best you can do is look for new accounts and be prepared to omit 
> any that show signs of sophisticated editing (either in terms of they are 
> doing technically or what they say on Talk pages or in edit summaries). For 
> example, no genuine new user will mention a policy (they don't know they 
> exist). Also genuine new users don't tend to edit that quickly, so any rapid 
> fire series of successful edits is unlikely to be a genuine new user.  I 
> think this inab

Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-18 Thread Kerry Raymond
Apart from the legitimate alternate accounts and the illegitimate sockpuppet 
accounts, there are other ways that alternate accounts exist.

Occasional contributors often forget their username and/or password. Password 
recovery isn't possible unless you provide an email address at sign-up (it's 
optional, but you can add it later). So what such people then do is just 
create a new user account (I'm not sure there is anything else they can do). I 
see this sort of behaviour a lot at events. The other variation of the problem 
is that they did provide an email address but it is one not easily accessible 
to them at the event (e.g. a librarian who signed up with a work email address 
that cannot be accessed outside of the organisation).

The other group of people with multiple accounts are those who edit anonymously 
as serial IPs. The same person can use a number of IP addresses over time. 
Often you don't realise it is the same person unless you see a lot of their 
work and can see a pattern in it. For example, at the moment there is a person 
with a series of IP accounts who is changing a common section of Queensland 
place articles to be a subsection of another, whom I notice on my watchlist. 
This person appears to acquire a new IP address every week or so, but the 
pattern of editing makes it obvious it's the same person behind it. Whether or 
not an IP address can be considered "an account" depends on your purposes. The 
one IP address can also be used by multiple people (e.g. coming through a 
shared organisational network in a library or school). It is claimed by some 
people that many new users do their first edits anonymously, so if you are 
serious about studying "new contributors", then maybe you have to look at 
anonymous editing. And even regular contributors may sometimes choose to edit 
anonymously, e.g. when in an insecure IT environment and reluctant to use 
their username/password in that situation (particularly people with 
administrator or other significant access rights).

Because I do outreach, I look for new accounts that turn up on my watchlist and 
send them welcome messages etc. Because I also do training, I see a lot of 
genuinely new people in action where I can observe their edits. So when I see 
new accounts or IPs doing far more "sophisticated" edits than I see new users 
do, I tend to suspect they are not genuinely new contributors.

I think the best you can do is look for new accounts and be prepared to omit 
any that show signs of sophisticated editing (either in terms of what they are 
doing technically or what they say on Talk pages or in edit summaries). For 
example, no genuine new user will mention a policy (they don't know policies 
exist). Also, genuine new users don't tend to edit that quickly, so any 
rapid-fire series of successful edits is unlikely to be a genuine new user. I 
think this inability to know whether a new account represents a genuinely new 
user is an inherent limitation for your research and should be documented as 
such, explaining the many circumstances in which new accounts might belong to 
non-new users.
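The screening heuristics above could be sketched as a simple filter over candidate accounts; the keyword list and rate threshold below are illustrative assumptions, not an established detection method:

```python
# Illustrative heuristics only: signals that a "new" account may not be a
# genuinely new contributor. Keywords and threshold are assumptions.
POLICY_WORDS = {"npov", "notability", "blp", "afd", "copyvio"}

def looks_experienced(edit_summaries, edits_per_hour, rate_threshold=20):
    """Flag an account whose behaviour suggests prior editing experience."""
    # Genuine new users don't cite policies they don't know exist
    mentions_policy = any(
        word in summary.lower().split()
        for summary in edit_summaries
        for word in POLICY_WORDS
    )
    # Genuine new users don't edit in rapid-fire bursts
    rapid_fire = edits_per_hour > rate_threshold
    return mentions_policy or rapid_fire

print(looks_experienced(["rm per BLP"], 2))        # True (cites a policy)
print(looks_experienced(["added my school"], 3))   # False
```

A study would use this only to exclude suspect accounts from a "new editor" sample, accepting that some non-new accounts will still slip through.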

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Pine W
Sent: Tuesday, 19 March 2019 5:27 AM
To: Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia

Hi Haifeng,

Some users will state on their user pages that an account is an alternate 
account. However, this practice is not followed by everyone, and those who do 
follow it aren't required to do so in a uniform way.

Alternate accounts which are not labeled as such, and which are used for 
illegitimate purposes such as double voting, are an ongoing problem. You might 
be interested in the English Wikipedia page 
https://en.wikipedia.org/wiki/Wikipedia:Sock_puppetry.

Alternate accounts can also be used for legitimate purposes, such as people who 
have one account for their professional or academic activities and another 
account for their personal use.

Good luck with your project.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


On Thu, Mar 14, 2019 at 1:30 PM Haifeng Zhang 
wrote:

> Stuart,
>
> I'm building an agent-based simulation of Wikipedia collaboration.
>
> I would like my model to be empirically grounded, so I need to collect 
> data for new editors.
>
> Alternative accounts can be an issue, but I wonder is there a way to 
> identify editors who have multiple account?
>
>
> Thanks,
>
> Haifeng Zhang
>


Re: [Wiki-research-l] What instructors think about teaching with Wikipedia AFTER having tried it?

2019-02-10 Thread Kerry Raymond
I do it as a volunteer. There are no salaried staff at Wikimedia Australia. 

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Piotr Konieczny
Sent: Monday, 11 February 2019 1:20 AM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] What instructors think about teaching with 
Wikipedia AFTER having tried it?

Thank you for the very detailed story!

I don't know about the US/Canada(?) where Wiki Edu operates, but recently I 
heard the explanation for why there is almost no outreach to universities in 
Poland despite (occasional) interest from the universities themselves: no 
funds/will to hire a dedicated person for this, and the current salaried staff 
of the Polish chapter do not have sufficient time to answer all requests.

--

Piotr Konieczny, PhD
http://hanyang.academia.edu/PiotrKonieczny
http://scholar.google.com/citations?user=gdV8_AEJ
http://en.wikipedia.org/wiki/User:Piotrus

On 2/10/2019 3:16 AM, Kerry Raymond wrote:
> I supported a 2nd year Gender Studies course late last year. The lecturer had 
> heard about the Gender Gap in terms of content on Wikipedia and decided that 
> there would be a student assignment in which student could singly or in a 
> group write or expand a Wikipedia article. The lecturer had broken the 
> assignment down into a number of tasks to be completed by various dates, 
> which were roughly. 1. Pick a topic and explain why you chose it. 2. Write an 
> essay about the topic with citations  3. Write/expand the Wikipedia article.
>
> The lecturer had no personal experience at contributing to Wikipedia, but 
> assumed it would not be hard to do as it's the "encyclopedia anyone can edit" 
> but was wondering if there needed to be a session to teach the students how  
> to contribute to Wikipedia. By sheer chance the lecturer happened to be 
> chatting with one of the university librarians and mentioned this Wikipedia 
> assignment and that librarian happened to have done Wikipedia training at UQ 
> for groups of librarians and suggested that I might be contacted to do the 
> Wikipedia training.
>
> So I did a Wikipedia training session with the students (because of the 
> timetabling it was not possible to do  hands-on training but I figured, 
> rightly, undergraduates would pick on the "how to" with the Visual Editors 
> just with a presentation) but also addressed the policy side of Wikipedia (of 
> which the lecturer was completely unaware). This occurred before they had to 
> submit their essays so I got to talk about writing a good lede in advance of 
> them doing it (for those planning a new article). I also attend the 
> "edit-a-thon" afternoon where the student actually created or expanded the 
> Wikipedia articles (mostly copying and pasting their essay text but of course 
> had to re-do their citations in Wikipedia format) where I dealit with all the 
> usual event problems (people who did not create their account sufficiently in 
> advance, 6 user limit, shifting new articles that were created as Draft into 
> mainspace etc).  The outcome was that the lecturer and students were all 
> happy at the end of the afternoon, feeling that there had been some "real" 
> achievement from the assignment.  The articles were not too bad (I kept them 
> on my watchlist and all have survived and in some cases have been expanded 
> further by others). I did a bit of MoS tidying afterwards of course and, as 
> photos had not been part of the assignment, I also found and added some 
> photos where I could. About the worst thing that happened was a "essay" tag 
> on one of them.
>
> Like a number of edit-a-thons where I have been parachuted in mid-process, 
> there is no doubt in my mind that having an experienced Wikipedian in the 
> loop helps a lot as the known risks can be managed. I find undergraduate 
> students (who are mostly young and digitally-savvy) take to the Visual Editor 
> very easily (I gave them a one-page cheat sheet and most were fine with that, 
> generally seeking "how to " help only to do some complex things they could 
> see in other articles, "how do I make a table of contents" being the most 
> common). When we hit the 6 new account limit on one IP address, they quickly 
> grasped my explanation of what the problem was and that they should create 
> their accounts from their phones via their mobile data not the Wifi (older 
> people don't grasp this as easily in my experience). One student choosing to 
> use her USB mobile dongle as an alternative. There were some middle-aged and 
> older people in the group who tended to ask more "how to " questions but, on 
> the flip side, had generally followed my early a

Re: [Wiki-research-l] What instructors think about teaching with Wikipedia AFTER having tried it?

2019-02-09 Thread Kerry Raymond
Young people in particular don't tend to look for Help or Instructions. They 
tend to just jump in. They are more likely to want "live chat" help when they 
are in the midst of their problem.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Jesús Tramullas
Sent: Saturday, 9 February 2019 8:39 PM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] What instructors think about teaching with 
Wikipedia AFTER having tried it?

Dear colleagues:

I work with Wikipedia in classroom since 2015-2016 
(https://es.wikipedia.org/wiki/Wikipedia:Proyecto_educativo/WikiDoc,_Universidad_de_Zaragoza).
 
Of course, I agree with the common problems about this kind of approach...

..but now I'm working on a specific area, asking the students: As new editor, 
What do you think about the help pages in Wikipedia? Have you used them? Are 
they helpful? Are they readable? Are they understandable?

So, my approach is to analyze the "technical documentation", identify problems 
and propose improvements.

Cheers,

Jesús
--
"Investigación básica es lo que hago cuando no sé lo que estoy haciendo."
"Basic research is what I am doing when I don't know what I am doing."
Wernher von Braun (1957)

--#
Ph.D. Jesús Tramullas
http://tramullas.com
Dept. Ciencias Documentación // Dept. of Information Studies Universidad de 
Zaragoza 50009 Zaragoza (España)
#--



Re: [Wiki-research-l] What instructors think about teaching with Wikipedia AFTER having tried it?

2019-02-09 Thread Kerry Raymond
I supported a 2nd year Gender Studies course late last year. The lecturer had 
heard about the Gender Gap in terms of content on Wikipedia and decided that 
there would be a student assignment in which student could singly or in a group 
write or expand a Wikipedia article. The lecturer had broken the assignment 
down into a number of tasks to be completed by various dates, which were 
roughly. 1. Pick a topic and explain why you chose it. 2. Write an essay about 
the topic with citations  3. Write/expand the Wikipedia article.

The lecturer had no personal experience of contributing to Wikipedia, but 
assumed it would not be hard to do, as it's the "encyclopedia anyone can edit", 
but was wondering if there needed to be a session to teach the students how to 
contribute to Wikipedia. By sheer chance the lecturer happened to be chatting 
with one of the university librarians and mentioned this Wikipedia assignment 
and that librarian happened to have done Wikipedia training at UQ for groups of 
librarians and suggested that I might be contacted to do the Wikipedia 
training. 

So I did a Wikipedia training session with the students (because of the 
timetabling it was not possible to do hands-on training, but I figured, 
rightly, that undergraduates would pick up the "how to" of the Visual Editor 
just from a presentation) and also addressed the policy side of Wikipedia (of which 
the lecturer was completely unaware). This occurred before they had to submit 
their essays so I got to talk about writing a good lede in advance of them 
doing it (for those planning a new article). I also attended the "edit-a-thon" 
afternoon where the students actually created or expanded the Wikipedia articles 
(mostly copying and pasting their essay text, though they of course had to re-do 
their citations in Wikipedia format) and dealt with all the usual event problems 
(people who did not create their account sufficiently in advance, the 6-account 
limit, shifting new articles that were created as Draft into mainspace, etc.). The 
outcome was that the lecturer and students were all happy at the end of the 
afternoon, feeling that there had been some "real" achievement from the 
assignment.  The articles were not too bad (I kept them on my watchlist and all 
have survived and in some cases have been expanded further by others). I did a 
bit of MoS tidying afterwards of course and, as photos had not been part of the 
assignment, I also found and added some photos where I could. About the worst 
thing that happened was an "essay" tag on one of them.

Like a number of edit-a-thons where I have been parachuted in mid-process, 
there is no doubt in my mind that having an experienced Wikipedian in the loop 
helps a lot as the known risks can be managed. I find undergraduate students 
(who are mostly young and digitally-savvy) take to the Visual Editor very 
easily (I gave them a one-page cheat sheet and most were fine with that, 
generally seeking "how to " help only to do some complex things they could see 
in other articles, "how do I make a table of contents" being the most common). 
When we hit the 6 new account limit on one IP address, they quickly grasped my 
explanation of what the problem was and that they should create their accounts 
from their phones via their mobile data not the Wifi (older people don't grasp 
this as easily in my experience). One student chose to use her USB mobile 
dongle as an alternative. There were some middle-aged and older people in the 
group who tended to ask more "how to " questions but, on the flip side, had 
generally followed my early advice about creating their account in advance and 
practicing on their user page (so all were autoconfirmed users and didn't have 
those problems). 

However, I can see that without an experienced Wikipedian in the loop that 
things could have gone very badly. And this is the problem for me. I can 
generally help out IF I know about the plan in the first place.

As you might have seen in Signpost recently, there was some upset over a 
proposed experiment over giving out random barnstars. As I commented there, 
instead of all the wailing and gnashing of teeth that goes on in the Wikipedia 
community about such things, we would be much better served if we tried to find 
a way to communicate with universities about both edit-a-thons and research 
projects and provide them with some entrypoints into our community so we could 
help them with such things to everyone's mutual benefit. Relying on serendipity 
and personal contacts (which is how things currently work) isn't an ideal 
solution.

Kerry



-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Jonathan Morgan
Sent: Saturday, 9 February 2019 4:07 AM
To: Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] What instructors think about teaching with 
Wikipedia AFTER having tried it?

Piotr,

I think this is an excellent topic, FWIW.


Re: [Wiki-research-l] User type context sensitivity to introduction sections.

2019-02-08 Thread Kerry Raymond
I think we might be missing the point here of the original request. I am a 
native fluent speaker of English and I have 4 university degrees. I don't need 
Simple English Wikipedia, but there are definitely articles on English 
Wikipedia that I cannot read because they are not sufficiently introductory in 
terms of content. For example

https://en.wikipedia.org/wiki/Nucleotide

loses me pretty quickly as I have not studied biochemistry for some number of 
decades so I don't really understand/remember the terms in which nucleotides 
are defined within the article. But I don't think we need to have a technical 
solution. We probably need some simpler introductions to some topics, either 
within one article or as two separate articles, e.g.

https://en.wikipedia.org/wiki/Genetics
https://en.wikipedia.org/wiki/Introduction_to_genetics

Perhaps we need some navboxes that provide a sequence for reading through a 
number of articles in some sensible sequence to learn about a larger topic.

I think we have plenty of solutions with the tools at hand. I think we just 
need to identify the problematic articles and hope someone with the right 
expertise is willing to write the simple introduction or suggest a sequence of 
articles to be read or whatever is deemed to be the best way to cope with 
readers coming to the topic with a range of prior knowledge.

Kerry

 





Re: [Wiki-research-l] Why the world reads Wikipedia: beyond English

2019-01-24 Thread Kerry Raymond
I agree lack of alternative sources are likely to be a factor.

Another factor might be the time available. The high s-e countries may have a 
lot more competing for their eyeballs, e.g. Netflix etc. 

Also, on my experience of foreign students attending Australian universities, 
there may also be some cultural difference in how people think about education 
and learning. Some countries do appear to put a lot of emphasis on memorising 
and regurgitating slabs of textbooks and other rote learning rather than 
developing skills in applying that knowledge to solve a problem. Whereas I find 
myself increasingly learning by trying to absorb the main concepts and 
principles of a new topic but figuring I can lookup the detail if/when I 
actually need to apply that knowledge (particularly so in the WWW era: how to 
open a milk carton? Just look at the YouTube video!). So I think there is a 
shift to just-in-time learning happening but this relies on the meta-skill of 
information searching. I suspect poorly paid and poorly educated teachers in 
low s-e countries are more likely to teach using the same methods that they 
were taught by. This may lead to the mindset of “if I read this Wikipedia 
article many times through, I will have mastered the topic”.

Having travelled in lower s-e countries in Africa recently, many people do see 
education as the key to a better future and actively invest in their children’s 
education for that reason. In Kenya I often heard the term “lean family” which 
meant a small number of children getting the best education the family could 
afford rather than the traditional large family. I think this was something 
actively promoted by their government. Many parents proudly talked about making 
a personal sacrifice of some kind which was undertaken to have more money for 
children’s education.

I think many of us in more developed economies see Wikipedia as useful. I 
suspect there are others who see it as potentially life-changing.

Sent from my iPad

> On 24 Jan 2019, at 7:33 am, Leila Zia  wrote:
> 
> Hi Micru,
> 
> One hypothesis that we have is that in countries with lower
> socio-economic status, the reader may not have access to a variety of
> different sources for their reading needs (the availability of printed
> material, books, ... can be more limited in these countries.). At the
> moment, we don't have enough data from a diverse enough subset of
> countries to be able to look into this. I'm hoping that in the future
> iterations we can sample by country and collect enough data to be able
> to validate or reject this hypothesis.
> 
> Do you have other hypotheses as why this may be happening?
> 
> Best,
> Leila
> 
> 
>> On Wed, Jan 23, 2019 at 3:02 PM David Cuenca Tudela  
>> wrote:
>> 
>> Hi Leila,
>> 
>> I'm curious about the in-depth reading differences according to the
>> socio-economic status. Why do you think such differences exist?
>> 
>> Regards,
>> Micru
>> 
>>> On Sat, Jan 19, 2019 at 1:15 AM Leila Zia  wrote:
>>> 
>>> Hi all,
>>> 
>>> As some of you know, we started a line of research back in 2016 to
>>> understand Wikipedia readers better. We published the first taxonomy
>>> of Wikipedia readers and we studied and characterized the reader types
>>> in English Wikipedia [1]. During the past 1+ year, we focused on
>>> learning about the potential differences of Wikipedia readers across
>>> languages based on the taxonomy built in [1]. We've learned a lot, and
>>> today we're sharing what we learnt with you.
>>> 
>>> Some pointers:
>>> * Publication: https://arxiv.org/abs/1812.00474
>>> * Data:
>>> https://figshare.com/articles/Why_the_World_Reads_Wikipedia/7579937/1
>>> * (under continuous improvement) Research page on meta:
>>> 
>>> https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour
>>> * Research showcase presentation:
>>> https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#December_2018
>>> * A series of presentations to WMF teams and community: Look for tasks
>>> under https://phabricator.wikimedia.org/T201699 with title "Present
>>> the results of WtWRW" for link to slides and more info when available.
>>> * We will send out a blog post about it hopefully soon. A blog post
>>> about the intermediate results is at
>>> https://wikimediafoundation.org/2018/03/15/why-the-world-reads-wikipedia/
>>> 
>>> In a nutshell:
>>> * We ran the taxonomy of Wikipedia readers in 14 languages and
>>> measured the prevalence of Wikipedia use-cases and characterized
>>> Wikipedia readers in these languages.
>>> * While we observe similarities in terms of the prevalence of the use
>>> cases as well as the way we can characterize readers, we can see that
>>> Wikipedia languages lend themselves to different distributions of
>>> readership and characteristics. In many cases, one-size-fits-all
>>> solutions may simply not work for readers.
>>> * Intrinsic learning remains as the number one motivation for people
>>> to come to Wikipedia in th

Re: [Wiki-research-l] Vandalism

2019-01-16 Thread Kerry Raymond
And, FWIW, I don’t think we have a flag on an edit saying that it is vandalism. We 
have a history that can show an edit that is reverted. On inspection of the 
edit summary of the reversion, there may be some textual clues e.g. “rvv” a 
common abbreviation for “reverting vandalism”. There may be a message in the 
reverted IP’s talk page that uses words that suggest vandalism (noting that 
many of these messages are templates and so have highly predictable structure, 
usually with initially neutral terms like “not constructive” escalating to the 
explicit use of the word “vandalism” in some form). However, these messages may 
not specifically link to the problematic edit so you would be looking for talk 
page messages appearing “shortly” after the revert of the edit.
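The textual clues described above could be turned into a rough classifier. Below is a minimal sketch in Python; the pattern list is an illustrative assumption based on common English Wikipedia edit-summary conventions, not an official or exhaustive set.

```python
import re

# Illustrative patterns only: common English Wikipedia conventions for
# vandalism-revert edit summaries ("rvv", "rv ... vandalism", the Undo
# tool's "Undid revision ..." plus a vandalism comment). Assumed, not
# an official vocabulary.
VANDAL_REVERT_PATTERNS = [
    r"\brvv\b",               # shorthand for "reverting vandalism"
    r"\brv\b.*vandal",        # "rv vandalism by ..."
    r"revert\w*\s+.*vandal",  # "reverted vandalism ..."
    r"undid revision.*vandal" # Undo tool summary with a vandalism note
]

def looks_like_vandalism_revert(summary: str) -> bool:
    """Return True if an edit summary textually suggests a vandalism revert."""
    s = summary.lower()
    return any(re.search(p, s) for p in VANDAL_REVERT_PATTERNS)
```

As the surrounding discussion notes, any such heuristic misses vandalism removed by "normal editing" and cannot distinguish good-faith reverts that merely mention vandalism; a real study would combine it with talk-page template detection.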

Not all vandalism is immediately  detected; there may be a number of other 
edits intervening, which may make it impossible to revert.

Not all vandalism is removed with revert, it may occur by “normal editing” 
perhaps as part of a larger edit.

Not all reverted edits are vandalism. They may be well-intentioned but breach a 
Wikipedia policy (eg requirement for citation, present an opinion as a fact). 
Some acceptable edits get reverted for a range of (mostly unacceptable) reasons 
like gatekeeping, style errors, UI errors (if the GUI loads slowly, my click to 
say thanks sometimes turns into a revert!), etc. 

And finally, as someone who does her watch list diligently, sometimes you just 
can’t tell if an edit is vandalism. The classic is the small change in dates. 
If there is no citation or the citation is to an off-line resource or a 
deadlink, it may be impossible to tell if the changed information is a genuine 
correction or a deliberately damaging action. Obviously I may have my 
suspicions, but I do have the obligation to Assume Good Faith. It’s not easy.

Kerry



Sent from my iPad

> On 16 Jan 2019, at 9:03 pm, Thomas Stieve  
> wrote:
> 
> Dear Listserv,
> 
> Hope all is well. I am mapping IP address edits per country for 271
> language Wikipedias. I would like to exclude IP addresses that are
> vandalism. I was thinking of using the ipblocks table for the IP addresses
> to be excluded. Because this project is in so many different languages and
> my programming skills are intermediate, I would like to use the Wikipedia
> tables or registers that the Wikipedians in those language use to mark
> vandalism. If anyone has another idea, I would be most grateful. Perhaps I
> am missing a way that Wikipedians across languages are using to mark
> vandalism.
> 
> Thank you,
> Tom
> 
> 
> -- 
> Thomas Stieve
> Ph.D. Candidate
> School of Geography and Development
> University of Arizona
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



Re: [Wiki-research-l] Vandalism

2019-01-16 Thread Kerry Raymond
I’m not quite sure what you want. An IP address may be used by one or many 
anonymous contributors (workplaces, universities and schools can often appear 
to Wikipedia as a single IP address). Each of those contributors may make one 
or more edits. Each of those edits may be vandalism (a deliberate intention to 
damage and hopefully reverted), poor quality but good faith edits (which are 
reverted for a wide variety of reasons) or acceptable contributions.

Also there is a reluctance to block a known multi-user IP address because of 
misbehaviour by what appears to be one person.

So, when you say “IP addresses that are vandalism”, can you be more specific about 
what you want or don’t want?

Kerry

Sent from my iPad

> On 16 Jan 2019, at 9:03 pm, Thomas Stieve  
> wrote:
> 
> Dear Listserv,
> 
> Hope all is well. I am mapping IP address edits per country for 271
> language Wikipedias. I would like to exclude IP addresses that are
> vandalism. I was thinking of using the ipblocks table for the IP addresses
> to be excluded. Because this project is in so many different languages and
> my programming skills are intermediate, I would like to use the Wikipedia
> tables or registers that the Wikipedians in those language use to mark
> vandalism. If anyone has another idea, I would be most grateful. Perhaps I
> am missing a way that Wikipedians across languages are using to mark
> vandalism.
> 
> Thank you,
> Tom
> 
> 
> -- 
> Thomas Stieve
> Ph.D. Candidate
> School of Geography and Development
> University of Arizona
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



Re: [Wiki-research-l] Invitations to participate in a research about Wikidata as a learning platform

2018-12-17 Thread Kerry Raymond
Other personal details did appear to be mandatory. For me, trying to leave a page 
without answering some of the personal questions caused them to display with a 
pink background and a message in red saying they were mandatory. That was my 
experience. 

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Shani Evenstein
Sent: Tuesday, 18 December 2018 9:02 AM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] Invitations to participate in a research about 
Wikidata as a learning platform

Hello, Stuart.

Thank you for your feedback. A few comments on what you wrote --

* With the exception of username & email (which are mandatory), all the other 
personal details are not, so it is up to participants to decide on their level 
of interaction and involvement. Some people are happy to share the information 
requested, some less so.

* Every single piece of data collected is related to a research question(s).  
As an example, one part of the questionnaire focuses on community and how it 
influences learning. For this part specifically, we wanted to check (among 
other things) for a correlation between online activity on WD and participation 
in various social media platforms, such as the WD Facebook group and the 
telegram group, so a cross between the username and real name is needed. That 
does not suit everyone, and that's fine, but it's a question we'd like to 
answer and are therefore collecting this information from those willing to 
provide it.

* Not sure which "similar surveys" you are comparing this research to, but 
there has never been any similar research about Wikidata. Even the ones I know of 
about Wikipedia do not examine exactly what I'm researching.
That said, many research papers I've read that deal with learners, try to infer 
from personal data such as age, occupation, gender etc. This is not uncommon 
and the terms of confidentiality were expressed clearly.

* When you write, "the overview suggests that further pages are going to ask 
for more personal information", that is absolutely right.
Every single question in this questionnaire is asking participants about their 
*personal experience* engaging with Wikidata as a learning platform, making it 
personal information. I'm unsure why you find that problematic.
That is what questionnaires do. They ask participants about their experience 
with X.

Finally, our community never had any similar research done on Wikidata, 
especially not in relation to Education, and I have naturally asked for the 
community's support.
This research cannot happen without the community. It relies on it heavily, 
unlike other (and very popular) "big data" approaches.
There are various ways in which the community demonstrated its support -- some 
chose to fill out the questionnaire; some even agreed to participate in a 
follow up interview; some would not fill it out themselves, but have shared it 
with their local communities; some simply sent some good words and 'good luck' 
my way; and some sent me (welcome) feedback on things to improve.  They did so 
(and continue to do so) because they know it's an important topic, and I'd like 
to believe that it's also because they know me and trust me. They know that 
being part of the community, I will do whatever I can to use the information 
they shared with me for the good and for creating positive impact in Academia 
(which I have done for years with Wikipedia and now with Wikidata).

While I do not expect anyone to show support blindly, I do find your message a 
bit puzzling -- whether you meant it or not, your mail suggests that you are 
unsupportive of this research, and your tone was dismissive, without portraying 
the situation accurately or checking the details properly.
That is your prerogative, as is not filling out the questionnaire. But I would 
urge you to reconsider.
Take a closer look. Assume good faith. And you'll find that once you pass the 
personal info part, the questions are not at all intrusive, but rather focused, 
genuine, inquisitive and most importantly -- focused on the Wikidata experience 
you've had.

In short, I hope you reconsider. If you don't, that's fine as well.

Shani.

---
*Shani Evenstein Sigalov*
EdTech Innovation Strategist, NY/American Medical Program, Sackler School of 
Medicine, Tel Aviv University.
PhD Candidate, School of Education, Tel Aviv University.
Lecturer, Tel Aviv University.
Chairperson, WikiProject Medicine Foundation.
Chairperson, Wikipedia & Education User Group.
Chairperson, The Hebrew Literature Digitization Society.
Chief Editor, Project Ben-Yehuda.
*+972-525640648*


On Mon, Dec 17, 2018 at 11:52 PM Stuart A. Yeates  wrote:

> I find the person

Re: [Wiki-research-l] Readers of Wikipedia

2018-12-16 Thread Kerry Raymond
I’m not suggesting categories are bad. I certainly don’t want uncategorised 
articles. I also make use of hidden tracking categories to manage groups of 
articles associated with various projects. But we do have to recognise that it 
is editors that appear to make the most use of them. Eye-tracking studies on 
desktop and I believe some instrumentation in mobile viewing shows that readers 
don’t look at them (although I acknowledge that they may indirectly benefit the 
reader through improved search). I do outreach work (general talks about 
Wikipedia and edit training) and I know from those interactions that our 
readers have mostly never seen or used our categories, even many librarians 
(folks for whom categorisation is part of their fundamental way of working) 
appear not to have noticed our categories. 

 

What I am objecting to is what I see on my watchlist every day, many 
recategorisations into increasingly fine-grained categories. Also Categories 
for Discussion Speedy seems to be a way to constantly fiddle with the category 
tree (mostly just renaming) which then result in huge numbers of edits to 
rename the categories in all the affected articles. If you look at some of our 
top contributors, that’s what they do all day, yet goodness knows how much time 
is spent by the rest of us reviewing these very-low value edits on our 
watchlists. I would be very interested if anyone had any studies on the 
cost/benefit of various types of edit (maybe a job for ORES) against the 
benefit to the article (and hence the reader) and the consumed time (by all 
parties) of that edit. For example, vandalism would score strongly negative 
(damage to article content) but the corresponding removal of that vandalism would 
not score as strongly positive, because it's not a zero-sum game: there is the 
risk of exposure of the vandalism to the reader before the revert, the reviewer 
cost (I review many changed articles that have had an edit-revert sequence), and 
the window in which the vandalism may have been exposed. So even though the 
impact on the content is net zero, the impact on everyone who reviews it 
needlessly is a net negative for the project. All edits (good or 
bad) have a reviewer cost. Do we know anything about reviewer costs of edits?
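The zero-sum point lends itself to a worked toy example. All numbers below are invented purely for illustration; measuring real content impact and reviewer cost is exactly the open question being posed.

```python
# Toy model of the argument above: an edit's net value is its content
# impact minus the reviewer attention it consumes. Every number here is
# invented for illustration only; a real study would have to estimate
# them from data.

def net_value(content_impact: float, reviewers: int,
              minutes_per_review: float, minute_cost: float = 1.0) -> float:
    """Content impact minus total reviewer cost, in arbitrary units."""
    return content_impact - reviewers * minutes_per_review * minute_cost

# A vandal edit (impact -10) plus its revert (impact +10) sums to zero
# content change, but both edits still consume reviewer attention, so
# the pair is net negative for the project.
vandalism = net_value(content_impact=-10, reviewers=3, minutes_per_review=0.5)
revert = net_value(content_impact=+10, reviewers=3, minutes_per_review=0.5)
pair_total = vandalism + revert  # content cancels; reviewer cost remains
```

Under these made-up weights the vandalism/revert pair comes out negative, which is the point: reviewer cost does not cancel even when content impact does.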

 

A couple of people have asked me about my mention of studies showing people 
don’t look below the references. I was referring to a presentation at Wikimania 
this year (URL to slides below). While the slides do not explicitly mention 
categories, it shows readers rarely get to the bottom of an article, where the 
categories lurk. 

 

https://upload.wikimedia.org/wikipedia/commons/e/e1/Which_parts_of_a_%28Wikipedia%29_article_are_actually_being_read_%28Wikimania_2018%29.pdf

 

I don’t know if there is more information available on the topic, but hopefully 
as it was WMF Research, someone on this list may be able to point us to more 
info.

 

Kerry

 

From: WereSpielChequers [mailto:werespielchequ...@gmail.com] 
Sent: Monday, 17 December 2018 1:26 AM
To: Kerry Raymond ; Research into Wikimedia content 
and communities 
Subject: Re: [Wiki-research-l] Readers of Wikipedia

 

I've long seen categorisation on wikipedia as a way to bring articles to the 
attention of those who follow certain categories. During the cleanup of 
unreferenced biographies a few year ago this was a useful adjunct, with several 
wikiprojects cleaning up all the articles legitimately categorised for them. 
Some of the other Wikiprojects did at least go through and prod or speedy the 
non-notables and hoaxes in their areas.

 

I'm pretty sure it still operates that way, categorisation of an uncategorised 
article sometimes brings it to the attention of people who know the topic.

 

And of course where the article doesn't contain the words in the category, 
categorisation then improves search. 

 

If like me you are a glass third full person categories make a useful 
contribution. 

 

 

On Sat, 15 Dec 2018 at 22:21, Kerry Raymond mailto:kerry.raym...@gmail.com> > wrote:

Pointy? I think you may misunderstand  my use of the term “hostage”. I don’t 
use it with the meaning of abducting people for ransom, but in the sense of 
“subject to things beyond our control”.



I agree entirely that Wikipedia should serve its readers and to that end “To 
do” lists are compiled with the intention of giving adequate coverage of topics 
perceived to be needed. Yet, many of those “To do” lists are full of redlinks 
years later because we have volunteer contributors whose interests / expertise 
may not align with the perceived needs. Whereas if Wikipedia employed its 
writers, it could direct them to write articles about required topics. It would 
be a wonderful thing if we could harness the volunteer energy that goes into 
largely unproductive activities like endless category reorganisation (given 
studies show readers rarely look below the reference section and don’t see or 
use the cat

Re: [Wiki-research-l] Readers of Wikipedia

2018-12-15 Thread Kerry Raymond
Pointy? I think you may misunderstand  my use of the term “hostage”. I don’t 
use it with the meaning of abducting people for ransom, but in the sense of 
“subject to things beyond our control”.

 

I agree entirely that Wikipedia should serve its readers and to that end “To 
do” lists are compiled with the intention of giving adequate coverage of topics 
perceived to be needed. Yet, many of those “To do” lists are full of redlinks 
years later because we have volunteer contributors whose interests / expertise 
may not align with the perceived needs. Whereas if Wikipedia employed its 
writers, it could direct them to write articles about required topics. It would 
be a wonderful thing if we could harness the volunteer energy that goes into 
largely unproductive activities like endless category reorganisation (given 
studies show readers rarely look below the reference section and don’t see or 
use the categories) into writing content that is actually needed. But alas it 
is not so.

 

Kerry

 

 

From: Ziko van Dijk [mailto:zvand...@gmail.com] 
Sent: Sunday, 16 December 2018 3:32 AM
To: Kerry Raymond ; Research into Wikimedia content 
and communities 
Subject: Re: [Wiki-research-l] Readers of Wikipedia

 

Hello,

Thanks for the link and the comments, Leila!

 

Am Fr., 14. Dez. 2018 um 00:44 Uhr schrieb Kerry Raymond 
mailto:kerry.raym...@gmail.com> >:

hostage to the interests of their contributors (unless they actively remove the 
material). That is, you get the topics that the contributors are willing and 
able to write, no matter what the intention might be.

 

That's a very pointy expression: "Hostage to the interests of their 
contributors"! In fact, WP should serve recipients, but the reality is often 
different. We already saw the Article Feedback Tool as a means to find out 
what recipients think. I would be happy with a new, less ambitious approach, 
where we don't expect recipients to contribute to the improvement of content 
but just want to know their opinion.

 

By the way, the distinction of large and short articles I have found in 
Collison's "Encyclopedias through the ages" (or similar) from 1966. It is not 
very prominent in there, but I have elaborated on the idea in 2015, with a 
distinction of definition articles, exposition articles, longer articles and 
dissertations.

 

An encyclopedia with "short" articles - or a meaningful combination of the four 
types above - would fit well to the original concept of hypertext not being an 
actual set of texts (or nodes), but being an individual's specific learning 
strategy or reading path.

 

Federico: remember, most of the oldest German texts (Old High German) deal with 
Biblical topics... :-)

 

Kind regards

Ziko

 



Re: [Wiki-research-l] Readers of Wikipedia

2018-12-13 Thread Kerry Raymond
I think the decision on the scope probably depends on whether people who speak 
that language also speak other languages. For example, many people in the 
Netherlands and Norway speak English very well. There may be less need to 
provide some topics in their own language if that topic is well-covered in 
Wikipedia so perhaps the focus can be more on local content. But if the 
speakers of that language are less likely to speak a "larger" language, then 
the need to provide a wide variety of non-local topics may be more important 
than providing information on local topics.

I don't know if any Wikipedias consciously make a decision to focus (or not) on 
local content, but even if they do, I presume they are hostage to the interests 
of their contributors (unless they actively remove the material). That is, you 
get the topics that the contributors are willing and able to write, no matter 
what the intention might be.

Australians are often surprised to find content about the Australian Outback 
appears in German Wikipedia and not in English Wikipedia but if you travel in 
the Outback, the reason is obvious -- the outback is full of German tourists in 
campervans and this is reflected in their Wikipedia contributions.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Ziko van Dijk
Sent: Thursday, 13 December 2018 8:02 PM
To: Research into Wikimedia content and communities 

Subject: [Wiki-research-l] Readers of Wikipedia

Hello,

I just watched the showcase of December 2018, thank you for the interesting 
contribution! It would be great it further research could have a look at 
questions such as language choice.
With regard to have more insight in what readers want, I struggled in the past 
with two questions:

Regionally important content: Should a Wikipedia language version concentrate 
on regional topics, or try to cover a large variety of topics?
Heinz Kloss in the 1970s introduced the idea of "eigenbezogene Inhalte", 
content, that is closely related to a language and its region, like local 
history, culture and typical crafts such as fishing on the Faroe islands or 
farming in the Alps. What do the readers in Hungary want? That hu.WP 
concentrates on Hungarian topics, while they consult English Wikipedia for 
specialized technical topics or other countries?

Large or small articles: Some printed encyclopedias had relatively few, but 
large articles. Others segmented the content into many small articles.
(Think of Encyclopedia Britannica: Macropedia and Micropedia.) What do 
Wikipedia readers want? Do they prefer to read about a larger topic in one 
long, well structured article? Or several short ones, linking to each other?

I could imagine that a reader who is interested in information for work or 
school prefers long articles that provide an in-depth approach in order to 
became familiar with the overall topic (that is, what one would expect 
traditionally). And that "news" readers want to look up something quickly, in a 
short, simplifying article.

Kind regards
Ziko


Re: [Wiki-research-l] Definition of the death of a wiki

2018-11-05 Thread Kerry Raymond
While perhaps not relevant to the original enquiry ...

For Wikipedia, do we need to know (or care) about the "death" of:

* a WikiProject
* a Portal
* a category
* an article

While there are articles on historical topics which might, once written, 
arguably not need further updating, there are many articles, categories, 
portals and projects which do need it, as they involve topics that are current 
in the real world. As a simple example, for a town, we report on population and 
temperatures. For an electorate, there are elections and new people elected. 
Sports results etc.  If readers visit articles with out-of-date information, 
they may see less value in the article and therefore Wikipedia as a whole. Have 
we ever thought of having some system of tagging articles of this nature, 
either as a whole or in sections or in infobox fields, that indicates when the 
information could be considered out of date? E.g. we have the Australian census 
every 5 years (the last being 2016). It takes them about a year to release the 
data to the public (so 2017 we had first access to the 2016 data), so we might 
think it desirable that all Australian places with census data have been 
updated by 2018 and certainly we would not think it acceptable if there were 
any with 2006 (or earlier) census data (except as historical information). Yet 
of course there are many such out-of-date Australian articles, but probably we 
don't have an easy way to know which ones. (Before anyone rushes to tell me 
about Wikidata solutions, I would point out that the average Australian 
Wikipedia editor neither knows nor cares about Wikidata and our attempt to add 
2016 census data from Wikidata more-or-less collapsed from lack of community 
support). I'm thinking here about solutions that Wikipedians might understand, 
such as templates which have a tracking category that is activated when the 
article misses an update deadline based on some template in the article.
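The deadline idea could be prototyped off-wiki first. Below is a minimal sketch, assuming census years appear in the wikitext as e.g. "2011 census"; the census years and one-year release lag follow the Australian example above, and all function names are invented:

```python
import re
from datetime import date

CENSUS_YEARS = [2006, 2011, 2016]   # assumption: the known census years
RELEASE_LAG_YEARS = 1               # data public roughly a year after each census

def latest_available_census(today: date) -> int:
    """Most recent census whose data should already be public."""
    return max(y for y in CENSUS_YEARS if y + RELEASE_LAG_YEARS <= today.year)

def is_stale(wikitext: str, today: date) -> bool:
    """True if the article cites only census years older than the latest
    available one. Assumes years appear as e.g. '2011 census'."""
    cited = [int(y) for y in re.findall(r"\b(20\d\d) census\b", wikitext)]
    if not cited:
        return False  # no census data to check
    return max(cited) < latest_available_census(today)

print(is_stale("a population of 500 at the 2011 census", date(2018, 11, 5)))  # True
```

A bot running something like this could populate the tracking category the paragraph above imagines, without any editor needing to understand Wikidata.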

Of course, on Wikipedia, many articles have the illusion of being actively 
maintained in the sense of edits occurring, thanks to vandalism and reverts of 
vandalism, endless re-categorisation, automated changes of a trivial nature 
(e.g. dash length), the Internet Archive Bot and other bots,  copyedits etc. As 
someone who does her watchlist diligently, I am seeing increasing activity over 
articles (my daily watchlist seems to be growing faster than the number of 
entries on my watchlist) which suggests we are more active, but, when I look at 
the edits, relatively few of them are updates to the information content. So 
activity should not be equated to information currency. Note, as anyone who 
deals with visible metrics soon learns, people game them and our edit counts 
are a classic example. I sometimes wonder what would happen if we suppressed 
that information. Or better still, counted something that we value more than 
"number of times clicked Save". What if we only counted the number of  
citations added (or counted it in addition to plain old edit count)? Would that 
drive behavioural change from less information-productive activities towards 
more information-productive activities?
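Counting citations added is mechanically straightforward. A hedged sketch, assuming we can obtain the wikitext before and after each edit (e.g. via the MediaWiki API) and treating each <ref> tag as one citation:

```python
import re

# Matches the opening of a <ref> tag, with or without attributes.
REF_TAG = re.compile(r"<ref[ >]")

def citations_added(old_text: str, new_text: str) -> int:
    """Net number of <ref> tags an edit added (never negative)."""
    return max(0, len(REF_TAG.findall(new_text)) - len(REF_TAG.findall(old_text)))

old = "Foo is a town.<ref>Smith 2001</ref>"
new = "Foo is a town.<ref>Smith 2001</ref> It has 500 people.<ref>Census 2016</ref>"
print(citations_added(old, new))  # 1
```

Summing this over an editor's contributions would give the "citations added" count suggested above, as an alternative or complement to the raw edit count.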

But if we can have some measure of information-activity/inactivity for an 
article, then I presume we can aggregate this across any natural groupings of 
articles (e.g. Category trees, Portals, WikiProjects) to discover where we are 
stagnating and then let humans decide if that topic space is one that can 
stagnate (because it is historic) or one that must be updated periodically to 
be considered useful and whether the correct frequency of updates seems to be 
occurring, either macroscopically or (ideally) microscopically around 
particular time-sensitive factoids. 

Can we measure "information growth"?

Kerry



___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are published!

2018-10-04 Thread Kerry Raymond
I am with you 100% on the principle that if we don't change how we do things, 
nothing will change in terms of our outcomes. But I guess what we are debating 
is what the change should be.

Our problem is indeed one of ideology as we have three  statements of ideology 
underpinning Wikipedia. We have the vision:

"Imagine a world in which every single person on the planet is given free 
access to the sum of all human knowledge. That's what we're doing."

We have the 5 Pillars which I assume we all know so I won't elaborate here

and we have the main page that says "Welcome to Wikipedia, the free 
encyclopaedia that anyone can edit." 

Frankly these various ideologies don't combine terribly well and I think that 
"anyone can edit" is something that we do have to re-think. At the end of the 
day we are building and (increasingly) maintaining an encyclopaedia. We do need 
adequately educated people to do this. The ability to research and write is not 
innate, most people have to learn it through a formal education process. Now I 
am not suggesting a formal education barrier to participation but really, if 
you can't cite, you can't write for Wikipedia. Maybe you can fill other roles 
in Wikipedia but not as a content writer. 

We absolutely do need new contributors. We know we have a contributor gap and a 
content gap and there is research that shows these are related. But I am not 
convinced that the vandals and self-promoters are part of our contributor gap. 
I suspect our bad faith editors are predominantly white male and 1st-world, and 
we have plenty of good faith contributors from that group already. Do we have 
any evidence that vandals turn into productive contributors? Have we surveyed 
our existing editor community on how many of them started out as a vandal?

Maybe we could turn CoI and bias around to be a motivator? A lot of the 
self-promoters seem to be quite well educated. Let's have some new namespaces 
e.g. "CV" (for CVs), "Essay" (for opinions). Maybe you get the right to one 
of these for every N productive edits you do in mainspace. Obviously they get 
displayed to the reader in a way that makes clear these are "personal views" or 
whatever words are appropriate so there is no misrepresentation of what they 
are. And of course they should be subject to our normal rules about puffery, 
hate speech etc. And they can choose to have or not have an associated talk 
page. But I would put one caveat on these new namespaces, verified identity. If 
you want to advertise yourself and your views, you need to stand up and be 
honest about who you are (but it doesn't have to be linked to your normal user 
name or IP editing for those who edit on "sensitive" topics). After enough good 
mainspace edits, you get a token that you can "cash in" for one of these 
personal statement pages. This works well for the paid editors. They can write 
good edits on mainspace topics to earn tokens to write CVs and personal 
statements for their clients (as long as their clients are happy to verify 
their real world identity). And as the easiest way to get a good edit is to 
revert vandalism, maybe we can solve our vandalism problem that way.
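The token bookkeeping itself is trivial; a rough sketch, where the exchange rate N and all names are invented for illustration:

```python
# Assumption: one personal-statement token per N productive mainspace edits.
N_EDITS_PER_TOKEN = 50

class TokenAccount:
    def __init__(self):
        self.productive_edits = 0
        self.spent_tokens = 0

    def record_productive_edit(self):
        self.productive_edits += 1

    @property
    def tokens(self) -> int:
        """Tokens earned so far, minus tokens already cashed in."""
        return self.productive_edits // N_EDITS_PER_TOKEN - self.spent_tokens

    def cash_in(self) -> bool:
        """Spend one token on a CV/Essay page, if one is available."""
        if self.tokens > 0:
            self.spent_tokens += 1
            return True
        return False

acct = TokenAccount()
for _ in range(120):
    acct.record_productive_edit()
print(acct.tokens)     # 2
print(acct.cash_in())  # True
print(acct.tokens)     # 1
```

The hard part, of course, is deciding what counts as a "productive" edit, not the accounting.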

Maintenance is a problem. 2016 we had a census in Australia. We still have 
loads of town/suburb articles with 2011 census date, and I stumble over 2006 
data too. (Note this is not easy to automate as the internal identifiers used 
for the places are not stable from one census to the next -- if it was, we 
would have automated this). Let's set this kind of stuff up into a pipeline 
like Mechanical Turk as another way to get "good edits". Indeed let's consider 
whether the price of paying folks in the third world to do this kind of 
maintenance might be worth it. They are pretty cheap and they need the money.

We need to nurture the good faith new contributors. Could we have something 
that isn't "un-do" but rather "re-do", which acts as some kind of referral to a 
more caring part of Wikipedia than your average editor, to help them learn how 
to do it better? E.g. Teahouse type people.

But back to the contributor gap. We do need to do something about oral 
knowledge, such as we have in Australian Indigenous communities. At the moment, 
this is a verification problem. But Indigenous people don't have a verification 
problem. They know who their elders are and they know who they trust to hear 
their lore from. Maybe we need a family of templates, e.g. {{Oral Quandamooka}}, 
that tells the reader that what's inside this box (or however we present it) is 
oral knowledge provided by "SoAndSo, Elder of the Quandamooka People", and 
within such templates, normal verification does not apply but there is some 
culturally appropriate real-world verification that is used to authorise 
certain user names to use that template. It might not be the respected elder 
themselves as they may not be technologically savvy but it might be someone 
they designate to assist them with the task. And of course, 

Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are published!

2018-10-03 Thread Kerry Raymond
Stripping out a long email trail ...

I am not advocating lowering the BLP bar as there are genuine legal needs to 
prevent libel.

What I am advocating is not letting new users do their first edits in “high 
risk” articles. When I do training, I pick exercises for the group which 
deliberately take place in quiet backwaters of Wikipedia, eg add schools to 
local suburb articles. Such articles have low readership and low levels of 
watchers and no BLP considerations, i.e. low risk articles. If the newbie's 
first edit is a bit of a mess, probably no reader will see it before it is fixed 
by a 
subsequent edit. They will be able to get help from me to fix it before anyone 
is harmed by it and before anyone reverts them. 

The “organic” newbie can dive into any article. It would be a very interesting 
research question to look at reverts and see if we can develop risk models that 
predict which articles are at higher risks of reverted edits (e.g. quality 
rating, length, type of article eg BLP, level of readership, number of active 
watchers, etc) and there might be separate models specifically for newbies 
revert risk and female newbie revert risk. 

Or we could simply calculate the proportion of reverted edits and declare 
anything over some threshold as "high risk", without bothering to find out 
what the article characteristics are. We could also calculate the newbie 
revert rate. 
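The "straight stats" version is easy to sketch. Below, the edit log and the 30% threshold are invented for illustration:

```python
from collections import Counter

def revert_rates(edits):
    """edits: iterable of (article, was_reverted) pairs.
    Returns the proportion of reverted edits per article."""
    total, reverted = Counter(), Counter()
    for article, was_reverted in edits:
        total[article] += 1
        reverted[article] += was_reverted  # bool counts as 0 or 1
    return {a: reverted[a] / total[a] for a in total}

def high_risk(edits, threshold=0.3):
    """Articles whose revert proportion meets or exceeds the threshold."""
    return {a for a, rate in revert_rates(edits).items() if rate >= threshold}

log = [("Climate change", True), ("Climate change", True), ("Climate change", False),
       ("Small town", False), ("Small town", False)]
print(high_risk(log))  # {'Climate change'}
```

Restricting the log to edits by new accounts would give the newbie revert rate in the same way.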

Then we have something actionable. We could treat the high risk articles (by 
predictive model or straight stats) as semi-protected and divert newbies from 
making direct edits. Or at least warn them before letting them loose. For that 
matter, warn any user if they are entering into a high conflict zone.

When you learn to drive a car, you normally start in the quiet streets,  not a 
busy high speed freeway, not narrow winding roads without guard rails up a 
mountain. Why shouldn’t we take the same attitude to Wikipedia? Start where it 
is safe.

Kerry
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Wikipedia research on "productivity"

2018-10-02 Thread Kerry Raymond
They might be referring to this:

https://meta.wikimedia.org/wiki/Research:Measuring_edit_productivity

"Productivity" is not a term that Wikipedia editors use among themselves (or at 
least not in the circles I move in). I suspect it's more of a research term.
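For what it's worth, the reviewer's phrase ("text produced over time less the reverted text") could be read roughly as follows; this is my guess at the measure, not the definition on the meta page:

```python
def productivity(revisions):
    """revisions: list of (bytes_added, was_later_reverted) per edit.
    Counts only surviving (non-reverted) added text; removals count as zero."""
    return sum(b for b, reverted in revisions if not reverted and b > 0)

# Invented example: four edits, one of which was later reverted.
edits = [(120, False), (300, True), (45, False), (-20, False)]
print(productivity(edits))  # 165
```

The actual research measure linked above is more sophisticated (it tracks word persistence across revisions), so treat this only as the naive reading of the reviewer's sentence.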

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Alex Yarovoy
Sent: Tuesday, 2 October 2018 1:51 PM
To: wiki-research-l@lists.wikimedia.org
Subject: [Wiki-research-l] Wikipedia research on "productivity"

I'm working on a research paper and one of the reviewers has commented that 
"There is even a Wikipedia measure called productivity, which is essentially 
the amount of text produced over time less the reverted text"

Anybody familiar with that metric of "productivity"?

Any pointers would be greatly appreciated.

Thanks in advance,
Ofer
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are published!

2018-09-29 Thread Kerry Raymond
As advice to an individual editor on how to deal with good faith but 
problematic edits, I would say: give the newbie feedback on exactly what the 
specific problem is with their reverted edit, explain how to fix it, and 
continue to watch the article and their user talk page to see how they are 
going, and keep offering help until they get it right.

For my long answer on how to do it at scale, see my other longer email.

Kerry
 

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Pine W
Sent: Sunday, 30 September 2018 5:01 AM
To: Wiki Research-l 
Subject: Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are 
published!

Kerry,

This discussion about reverts, combined with my recent experience on ENWP, 
makes me wonder if there's a way to make reverts feel less hostile on average. 
Do you have any ideas about how to do that?

Thanks,

Pine
( https://meta.wikimedia.org/wiki/User:Pine ) 
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are published!

2018-09-29 Thread Kerry Raymond
Part of the reason we have a problem with dealing with good faith new users is 
because we assume they understand things like we do. They don't.

Try and imagine you are a genuine good-faith new user (hard for us, but I get 
to see them face-to-face so I get some insight into their experience). Imagine 
that you have just spent quite a number of minutes making your first ever 
change to a Wikipedia article. You found it quite difficult, strange jargon, 
incomprehensible tool bars etc. Lots of things didn't work, so you explore more 
menu options, blah blah blah. But you finally prevailed! You saved your edit 
and you could see your change on the screen in the article. Hurray! Do a little 
dance to celebrate! Sacrifice a goat! I must show Mum!

Then the edit gets reverted.

The first question to ask is how does the user know it got reverted.

* The article does not show their edit when they look at it later; they do not 
know it was reverted

This is the likely scenario if they are not still logged in to their user 
account (or are at a different IP address if they did an IP edit). They find out 
next time they look at the article. Remember how proud they are of that edit. 
They may show the article to someone "look how I changed Wikipedia, hey, why 
can't I see the change I made?".

Now how do they react to this? They may be thinking "maybe it's awaiting a 
review" (remember newbies don't know how things work) so they wait and wait ...

Maybe, perhaps after some waiting, they decide they must have not got it right. 
Remember they struggled to do that edit; they found it difficult; they can 
imagine that they did something wrong. So they might just give up thinking "I 
am not tech savvy enough to change Wikipedia". Or they might think "I have to 
give it another go and see if I can get it right this time". So they repeat the 
edit (possibly not being logged in) and presumably it gets reverted again.

* They may see an alert or notification, so they know it was reverted

If they are still logged in or at the same IP address, they may see an alert or 
notification. I say "may" because not being a new user I am not sure how they 
are shown a reverted edit. Someone else who knows will have to answer this. But 
I do know from face-to-face observation that new users often do not notice 
things in the user interface like alerts, notifications, message etc even when 
they remain logged-in. Their eye focus is entirely on the article content. Lots 
of studies of eye tracking and their heat maps show us that this is normal 
behaviour on most web pages, people are focussed on where they think is 
relevant to them. Since this user's experience of Wikipedia is 99.99% as a 
reader, they are 99.99% pre-programmed to look straight to the article content. 
As regular contributors, we are probably far more aware of things like alerts, 
notifications, etc (but equally would you notice a change in the elements of, 
say, the left hand tool bar as quickly).

Assuming they see that there is an alert or notification,  do they know to 
click the alert or notification to find out that their edit was reverted? 
Again, stuff we take for granted, but it's their first time. So they may still 
not know their edit has been reverted.

Assuming they managed to navigate the GUI to get to the revert notification, 
they might be seeing the edit summary on the reversion and/or a talk page entry 
(probably a Twinkle-or-other-tool template).

Edit summaries are by their very nature short and they can be empty, or very 
cryptic or use unfamiliar jargon or link off to pages full of more jargon 
[[WP:SOMEPOLICY]]. Messages on talk pages can  be longer but not necessarily 
any more helpful. For example, the default Twinkle response for a revert (level 
1 vandalism) says that the reverted edit "did not appear constructive" and 
points the user to the Sandbox (not helpful) or to the Help Desk (potentially 
helpful). Also, if the user did the original edit in the Visual Editor, they 
may be unable to interpret a page they are pointed to which uses any markup 
example (which occurs if they have done something wrong technically rather than 
policy-wise).

If they got this far, it is very likely that although the user knows their edit 
was reverted, they may still not know why either in general or in particular 
about what was wrong with their edit. Or they may know what was wrong but be 
unclear on how to fix it. Why was my citation not reliable enough? Etc.

Assuming they have not given up, they will probably feel the need to talk to 
someone about their reverted edit. Depending on how they were notified of the 
revert, there are a range of places that they have been shown as a place to 
have such a conversation. These include their own user talk page, the user talk 
page of the person who wrote a message on their user talk page, the Help Desk, 
the Teahouse, the article Talk page, talk pages of Wikipedia policies, etc. So 
we don't know how they choose where to go but there are prob

Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are published!

2018-09-29 Thread Kerry Raymond
I have seen this too in face-to-face situations. While it is COI, if it’s 
notable and written factually, I don’t worry too much (I might swing past it 
later and just remove any puffery that may have crept in). I do stop them 
writing about themselves or other living people with whom they may have a COI. 
There’s a fine line between “having an interest” and “having a conflict of 
interest” and I find the dead/living distinction tends to make a difference. An 
article about a dead person is unlikely to be promotional, which is the big 
concern with COI.

 

I find edit-a-thons have more risk around CoI and notability, particularly when 
the organisers have not provided a list of possible topics but let the 
participants choose their own (I am generally supporting these events as an 
experienced Wikipedian rather than organising them). Also they are often larger 
groups than training sessions so it is a lot more difficult for me to know what 
they are all writing about and be able to chat to them about why they chose 
that topic, so I am far less likely to be aware if there is CoI.

 

Kerry

 

From: Ziko van Dijk [mailto:zvand...@gmail.com] 
Sent: Saturday, 29 September 2018 10:21 PM
To: kerry.raym...@gmail.com
Cc: Research into Wikimedia content and communities 
; Rosie Stephenson-Goodknight 

Subject: Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are 
published!

 

Hello Kerry,

 

Sorry, I did not see all the mails and the context before.

 

I remember a gentleman in a training lesson who wanted to write about his 
grandfather. Notability no problem, and no obvious bias. Why not assume good 
faith? But still, one might ask oneself whether this is an ideal situation. It 
is tricky. In general I totally agree that the hostility is a problem.

 

Kind regards

Ziko

 

 

Kerry Raymond <kerry.raym...@gmail.com> wrote on Sat, 29 Sep 2018 at 08:27:

Well, I run training and events. The folk who turn up to these are always good 
faith, typically middle-aged and older, mostly women, and of  above-average 
education for their age (our oldest Australians will not all have had the 
opportunity to go to high school) and generally acceptable IT skills. I think 
most of them are capable of being good contributors and their errors are mostly 
unintentional, e.g. copyright is not always well understood and so there are 
photo uploads from “family albums” or “our local history collection” where the 
provenance of the image is unknown  and hence its copyright status is unclear. 
But off-line activities like mine are too few in number to make a significant 
impact on en.WP. We have to get better at attracting and on-boarding people via 
on-line.

 

Obviously on my watchlist I see plenty of  blatant and subtle vandalism, so I 
am not naïve about that, but I do also see what appears to be good faith 
behaviour from newbies too. I suspect people who only see their watchlist have 
a more negative view about newbies than I do.

 

So, yes, we may have to filter out some of the good faith folks if their 
behaviour remains problematic, but reverting them for any small problem in 
their early edits certainly isn’t proving to be an effective strategy. 

 

Kerry

 

From: Ziko van Dijk [mailto:zvand...@gmail.com] 
Sent: Saturday, 29 September 2018 3:27 PM
To: Research into Wikimedia content and communities 
<wiki-research-l@lists.wikimedia.org>; kerry.raym...@gmail.com
Cc: Rosie Stephenson-Goodknight <rosiestep.w...@gmail.com>


Subject: Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are 
published!

 

Hello Kerry,

 

While I agree to most what you said, I think that the bigger picture should 
include that: newbies are not always good contributors, and not always 
good-faith contributors. And even if they have good faith, that does not mean 
that they can be trained to become good contributors. Dealing with newbies 
always means some filtering. Maybe different people are differently optimistic 
about the probability of making a newbie a good contributor.

 

Kind regards,

Ziko

 

Kerry Raymond <kerry.raym...@gmail.com> wrote on Thu, 27 Sep 2018 at 06:47:

While I have no objection to the administrator training, I don't think most of 
the problem lies with administrators. There's a lot of biting of the good-faith 
newbies done by "ordinary" editors (although I have seen some admins do it 
too). And, while I agree that there are many good folk out there on en.WP, 
unfortunately the newbie tends to meet the other folk first or perhaps it's 
that 1 bad experience has more impact than one good experience.

Similarly while Arbcom's willingness to desysop folks is good, I doubt a newbie 
knows how or where to complain in the first instance. Also there's a high level 
of defensive reaction if they do. Some of my trainees have

Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are published!

2018-09-29 Thread Kerry Raymond
Well, I run training and events. The folk who turn up to these are always good 
faith, typically middle-aged and older, mostly women, and of  above-average 
education for their age (our oldest Australians will not all have had the 
opportunity to go to high school) and generally acceptable IT skills. I think 
most of them are capable of being good contributors and their errors are mostly 
unintentional, e.g. copyright is not always well understood and so there are 
photo uploads from “family albums” or “our local history collection” where the 
provenance of the image is unknown  and hence its copyright status is unclear. 
But off-line activities like mine are too few in number to make a significant 
impact on en.WP. We have to get better at attracting and on-boarding people via 
on-line.

 

Obviously on my watchlist I see plenty of  blatant and subtle vandalism, so I 
am not naïve about that, but I do also see what appears to be good faith 
behaviour from newbies too. I suspect people who only see their watchlist have 
a more negative view about newbies than I do.

 

So, yes, we may have to filter out some of the good faith folks if their 
behaviour remains problematic, but reverting them for any small problem in 
their early edits certainly isn’t proving to be an effective strategy. 

 

Kerry

 

From: Ziko van Dijk [mailto:zvand...@gmail.com] 
Sent: Saturday, 29 September 2018 3:27 PM
To: Research into Wikimedia content and communities 
; kerry.raym...@gmail.com
Cc: Rosie Stephenson-Goodknight 
Subject: Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are 
published!

 

Hello Kerry,

 

While I agree to most what you said, I think that the bigger picture should 
include that: newbies are not always good contributors, and not always 
good-faith contributors. And even if they have good faith, that does not mean 
that they can be trained to become good contributors. Dealing with newbies 
always means some filtering. Maybe different people are differently optimistic 
about the probability of making a newbie a good contributor.

 

Kind regards,

Ziko

 

Kerry Raymond <kerry.raym...@gmail.com> wrote on Thu, 27 Sep 2018 at 06:47:

While I have no objection to the administrator training, I don't think most of 
the problem lies with administrators. There's a lot of biting of the good-faith 
newbies done by "ordinary" editors (although I have seen some admins do it 
too). And, while I agree that there are many good folk out there on en.WP, 
unfortunately the newbie tends to meet the other folk first or perhaps it's 
that 1 bad experience has more impact than one good experience.

Similarly while Arbcom's willingness to desysop folks is good, I doubt a newbie 
knows how or where to complain in the first instance. Also there's a high level 
of defensive reaction if they do. Some of my trainees have contacted me about 
being reverted for clearly good-faith edits on the most spurious of reasons. 
When I have restored their edit with a hopefully helpful explanation, I often 
get reverted too. If a newbie takes any action themselves, it is likely to be 
an undo and that road leads to 3RR block or at least a 3RR warning. The other 
action they take is to respond on their User Talk page (when there is a message 
there to respond to). However, such replies are usually ignored, whether the 
other user isn't watching for a reply or whether they just don't like their 
authority to be challenged, I don't know. But it rarely leads to a satisfactory 
resolution.

One of the problems we have with Wikipedia is that most of us tend to see it 
edit-by-edit (whether we are talking about a new edit or a revert of an edit), 
we don't ever see a "big picture" of a user's behaviour without a lot of 
tedious investigation (working through their recent contributions one by one). 
So, it's easy to think "I am not 100% sure that the edit/revert I saw was OK 
but I really don't have time to see if this is one-off or a consistent 
problem". Maybe we need a way to privately "express doubt" about an edit (in 
the way you can report a Facebook post). Then if someone starts getting too 
many "doubtful edits" per unit time (or whatever), it triggers an admin (or 
someone) to take a closer look at what that user is up to. I think if we had a 
lightweight way to express doubt about any edit, then we could use machine 
learning to detect patterns that suggest specific types of undesirable user 
behaviours that can really only be seen as a "big picture".
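The escalation trigger could be as simple as a sliding window over doubt reports. A sketch, with all thresholds and names invented:

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 7 * 24 * 3600  # assumption: one-week window
MAX_DOUBTS = 5                  # assumption: more than 5 doubts triggers review

class DoubtTracker:
    def __init__(self):
        self.reports = defaultdict(deque)  # user -> timestamps of doubt reports

    def report(self, user: str, timestamp: float) -> bool:
        """Record a doubt report; return True if the user should be
        escalated for a closer look at their recent contributions."""
        q = self.reports[user]
        q.append(timestamp)
        while q and q[0] < timestamp - WINDOW_SECONDS:
            q.popleft()  # drop reports that have aged out of the window
        return len(q) > MAX_DOUBTS

tracker = DoubtTracker()
flags = [tracker.report("ExampleUser", t) for t in range(0, 6000, 1000)]
print(flags[-1])  # True: the sixth doubt within the window trips the trigger
```

The machine-learning version would replace the fixed threshold with a model, but even this naive form would surface the "big picture" patterns that edit-by-edit patrolling misses.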

Given this is the research mailing list, I guess we should we talking about 
ways research can help with this problem.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Pine W
Sent: Wednesday, 26 September 2018 1:0

Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are published!

2018-09-28 Thread Kerry Raymond
Yes, it may well be easier to try experiments on smaller Wikipedias where there 
isn't an immovable dominant culture that would strenuously resist the 
experiments. I understand the new on-boarding experiments are happening (or 
will happen soon) on Czech and South Korean Wikipedia, so there is an example. 

German Wikipedia (of its own choice) decided to experiment with making the 
Visual Editor the default for new users a couple of years ago and were happy 
with the result:

https://wikimania2017.wikimedia.org/wiki/Submissions/From_open_hostility_to_collaboration:_The_WMF,_the_VisualEditor_and_the_German-language_Wikipedia

So I don't see a problem conducting gender experiments on other Wikipedia. I 
guess they have to have a documented gender imbalance in the first place.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Gerard Meijssen
Sent: Friday, 28 September 2018 4:03 PM
To: Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are 
published!

Hoi,
To move the needle on English Wikipedia, the numbers involved are huge. So at 
best things change incrementally. What fails most of the research is that it 
only considers English Wikipedia, whereas changes are much easier on the 
smaller projects.

I would go so far as to say that in order to become more inclusive we should 
stop focusing on English Wikipedia, in attention, spending and research alike.
Then again there are too many systemic impediments.
Thanks,
  GerardM

On Fri, 21 Sep 2018 at 02:44, Jonathan Morgan  wrote:

> (Re: Jonathan's 'Chilling Effect' theory and Kerry's call for 
> experiments to increase gender diversity)
>
> Kerry: In a magic world, where I could experiment with anything I 
> wanted to without having to get permission from communities, I would 
> experiment with enforceable codes of conduct that covered a wider 
> range of harassing and hostile behavior, coupled with robust & 
> confidential incident reporting and review tools. But that's not 
> really an 'experiment', that's a whole new social/software system.
>
> I actually think we're beyond 'experiments' when it comes to 
> increasing gender diversity. There are too many systemic factors 
> working against increasing non-male participation. In order to do that 
> you would need to increase newcomer retention dramatically, and we can 
> barely move the needle there on EnWiki, for both social and technical 
> reasons. But one non-technical intervention might be carefully 
> revising and re-scope policies like WP:NOTSOCIAL that are often used 
> to arbitrarily and aggressively shut down modes of communication, 
> self-expression, and collaboration that don't fit so-and-so's idea of 
> what it means to be Wikipedian.
>
> Initiatives that start off wiki, like women-oriented edit-a-thons and 
> outreach campaigns, are vitally important and could certainly be 
> supported better in terms of maintaining a sense of community among 
> participants once the event is over and they find they're now stuck 
> alone in hostile wiki-territory. But I'm not sure what the best 
> strategy is there, and these kinds of initiatives are not large-scale 
> enough to make a large overall impact on active editor numbers on 
> their own, though they set important precedents, create 
> infrastructure, change the conversation, and do lead to new editors.
>
> The Community Health team just hired a new researcher who has lots of 
> experience in the online harassment space. I don't feel comfortable 
> announcing their name yet, since they haven't officially started, but 
> I'll make sure they subscribe to this list, and will point out this 
> thread.
>
> Jonathan: This study is the one I cite. There's a more 
> recent--paywalled!--follow up (expansion?) that I haven't read yet, 
> but which may provide new insights. And this short but powerful 
> ethnographic study. And this lab study on the gendered perceptions of 
> feedback and anonymity. And the--ancient, by now--former contributors 
> survey, which IIRC shows that conflict fatigue is a significant reason 
> people leave. And of course there's a mountain of credible evidence at 
> this point that antisocial behaviors drive away newcomers, 
> irrespective of gender.
>
> Thanks for raising these questions,
>
> - J
>
> On Wed, Sep 19, 2018 at 3:21 AM, Jonathan Cardy 
> <werespielchequ...@gmail.com> wrote:
>
> > Thanks Pine,
> >
> > In case I didn’t make it clear, I am very much of the camp that IP
> editing
> > is our lifeline, 

Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are published!

2018-09-27 Thread Kerry Raymond
Again, to bring this back to some research question, why do female newbie 
editors get reverted more?

Possible research question: where (topic space) are the reverts happening, and 
what types of reasons are given? Is there any sign that males and females are 
affected differently? To what extent does level of editing experience affect 
this?

One research side-question. Should we just be comparing male vs female, or 
should we also look at the unknowns? I know some people think that we have more 
women than we think, but that they choose not to self-identify as such on 
Wikipedia. If we compared various statistics for no-gender editors with those 
of self-identifying male and female editors, would it give us any insight into 
the likely gender composition of the no-gender group? For example, if among 
self-identifying editors we know there is a 90-10 gender split, then if the 
no-gender group is also split 90-10, statistics about the non-gendered editors 
should show the corresponding weighted averages (male stat * 0.9 + female stat 
* 0.1). If they do not, can we use a range of statistics to back-calculate the 
likely gender split of the non-gendered group? Has anyone ever done this? 
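The back-calculation idea can be made concrete. A minimal sketch (all numbers 
are invented for illustration, not real Wikipedia statistics): given the mean 
of several per-editor statistics among self-identified male and female editors, 
and the observed means for the no-gender group, find the mixing proportion that 
best explains the observations.

```python
# Hypothetical illustration of back-calculating the gender mix of
# non-disclosing editors from observed per-group statistics.
# All numbers below are made up for the sketch.

# Mean value of each statistic among self-identified editors,
# e.g. edits/month, revert rate, talk-page posts/month.
male_means   = [12.0, 0.08, 3.5]
female_means = [9.0, 0.15, 5.0]

# Observed means of the same statistics in the no-gender group.
observed = [11.4, 0.094, 3.8]

def best_male_fraction(male, female, obs, step=0.001):
    """Grid-search the male fraction p in [0, 1] that minimises the
    squared error between p*male + (1-p)*female and the observed means."""
    best_p, best_err = 0.0, float("inf")
    p = 0.0
    while p <= 1.0:
        err = sum((p * m + (1 - p) * f - o) ** 2
                  for m, f, o in zip(male, female, obs))
        if err < best_err:
            best_p, best_err = p, err
        p += step
    return best_p

p = best_male_fraction(male_means, female_means, observed)
print(f"Estimated male fraction of no-gender group: {p:.2f}")
```

With several statistics the estimate is over-determined, so a poor fit would 
itself be informative: it would suggest the no-gender group is not a simple 
mixture of the two self-identified populations.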

Kerry

-Original Message-
From: Kerry Raymond [mailto:kerry.raym...@gmail.com] 
Sent: Friday, 28 September 2018 10:05 AM
To: 'Research into Wikimedia content and communities' 

Subject: RE: [Wiki-research-l] Results from 2018 global Wikimedia survey are 
published!

Pine

This paper has some good studies about gender and new editors and reverting

https://www.researchgate.net/profile/Shilad_Sen/publication/221367798_WPClubhouse_An_exploration_of_Wikipedia's_gender_imbalance/links/54bacca00cf253b50e2d0652/WPClubhouse-An-exploration-of-Wikipedias-gender-imbalance.pdf

It shows that both male and female newbies are equally likely to drop out after 
being reverted for good-faith edits, BUT that female newbies are more likely to 
be reverted than male newbies, leading to a greater proportion of them dropping 
out.

It also shows that male and female editors tend to be attracted to different 
types of topic. "There is a greater concentration of females in the People and 
Arts areas, while males focus more on Geography and Science." (see Table 1 in 
the paper). And their engagement with History seems lower.

So why are newbie women reverted more? This paper does not investigate that. 
But I think it has to be either that they are reverted because they are women 
(i.e. conscious discrimination) or because women's edits are less acceptable in 
some way.

I have *hypothesised* that newbie women may get reverted more because women 
show higher interest in People but not in History, suggesting women are more 
likely to be editing articles about living people than about dead people. BLP 
policy is stricter on verification than the policies for dead-people topics, or 
for male-attracting areas like Geography and Science, so women are perhaps 
doing more BLP edits as newbies and are more likely to be reverted because they 
fail to provide a citation or their citation comes from a source which may not 
be considered reliable (e.g. a celebrity magazine).

If this could be established as at least part of the problem, there might be 
targeted solutions to address it. E.g. maybe newbies should not be allowed to 
edit articles which are BLPs or have a high revert history (suggesting it's 
dangerous territory for some reason, e.g. real-world controversy, "ownership") 
and should be deflected to the Talk page to suggest edits (as with a protected 
or semi-protected article). Currently we auto-confirm user accounts at 10 edits 
or 4 days (from memory), but these thresholds are based on the likelihood of 
vandalism (early good-faith behaviour is a good predictor of future good-faith 
behaviour). Having trained people, I know that the auto-confirmation threshold 
should not be used as a "beyond newbie" indicator; they are newbies for many 
more edits.

How many edits do you need to stop being a newbie? I don't know, but I know 
from my own experience, with over 100k edits, that if I edit an article outside 
my normal interests I am far more likely to be reverted than in my regular 
topic area; we can all be newbies in unfamiliar topic spaces. There is a lot of 
convention, pre-existing consensus and other "norms" in topic spaces that the 
"newbie to this topic" doesn't know. Any editor in this situation may back off, 
but the established editor has a comfort zone (their normal topic space) to 
return to; the total newbie does not.

Kerry


___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are published!

2018-09-26 Thread Kerry Raymond
While I have no objection to the administrator training, I don't think most of 
the problem lies with administrators. There's a lot of biting of good-faith 
newbies by "ordinary" editors (although I have seen some admins do it too). 
And, while I agree that there are many good folk out there on en.WP, 
unfortunately the newbie tends to meet the other folk first, or perhaps it's 
that one bad experience has more impact than one good experience.

Similarly, while Arbcom's willingness to desysop folks is good, I doubt a newbie 
knows how or where to complain in the first instance. Also, there's a high level 
of defensive reaction if they do. Some of my trainees have contacted me about 
being reverted for clearly good-faith edits on the most spurious of grounds. 
When I have restored their edit with a hopefully helpful explanation, I often 
get reverted too. If a newbie takes any action themselves, it is likely to be 
an undo, and that road leads to a 3RR block or at least a 3RR warning. The 
other action they take is to respond on their User Talk page (when there is a 
message there to respond to). However, such replies are usually ignored; 
whether the other user isn't watching for a reply or just doesn't like their 
authority being challenged, I don't know. But it rarely leads to a satisfactory 
resolution.

One of the problems we have with Wikipedia is that most of us tend to see it 
edit-by-edit (whether we are talking about a new edit or a revert of an edit); 
we never see a "big picture" of a user's behaviour without a lot of tedious 
investigation (working through their recent contributions one by one). So it's 
easy to think "I am not 100% sure that the edit/revert I saw was OK, but I 
really don't have time to see if this is a one-off or a consistent problem". 
Maybe we need a way to privately "express doubt" about an edit (in the way you 
can report a Facebook post). Then if someone starts getting too many "doubtful 
edits" per unit time (or whatever), it triggers an admin (or someone) to take a 
closer look at what that user is up to. I think if we had a lightweight way to 
express doubt about any edit, then we could use machine learning to detect 
patterns that suggest specific types of undesirable user behaviour that can 
really only be seen as a "big picture".
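The "express doubt" idea could be prototyped very simply. This is a 
hypothetical sketch, not an existing MediaWiki feature: the function name, 
window length, and threshold are all invented for illustration.

```python
# Hypothetical sketch of the "express doubt" idea: count private doubt
# flags per editor in a sliding time window and flag the editor for
# review once a threshold is exceeded. All names/values are invented.
from collections import defaultdict, deque

WINDOW_SECONDS = 7 * 24 * 3600   # look at the last 7 days of flags
THRESHOLD = 5                    # doubts per window that trigger review

doubts = defaultdict(deque)      # editor -> timestamps of doubt flags

def record_doubt(editor, timestamp):
    """Record one doubt flag; return True if the editor now needs review."""
    window = doubts[editor]
    window.append(timestamp)
    # Drop flags that have aged out of the sliding window.
    while window and window[0] < timestamp - WINDOW_SECONDS:
        window.popleft()
    return len(window) >= THRESHOLD

# Example: five doubts within a few hours push an editor over the threshold.
flagged = [record_doubt("Example_User", t * 3600) for t in range(5)]
print(flagged)  # the fifth doubt triggers review
```

A real deployment would presumably feed these per-editor flag rates into the 
kind of pattern detection described above, rather than a fixed threshold.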

Given this is the research mailing list, I guess we should be talking about 
ways research can help with this problem.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Pine W
Sent: Wednesday, 26 September 2018 1:07 PM
To: Wiki Research-l ; Rosie 
Stephenson-Goodknight 
Subject: Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are 
published!

I'm appreciative that we're having this conversation - not in the sense that 
I'm happy with the status quo, but I'm glad that some of us are continuing to 
work on our persistent difficulties with contributor retention, civility, and 
diversity.

I've spent several hours on ENWP recently, and I've been surprised by the 
willingness of people to revert good-faith edits, sometimes with blunt 
commentary or with no explanation. I can understand how a newbie who 
experienced even one of these incidents would find it to be unpleasant, 
intimidating, or discouraging. Based on these experiences, I've decided that I 
should coach newbies to avoid taking reversions personally if their original 
contributions were in good faith.

I agree with Jonathan Morgan that WP:NOTSOCIAL can be overused.

Kerry, I appreciate your suggestions about cultural change. I can think 
of two ways to influence culture on English Wikipedia in large-scale ways.

1. I think that there should be more and higher-quality training and continuing 
education for administrators in topics like policies, conflict resolution, 
communications skills, legal issues, and setting good examples.
I think that these trainings would be one way through which cultural change 
could gradually happen over time. For what it's worth, I think that there are 
many excellent administrators who do a lot of good work (which can be tedious 
and/or stressful) with little appreciation. Also, my impression is that ENWP 
Arbcom has become more willing over the years to remove admin privileges from 
admins who misuse their tools. I recall having a discussion awhile back with 
Rosie on the topic of training for administrators, and I'm adding her to this 
email chain as an invitation for her to participate in this discussion. I think 
that offering training to administrators could be helpful in facilitating 
changes to ENWP culture.

2. I think that I can encourage civil participation in ENWP in the context of 
my training project 

that I'm hoping that WMF will continue to fund. ENWP is a complex and sometimes 
emotionally difficult environment, an

Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are published!

2018-09-20 Thread Kerry Raymond
I agree there are some systemic factors that may prevent us achieving 50-50 
male-female participation (or in these enlightened non-binary times 49-49-2). 
Studies continue to show that wives still spend more hours at domestic tasks 
than their husbands, even when both are in full-time employment, and clearly 
less free time is less time for Wikipedia. But still men now do more housework 
than they once did. (My husband would argue that I have never let housework 
take priority over Wikipedia, but maybe I'm not typical!). Similarly, we have 
not yet seen pay rates for women reach parity with men but they are moving 
closer. A gender balance of 90-10 that might once have been the norm in many 
occupations is now unusual. Wikipedia is a child of the 21st century; one might 
expect it to more closely reflect the societal norms of this century not the 
19th century.

Women use wikis like Confluence in workplaces without apparent difficulty. But 
I note that modern for-profit wikis have visual editing and tools that 
import/export from Word as normal modes of contribution.

I agree entirely with you about outreach and off-wiki activities. When there 
was the big push to "solve the women problem" through such events, I said it 
wouldn't make the difference, because the problem is on-wiki. The majority of 
people who attend my training classes and come to the events I support are 
women. It's not that women can't do it. It's not that they don't want to do it. 
As you say, it's just that it's such an unpleasant environment to do it in, and 
that's what women don't like. For that matter, a lot of men don't like it 
either. 

What shall we write on Wikipedia's tombstone? "Wikipedia: an encyclopedia 
written by the most unpleasant people"?

Can one create cultural change? Yes, I've seen it done in organisations. You 
tell people what the new rules are, you illustrate with examples of acceptable 
and unacceptable behaviours. You offer a voluntary redundancy program for those 
who don't wish to stay, and you make it clear that those who wish to stay and 
continue to engage in the unacceptable behaviours will be "managed out" through 
performance reviews. You run surveys that measure your culture throughout the 
whole process. Interestingly the cultural change almost always involved being 
less critical, more collaborative, less micromanaged, more goal-oriented, more 
self-starting, many of which I would say apply here (except perhaps for being 
more self-starting, I don't think that's our problem).

En.WP can change, but WMF will have to take a stand and state what the new 
culture is going to be. En.WP will not change of its own accord; we have years 
of evidence to demonstrate that.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Jonathan Morgan
Sent: Friday, 21 September 2018 10:44 AM
To: Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are 
published!

(Re: Jonathan's 'Chilling Effect' theory and Kerry's call for experiments to 
increase gender diversity)

Kerry: In a magic world, where I could experiment with anything I wanted to 
without having to get permission from communities, I would experiment with 
enforceable codes of conduct that covered a wider range of harassing and 
hostile behavior, coupled with robust & confidential incident reporting and 
review tools. But that's not really an 'experiment', that's a whole new 
social/software system.

I actually think we're beyond 'experiments' when it comes to increasing gender 
diversity. There are too many systemic factors working against increasing 
non-male participation. In order to do that you would need to increase newcomer 
retention dramatically, and we can barely move the needle there on EnWiki, for 
both social and technical reasons. But one non-technical intervention might be 
carefully revising and re-scoping policies like WP:NOTSOCIAL that are often used 
to arbitrarily and aggressively shut down modes of communication, 
self-expression, and collaboration that don't fit so-and-so's idea of what it 
means to be Wikipedian.

Initiatives that start off wiki, like women-oriented edit-a-thons and outreach 
campaigns, are vitally important and could certainly be supported better in 
terms of maintaining a sense of community among participants once the event is 
over and they find they're now stuck alone in hostile wiki-territory. But I'm 
not sure what the best strategy is there, and these kinds of initiatives are not 
large-scale enough to make a large overall impact on active editor numbers on 
their own, though they set important precedents, create infrastructure, change 
the conversation, and do lead to new editors.

The Community Health team just hired a new researcher who has lots of 
experience in the online harassment 
space. I don't feel comfortable a

Re: [Wiki-research-l] Anonymous editing

2018-09-20 Thread Kerry Raymond
What was the conclusion? 

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Andrew Lih
Sent: Friday, 21 September 2018 4:03 AM
To: Research into Wikimedia content and communities 

Cc: wiki-research-l-requ...@lists.wikimedia.org
Subject: Re: [Wiki-research-l] Anonymous editing

This might be interesting, from Wikimania 2015

https://wikimania2015.wikimedia.org/wiki/Submissions/The_Effect_of_Blocking_IP_Editing:_Evidence_from_Wikia

Video here:

https://archive.org/details/videoeditserver-92



On Wed, Sep 19, 2018 at 12:54 PM Ziko van Dijk  wrote:

> Hello Kevin Crowston,
>
> Thank you for the link. I have read your paper about the initial phase 
> and profited very much from it.
>
> My personal opinion on IP editing, not backed by research: IP editing 
> has negative social consequences for the community. This negative side 
> is not quite visible when only looking quantitatively at huge data.
>
> Kind regards
> Ziko
>
>
>
> Kevin G Crowston  schrieb am Mi. 19. Sep. 2018 um 18:41:
>
> > Jonathan Cardy <werespielchequ...@gmail.com> wrote:
> >
> > In case I didn’t make it clear, I am very much of the camp that IP
> editing
> > is our lifeline, the way we recruit new members.
> >
> > Tangentially related to this question, we have a forthcoming paper at 
> > the CSCW conference about how research conclusions change when 
> > anonymous work (e.g., IP editing) is taken into account. We looked 
> > at data from a
> citizen
> > science project. Short answer: it makes a difference.
> >
> > The paper isn’t up on the ACM DL yet, but you can see it here:
> > https://crowston.syr.edu/node/756
> >
> > Doing the study requires access to IP addresses for logged in users, 
> > so someone at WMF would have to do the study for Wikipedia, which 
> > would be really interesting and would speak to the question of 
> > whether IP editing
> is
> > a gateway to further editing.
> >
> >
> > Kevin Crowston
> > Associate Dean for Research, Distinguished Professor of Information
> Science
> > School of Information Studies
> >
> > +1 (315) 443.1676
> > crows...@syr.edu
> >
> > 348 Hinds Hall, Syracuse, NY 13244
> > crowston.syr.edu 
> >
> > Syracuse University
> > Most recent publication:  Kevin Crowston, Isabelle Fagnot. (2018). 
> > Stages of motivation for contributing user-generated content: A 
> > theory and empirical test. International Journal of Human-Computer 
> > Studies, 109, 89-101, doi: 10.1016/j.ijhcs.2017.08.005.
> >
> > Check out our new research coordination network on Work in the Age 
> > of Intelligent Machines: http://waim.network/
> >


--
-Andrew Lih
Author of The Wikipedia Revolution
US National Archives Citizen Archivist of the Year (2016) Knight Foundation 
grant recipient - Wikipedia Space (2015) Wikimedia DC - Outreach and GLAM
Previously: professor of journalism and communications, American University, 
Columbia University, USC
---
Email: and...@andrewlih.com
WEB: https://muckrack.com/fuzheado
PROJECT: Wikipedia Space: http://en.wikipedia.org/wiki/WP:WPSPACE
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are published!

2018-09-19 Thread Kerry Raymond
Instead of putting down every idea as not being able to work without the 
benefit of an experiment, let's reverse the question.

Researchers, forgetting for a moment whether the community would accept it, if 
you were asked by the WMF BoT to make recommendations on experiments to run on 
en.WP to try to make it more attractive to women (since that's the aspect of 
diversity on which we seem to have the most data and the most research), what 
changes would you suggest for the experiment and why?

Let's at least get the ideas onto the table before knocking them off.

Or do we genuinely believe this is something that cannot be solved?

Kerry





Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are published!

2018-09-18 Thread Kerry Raymond
Pine, I would absolutely disagree with you about off-wiki transparency. Why 
should a woman have to publicly disclose the contents of a thoroughly 
disgusting sexual email for public entertainment because she reverted some 
guy's edit? Why should a woman be expected to provide details of an unwanted 
physical contact at an event for other men to pontificate about? That's what 
transparency would mean: the right of the 90% of Wikipedia contributors who 
are men to decide whether a woman has the right to be offended by these things. 
Let's put it all out there in the open so everyone can get involved.

"Couldn't it just have been a friendly hug?". "So did his hand actually tweak 
your nipple or just brush part of your breast?" And so on.

And of course anyone in the world with a web browser could watch on too, such 
as the woman's partner, her parents, her children, her colleagues. And of 
course IPs and new accounts could come along, join the conversation and get 
involved in the interrogation too. "How low-cut was your dress? Did you have 
a bra on?"

Transparency would not work off-wiki, and I don't think it works on-wiki for 
harassment issues. You might think it does only because a lot of stuff never 
gets reported on the public forums. The folks in private processes (such as 
oversight) probably see a lot of ugly stuff that the rest of us don't, or the 
woman just walks away from Wikipedia, because she doesn't know there are 
private ways to report problems or thinks it's easier just to walk away.

If you want to address diversity, I think you have to address the need for 
privacy in complaints processes. Although I have only outlined issues relating 
to women here, I am sure there are similar issues for people of other races, 
other religions, other cultures and so on.

Kerry






Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are published!

2018-09-18 Thread Kerry Raymond
It comes as no great surprise to me to see these survey results show very 
little change in matters of some concern (e.g. diversity, community health). 
Quite simply, if you don't change the system, then don't expect the outcomes to 
change. I can't speak about most projects but I don't see any change on en.WP 
in terms of how it operates since the last WMF strategic plan published in 
2011. We had a non-diverse toxic culture then; nothing changed; the culture 
remained the same. Our active editor numbers go down, the number of articles to 
be maintained goes up, do the maths and see the long-term problem. Admin 
numbers are also declining.

One big potentially positive change was the Visual Editor. WMF built the Visual 
Editor specifically to open up editing to a wider group of users and, as 
someone who does training for new users, it is a game changer for making it 
easier for new users. However, en.WP didn't change. VE is not the default for 
new editors on en.WP. It is not enabled for en.WP talk pages, project pages, or 
even the Teahouse, or any forum where new users might report problems or 
harassment etc. Almost any how-to help page gives information only for source 
editor users. Commons has blocked new users from using the VE to upload 
own-work photos (and no useful error message is provided to tell them what to 
do - just something generic like "server error" is returned because Commons 
just "fails" the upload and doesn't pass back a reason to the VE).

The old adage "praise in public, criticise in private" remains inverted in the 
world of Wikipedia. Everyone can see reverted edits and the criticisms on User 
Talk pages. Meanwhile "Thanks" (our lightest weight way to praise) is 
effectively private (yeah, I know there is a public log, but at most it tells 
you who likes who). And what the public log does show is that most people never 
thank anyone anyway, which again speaks volume about our culture. We are all 
for transparency except curiously when thanking for a particular edit. 
Transparency leads to a lack of privacy that comes with it is a turn-off to 
some new users. I know from training some new users don't think it's OK that 
everyone can read their User Talk page or that their entire contribution 
history is visible to all. They generally believe that if they were to 
misbehave, then of course someone in authority (admins in our world) should be 
able to look at such things for the purposes of keeping the place safe and 
functioning effectively, but they don't see why just anyone should be able to 
monitor them, which is a means by which you can stalk someone or wikihound them 
on Wikipedia.  Interestingly pretty much all of those who raise these concerns 
are women, who are, in real life, the most common victims of privacy invasions 
(think "up-skirt-ing" vs "up-trouser-ing", think Peeping Tom vs Peeping 
Tomasina) and stalking. So should we look at trading off some transparency in 
order to get more diversity?

Vandalism. Many years ago, when I questioned our very soft policy on vandalism 
(it takes four warnings before you can even request a block on an account), I 
was told "yeah, there is a lot of vandalism now, but Wikipedia is new, and once 
people realise its value and that vandals get blocked, it will stop happening 
over time". Sadly nobody told the vandals this: based on my watchlist, they are 
still very active and still mostly IPs. I note we have not changed our IP 
policy or our pseudonym account policy; editors remain as non-real-world 
accountable as always. Many online newspapers and other forums are turning off 
comments, having learned that anonymous/pseudonymous accounts lead to 
completely unproductive name-calling and defamatory comments, not the 
constructive civil debate envisaged. Yet at en.WP we persist in believing that 
the same approach can create a positive collaborative culture, which clearly it 
has not.

There's no willingness even to experiment with anything that might change the 
culture and I see little likelihood that en.WP's culture will change of its own 
accord.

However, there is one easy win for diversity at WMF: start diversifying the 
WMF livestream times. Every WMF livestream is usually between 2 and 4 am here 
in Australia, so I'd like to see a bit of support for Global East diversity by 
rotating the livestreams so everyone gets a chance to participate live. One 
small step that WMF could take ... 

Kerry 

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Pine W
Sent: Saturday, 15 September 2018 1:52 PM
To: Wiki Research-l 
Subject: Re: [Wiki-research-l] Results from 2018 global Wikimedia survey are 
published!

Hi Edward,

Thanks for this publication. This research is likely to be of interest to the 
WikimediaAnnounce-l (and by extension, Wikimedia-l) and Wikitech-l subscribers, 
so I suggest that you cross-post this publication to those lists.

After reading this report, I have a 

Re: [Wiki-research-l] where did I read about predicting user conflicts?

2018-09-18 Thread Kerry Raymond
Thank  you, that was the one I was looking for!

 

Thank you too to the other suggestions that people have sent me. While they 
weren’t exactly what I was looking for, they were all interesting reading 
nonetheless. 

 

Kerry

 

 

From: Tilman Bayer [mailto:tba...@wikimedia.org] 
Sent: Tuesday, 18 September 2018 3:44 AM
To: kerry.raym...@gmail.com; Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] where did I read about predicting user conflicts?

 

Maybe it was this research? 
https://blog.wikimedia.org/2018/06/13/conversations-gone-awry/ 

 

Or perhaps you were recalling the talk page research summarized in this year's 
"State of Wikimedia Research" Wikimania presentation? 
<https://wikimania2018.wikimedia.org/wiki/Program/State_of_Wikimedia_Research_2017-2018>
https://mako.cc/talks/201807-wikimania_research.pdf 

 

On Sun, Sep 16, 2018 at 2:27 AM, Kerry Raymond <kerry.raym...@gmail.com> wrote:

Some time in the last few months (possibly at Wikimania) someone pointed me
at some research about predicting the outcome of Wikipedia consensus
building from the language they were using in Talk. I think it was either
research in progress or recently completed.



As I recall, the main "take home" message was that discussions where "you"
started to be used tended to end up in conflict and that discussions that
avoided "you" were more likely to resolve amicably.



If this rings any bells for you, can you please point me at it?



Thanks



Kerry





___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l





 

-- 

Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB



[Wiki-research-l] where did I read about predicting user conflicts?

2018-09-16 Thread Kerry Raymond
Some time in the last few months (possibly at Wikimania) someone pointed me
at some research about predicting the outcome of Wikipedia consensus
building from the language they were using in Talk. I think it was either
research in progress or recently completed.

 

As I recall, the main "take home" message was that discussions where "you"
started to be used tended to end up in conflict and that discussions that
avoided "you" were more likely to resolve amicably.

 

If this rings any bells for you, can you please point me at it?

 

Thanks

 

Kerry

 

 



Re: [Wiki-research-l] New viz.: Wikipedias, participation per language

2018-09-13 Thread Kerry Raymond
From my knowledge of the Australian census and what I can find on the 
Australian Bureau of Statistics website, we don't have this information about 
Australians either. It seems we don't know what other languages people can 
speak. The only statistic available is *households* which speak a language 
other than English, which greatly under-estimates the ability of any individual 
to speak that language as it depends on who they are living with and fails to 
tell us how well that other language is spoken by any individual.

This issue came up for Australian Wikipedians in connection with Indigenous 
languages. Despite the fact that we get asked to provide information on 
Wikipedia on the number of people who speak either any Indigenous language or a 
particular Indigenous language, we have no ability to answer that question 
except for whole households. And since (depending on how you define "language") 
there were 250+ Indigenous languages (with even more sub-dialects), even a 
household entirely composed of Indigenous people may not have a common 
Indigenous language to speak at home.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Federico Leva (Nemo)
Sent: Friday, 14 September 2018 12:16 AM
To: Research into Wikimedia content and communities 
; Samuel Klein 
Subject: Re: [Wiki-research-l] New viz.: Wikipedias, participation per language

Always nice to see language data presented in an appealing way!

Samuel Klein, 10/09/2018 23:27:
> Do we have data on "# of speakers of language X who don't speak a 
> better-covered lang as a secondary language"?

I usually have a very hard time finding such data from official/reliable 
sources, even for EU languages. (I usually search for CLDR purposes.)

Federico



Re: [Wiki-research-l] Quarry query

2018-07-29 Thread Kerry Raymond
Ignore my request. The penny dropped. I needed the iwlinks table.

 

https://quarry.wmflabs.org/query/28618
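For anyone following along, the join involved can be illustrated in miniature. This is a hedged sketch rather than Kerry's actual query: the rows below are invented, though the column names (`iwl_from`, `iwl_prefix`, `iwl_title`) follow the MediaWiki iwlinks schema as I understand it.

```python
# Toy illustration of what joining page to iwlinks does. Data is invented;
# only the column naming mimics the real MediaWiki tables.

pages = [
    {"page_id": 1, "page_title": "Brisbane_City_Hall"},
    {"page_id": 2, "page_title": "Anton_Hettrich"},
]

iwlinks = [
    # iwl_prefix "commons" marks an interwiki link to Wikimedia Commons
    {"iwl_from": 1, "iwl_prefix": "commons", "iwl_title": "Category:Brisbane_City_Hall"},
    {"iwl_from": 2, "iwl_prefix": "wikt", "iwl_title": "something"},
]

def commons_links(pages, iwlinks):
    """Inner-join page to iwlinks, keeping only Commons targets."""
    by_id = {p["page_id"]: p["page_title"] for p in pages}
    return [
        (by_id[l["iwl_from"]], l["iwl_title"])
        for l in iwlinks
        if l["iwl_prefix"] == "commons" and l["iwl_from"] in by_id
    ]

print(commons_links(pages, iwlinks))
# [('Brisbane_City_Hall', 'Category:Brisbane_City_Hall')]
```

In the real query the same filter-and-join happens in SQL against the replica tables.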

 

Kerry

 

 

From: Kerry Raymond [mailto:kerry.raym...@gmail.com] 
Sent: Monday, 30 July 2018 12:53 PM
To: 'Research into Wikimedia content and communities'

Subject: Quarry query

 

I'm trying to work out how to link an en.WP article to its Commons category
using Quarry (the end purpose is to create the datasets needed for Wiki Loves
Monuments in time for September).

 

I managed to get this far 

 

https://quarry.wmflabs.org/query/28618

 

which tells me which Wikipedia articles I need and whether they have a
commons category template but it doesn't get me to the actual commons
category itself. Now if I look at any of these articles in the browser in
the normal way, I can see a link to the Commons category on the left-hand
tool bar, but I can't figure out which table in Quarry has this connection.

 

Any pointers to the right table would be a great help. Surely I can't be the
first person to want to generate a table of data for WLM using Quarry.

 

Kerry

 

 

 

 

 



[Wiki-research-l] Quarry query

2018-07-29 Thread Kerry Raymond
I'm trying to work out how to link an en.WP article to its Commons category
using Quarry (the end purpose is to create the datasets needed for Wiki Loves
Monuments in time for September).

 

I managed to get this far 

 

https://quarry.wmflabs.org/query/28618

 

which tells me which Wikipedia articles I need and whether they have a
commons category template but it doesn't get me to the actual commons
category itself. Now if I look at any of these articles in the browser in
the normal way, I can see a link to the Commons category on the left-hand
tool bar, but I can't figure out which table in Quarry has this connection.

 

Any pointers to the right table would be a great help. Surely I can't be the
first person to want to generate a table of data for WLM using Quarry.

 

Kerry

 

 

 

 

 



Re: [Wiki-research-l] Country (culture...) as a factor in contributing to collective intelligence projects

2018-07-25 Thread Kerry Raymond
Another issue in the choice of language to contribute in could be the 
contributor's motivation for adding the content and the presumed audience for 
it. A multilingual person might decide to write about (say) magnetism in 
English (or another widely spoken language) in the belief that magnetism is of 
worldwide interest, but might choose to write about a local folk story in a 
more local language in the belief that it will interest only local people.

Also given that there are different policies on different Wikipedias, it may be 
that a topic might not pass notability on English Wikipedia but be entirely 
acceptable on another Wikipedia.

Also, my observation of English Wikipedia is that regular contributors tend to 
divide into article-starters (a smaller group) and article-expanders (a much 
larger group). If there are cultural reasons (or Wikipedia policy reasons) why 
people fluent in one language are less likely to be article starters, this may 
limit the range of topics for the article-expanders to work on and hence the 
growth of the encyclopedia overall. There may also be cultural reasons why 
certain types of article are not started in some Wikipedias, e.g. popular 
culture articles (e.g. Pokemon characters) might not be seen as "encyclopedic" 
in some cultures.

As to the specific difference between Polish Wikipedia and South Korean 
Wikipedia, I would observe that South Korea is a nation obsessed with computer 
gaming, from personal leisure through to professional sport, and it is a very 
time-consuming passion.

https://en.wikipedia.org/wiki/Video_gaming_in_South_Korea

So maybe gaming takes away the time from those who might otherwise contribute 
to Wikipedia.

Kerry




Re: [Wiki-research-l] Analysis on the "thanks" feature, location of revision data?

2018-06-13 Thread Kerry Raymond
If the data on the specific edit is not public, it may nonetheless be possible 
to guess it.

According to the public log, my last (genuine) thanks (User:Kerry Raymond) was 
to User:Ozesoldier on 11 June. Since we know the date of the thanks, what edits 
did Ozesoldier do prior to that?

Well, that one is easy. That user made 4 edits to the same article, Anton 
Hettrich, having edited nothing else for two years. So yes, I thanked in 
relation to Anton Hettrich. Which of the 4 edits? I have no way of knowing 
(nothing shows in the history, which is interesting, as it does show for 
recent history). To test whether I could find out which of the 4 edits I 
thanked, I just thanked the user for all 4 edits to see if any of them said 
"already thanked". None did, but only 3 new thanks appear in the public log, 
so under the hood the thanks system knew one of the thanks was a repeat and 
ignored it.

Having said that, I encountered these edits on Anton Hettrich via my watchlist 
and saw the combined diff of 4 edits rather than diffs of 4 separate edits. 
Now, the thanks system doesn't allow you to thank a diff of multiple edits 
(even if made by the same user), so to thank, you have to take the extra step 
of going to the history and thanking a specific edit, which I do somewhat 
randomly as I am really thanking for the group of edits. I typically pick the 
edit of the group that added the most bytes, but it can just be the first one 
my mouse reaches.

So it's important to understand that a thanks is probably not 100% linked to a 
specific edit when it occurs as part of a sequence of edits done around the 
same time. It may mean "I like what you are doing to this article" rather than 
"I like the way you removed that comma". So I would argue that you don't need 
to know the specific edit, and that knowing the specific article probably 
suffices. So can we work that out?

Well, if we assume thanks are a response to a watchlist notification (mine 
almost certainly are of that type, but maybe others have different behavioural 
patterns), then the article in question would be in the intersection of 
articles "recently" edited by the receiver of the thanks (which is public 
information) and the watchlist of the giver of the thanks (which is not 
public). However, why do you watchlist something? Again, for me, it's pretty 
simple. By default I watch articles I have contributed to, and later remove 
those that generate a lot of watchlist activity but are topics about which I 
do not deeply care or to which my own contribution was housekeeping rather 
than intellectual. But I think everything in my watchlist is something I 
contributed to. And, yes, I had previously edited Anton Hettrich (I started 
it) and this is public knowledge.

So, based on my own user behaviour (which may or may not be typical), I would 
be tempted to suggest that the article Z that is the cause of the thanks from 
X to Y must be one recently edited by Y (before the timestamp of the thanks) 
and previously edited by X. That article (or set of articles) is computable 
from public knowledge. What do we mean by "recent"? I am honestly not sure, 
but if it is very recent, it's faster to compute, so practical computation 
limitations may determine how recent you are prepared to consider. Maybe you 
just work backwards through Y's contributions until you find an article that X 
previously edited, as an approximation. Clearly, the further you work back, 
the larger the set of candidate articles becomes. If thanks are 
watchlist-triggered, I would think that "recent" would be one month or less.
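Kerry's guessing heuristic is simple enough to sketch in code. The following is a toy illustration with invented data (the names and timestamps are made up, loosely echoing the Anton Hettrich story above), not an actual query against the replicas:

```python
from datetime import datetime, timedelta

def candidate_articles(thanks_time, edits_by_receiver, articles_edited_by_giver,
                       window=timedelta(days=30)):
    """Articles Y edited within `window` before the thanks timestamp that X
    (the giver) had previously edited, most recent first, deduplicated."""
    recent = [
        (ts, article)
        for ts, article in edits_by_receiver
        if thanks_time - window <= ts <= thanks_time
        and article in articles_edited_by_giver
    ]
    ordered = []
    for ts, article in sorted(recent, reverse=True):
        if article not in ordered:
            ordered.append(article)
    return ordered

# Invented example data.
thanks_time = datetime(2018, 6, 11, 10, 0)
edits_by_receiver = [
    (datetime(2018, 6, 11, 9, 30), "Anton Hettrich"),
    (datetime(2018, 6, 11, 9, 35), "Anton Hettrich"),
    (datetime(2016, 5, 1, 12, 0), "Some Old Article"),  # outside the window
]
articles_edited_by_giver = {"Anton Hettrich", "Charters Towers"}

print(candidate_articles(thanks_time, edits_by_receiver, articles_edited_by_giver))
# ['Anton Hettrich']
```

Working backwards through Y's contributions, as suggested above, is just this filter applied lazily; the `window` parameter is the "how recent" judgment call.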

So while the data is not public, maybe you can make a fair guess at least about 
the article that is involved, and from that the single edit or group of edits 
that is likely to have attracted the thanks. But whether these approximations 
are adequate for your task depends a lot on your research question.

Kerry

Sent from my iPad

> On 13 Jun 2018, at 8:15 pm, Leila Zia  wrote:
> 
> Hi Max,
> 
> Two items:
> * Please review
> https://meta.wikimedia.org/wiki/Research:Understanding_thanks and ping
> if you want to talk so we make sure there is no duplication of
> efforts. :) Do you have a page somewhere that you've described more
> details?
> 
> * The data that you request is not public as far as I know.
> 
> Best,
> Leila
> 
> 
>> On Wed, Jun 13, 2018 at 11:12 AM, Maximilian Klein  wrote:
>> Hi Everyone,
>> 
>> I've been performing some analysis on the "thank a user for an edit"
>> feature which was introduced in 2013, but have run into a data availability
>> hurdle. I'm able to easily retrieve all the "thanks" that happened using
>> the database replicas by searching the logging table with "log_type =
>> 'thanks'" criteria. However these entries only shows who thanked who, and
>> when. I don't see recorded which *revision* was being thanked. Does anyone
>> know where I might find this data?
>> 
>> Make a great day,
>> M

Re: [Wiki-research-l] Reader use of Wikipedia and Commons categories

2018-05-24 Thread Kerry Raymond
I do outreach including training. From that, I am inclined to agree that 
readers don’t use categories. People who come to edit training are 
(unsurprisingly) generally already keen readers of Wikipedia, but categories 
seem to be something they first learn about in edit training. Indeed, one of my 
outreach offerings is just a talk about Wikipedia, which includes tips for 
getting more out of the reader experience, like categories, What Links Here, 
and lots of things that are in plain view on the standard desktop interface 
but that people aren't looking at.

Also many categories exist in parallel with List-of articles and navboxes, 
which do more-or-less-but-not-exactly the same thing. It may be that readers 
are more likely to stumble on the lists or see the navbox entries (particularly 
if the navbox renders open). But all in all, I still think most readers enter 
Wikipedia via search engines and then progress further through Wikipedia by 
link clicking and using the Wikipedia search box as their principal navigation 
tools.

Editors use categories principally to increase their edit count (cynical, but 
it's hard to think otherwise given what I see on my watchlist); there's an 
awful lot of messing about with categories for what seems to be very little 
benefit to the reader (especially as readers don't seem to use them). And the 
lack of obvious ways to intersect categories (PetScan is wonderful, but 
neither readers nor most editors know about it) leads to the never-ending 
creation of cross-categorisations like

https://en.wikipedia.org/wiki/Category:19th-century_Australian_women_writers

which is pretty clearly the intersection of 4 category trees that probably 
should be independent: nationality, sex, occupation, time frame. Sooner or 
later it will inevitably be further subcategorised into

1870s British-born-Australian cis-women poets

First-Monday-in-the-month Indian-born Far-North-Queensland 
cis-women-with-male-pseudonym romantic-sonnet-poets :-)
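If the four trees were kept independent, such cross-categories could be computed on demand, PetScan-style, instead of being maintained by hand. A toy sketch with invented membership sets:

```python
# Invented example: a cross-category as a set intersection of four
# independent category trees, computed on demand rather than hand-curated.
australian = {"Writer A", "Writer B", "Writer C"}
women = {"Writer A", "Writer C", "Writer D"}
writers = {"Writer A", "Writer B", "Writer C", "Writer D"}
nineteenth_century = {"Writer A", "Writer D"}

# "19th-century Australian women writers":
intersection = australian & women & writers & nineteenth_century
print(sorted(intersection))
# ['Writer A']
```

The point is that every new axis multiplies hand-made sub-categories, while an intersection query needs no new categories at all.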

Obviously categories do have some uses to editors. If you have a source that 
provides you with some information about some aspect of a group of topics, it 
can be useful to work your way through each of the entries in the category 
updating it accordingly.

Machines. Yes, absolutely. I use AWB, and doing things across a category (and 
the recursive closure of a category) is my primary use-case for it. My second 
use-case for AWB is template use (template/infobox use is a de-facto category, 
and indeed a third thing that often parallels a category, but unlike lists and 
navboxes this form is invisible to the reader).

With Commons, again, I don't think readers go there; most haven't even heard of 
it. It's mainly editors at work there, and I think they do use categories. The 
category structure seems to grow there more organically. There is not the 
constant "let's rename this category worldwide" or the same level of 
cross-categorisation on Commons that I see on en.Wikipedia.

I note that while we cannot know who is using categories, we can still get page 
count stats for the category itself. These tend to be close to 0-per-day for a 
lot of categories (e.g. Town halls in Queensland). Even a category that one 
might think has much greater interest gets relatively low numbers; e.g. 
"Presidents of the United States" gets 26 views per day on average. This 
compares with a 37K daily average for the Donald Trump article, 19K for Barack 
Obama, and 16K for George Washington. So this definitely suggests that the 
readers who presumably make up the bulk of the views on the presidential 
articles are not looking at the obvious category for such folk (although they 
might be moving between presidential articles using navboxes, succession 
boxes, lists or other links). Having said that, the Donald Trump article has 
*53* categories, of which Presidents of the United States is number 39 (they 
appear to be alphabetically ordered), so it is possible that the reader never 
found the presidential category, which is lost in a sea of categories like 
"21st century Presbyterians" and "Critics of the European Union". I would 
really have thought that being in the category Presidents of the USA was 
slightly more important to the topic of the article than his apparent 
conversion to Presbyterianism in the 21st century (given he's not categorised 
as a 20th century Presbyterian).

And, somewhat amazingly, there is no apparent category for "Critics of Donald 
Trump". I must propose it, along with a fully diffused sub-cat system of 
Critics of Donald Trump's immigration policies, Critics of Donald Trump's hair, 
etc. By the time I've add all the relevant articles to those categories, I 
should have at least another 100K edits to my name!

Kerry




 


-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Federico Leva (Nemo)
Sent: Friday, 25 May 2018 7:14 AM
To: Research into Wikimedia content and co

Re: [Wiki-research-l] Revert data by article importance/quality/readership/watchership/BLP

2018-03-20 Thread Kerry Raymond
I said "where I am suggesting that we don’t allow new users to edit articles of 
higher importance, higher quality, higher readership, or higher 
page-watcher-ship, or about living people because I strongly suspect  that this 
is where new users are at much higher risk of reverting"

I entirely agree with you that editing Donald Trump would not be a good new 
user experience. I run all my edit training sessions and new-user 1Lib1Ref 
edit-a-thons on "low risk" articles as I perceive them. I am just curious if my 
perception of revert risk for new users matches statistical reality.

Kerry 

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Jonathan Morgan
Sent: Wednesday, 21 March 2018 4:30 AM
To: Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Revert data by article 
importance/quality/readership/watchership/BLP

Kerry,

Did you really mean "not allow" here? IMO we (WMF, researchers,
Wikipedians) shouldn't be in the business of creating Yet Another Barrier to 
newcomer contribution.

*Suggesting* that people avoid making their first edit to the article on Donald 
Trump, etc.--sure, that's a good "teachable moment" and probably helps shield 
newcomers from unnecessary confusion and hostility.

I also believe that we could make progress by *recommending *articles for 
newcomers to edit based on some combination of 1) quality improvement needed, 
2) low likelihood that good faith edits will be immediately reverted 3) topic 
is of general interest OR topic is likely to be of interest to newcomer based 
on their stated preferences or their editing history.

The data necessary to run a study like the one you're looking for is all public 
and so I think a study like this could be done. But to my knowledge no one has 
done it yet.

- Jonathan

On Tue, Mar 20, 2018 at 4:42 AM, Andy Mabbett 
wrote:

> On 20 March 2018 at 11:40, Andy Mabbett  wrote:
>
> > I can understand your reasoning, but consider who this would impact 
> > things like [...]
>
> *how* this would impact...
>
> Apologies.
>
> --
> Andy Mabbett
> @pigsonthewing
> http://pigsonthewing.org.uk
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>



--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) 


Re: [Wiki-research-l] Revert data by article importance/quality/readership/watchership/BLP

2018-03-20 Thread Kerry Raymond
Part of the larger discussion is about an on-boarding system that asks the 
user what they are trying to do so they can be given "just in time" advice on 
how to do it. Obviously the thing isn't built yet, so we don't know the 
options to be offered, but you are right that we don't want unintended 
consequences from it. 
But, having said that, we did over 1000 edits at State Library of Queensland 
both this year and last year (about 1/4 of the world's total last year and 
about 1/6 of the world's total this year), so I see my fair share of 1Lib1Ref 
edits and, yes, they do get reverted. Here's an example edit from 2018 1Lib1Ref 
that was reverted:

https://en.wikipedia.org/w/index.php?title=Charters_Towers&type=revision&diff=821207200&oldid=816557651

Our community will bite 1Lib1Ref people (and the edit is clearly tagged as 
such) just as happily as any other new user and, in this case, wouldn't back 
down when I pointed out there was nothing wrong with the edit. I note that I 
had advised the Australian community in advance about 1Lib1Ref and the kinds of 
edits they would see happening precisely to try to start this sort of thing 
happening, but ...

Actually the on-boarding system would also get in the way of training sessions. 
So I will probably be asking for a way for "trustworthy" new users to be able 
to bypass the on-boarding as this will be necessary for training sessions and 
might also be a solution for 1Lib1Ref.

But the larger issue is to avoid new users having really bad initial 
experiences, because they drive new users away, so avoiding articles at high 
risk of reverts would be a useful strategy. I'd happily keep 1Lib1Ref-ers away from 
that kind of experience. I am hand-holding my librarians through the process 
(the new ones all do their 1Lib1Ref in a series of edit-a-thons; we run 3 each 
week through the 3 weeks, they all have my email address for any problems, and 
we do have some moderately experienced users among the librarians themselves). 
We do not encourage the new users to use Citation Hunt because it takes them 
to high-risk articles. We have our "lucky dip box" instead: literally a box 
with slips of paper naming articles that need certain kinds of edits (this 
year we added public libraries in Queensland and school openings/closings to 
articles about Queensland towns and suburbs), and we have clear instructions 
on how to do those kinds of lucky dip edits. The repetition of doing the same 
kind of edit over multiple (usually low-risk) articles builds skill and 
confidence with these groups. We 
do similar things in our monthly WikiClubs with the new users (different theme 
each month). They love doing the lucky dips (librarians are "completer" 
personalities I think) and only a few seem to desire to advance to more 
"freelance" editing.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Andy Mabbett
Sent: Tuesday, 20 March 2018 9:40 PM
To: Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Revert data by article 
importance/quality/readership/watchership/BLP

On 20 March 2018 at 10:09, Kerry Raymond  wrote:

> https://www.mediawiki.org/wiki/In-context_help_and_onboarding

> where I am suggesting that we don't allow new users to edit articles 
> of higher importance, higher quality, higher readership, or higher 
> page-watcher-ship, or about living people because I strongly suspect  
> that this is where new users are at much higher risk of reverting.

I can understand your reasoning, but consider who this would impact things like 
1Lib1Ref, or an editor who just adds photos (possibly their own, taken 
especially) to articles that lack them.

--
Andy Mabbett
@pigsonthewing
http://pigsonthewing.org.uk



[Wiki-research-l] Revert data by article importance/quality/readership/watchership/BLP

2018-03-20 Thread Kerry Raymond
Do we have any data on frequency of reverts of users (or more particularly
new users) based on characteristics of the article being developed? There is
a proposal about "in-context help and onboarding" of new users:

 

https://www.mediawiki.org/wiki/In-context_help_and_onboarding

 

where I am suggesting that we don't allow new users to edit articles of
higher importance, higher quality, higher readership, or higher
page-watcher-ship, or about living people, because I strongly suspect that
this is where new users are at much higher risk of being reverted. I take this
approach during training: I suggest the topics they edit and choose what I
regard as "low risk" ones (and provide some sources). This produces almost
no reverts of their first edits, which I think is very important in gaining
confidence with basic editing skills.

 

So I was curious about whether anyone has crunched such data or has data
that could be easily crunched to confirm or deny my hypotheses.

 

Kerry

 

 

 

 



Re: [Wiki-research-l] Gaps

2018-02-08 Thread Kerry Raymond
I think there are two parts to the problem of filling gaps. Drawing attention 
to the gaps is half of the problem. The other half of the problem is finding 
the editor who wants to write that article. For example, I often check on the 
"missing topics" list for WikiProject Queensland (which is machine-generated by 
counting the number of redlinks in articles tagged on the Talk page as 
belonging to that project).

https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Queensland/Missing_topics

This is not a highly sophisticated algorithm but it does result in my thinking 
"oh well, I am sure I could at least write a stub on that topic" and so I write 
an article. 

But if you look at the first couple of screens of those "most missing" topics, 
there are lots of racing car drivers. I have no interest whatsoever in racing 
car drivers, I have no idea what sources might exist or which might be 
reliable. So as I pick off other topics from the "most missing" list, it has 
the effect of increasing the density of racing car drivers at the top of the 
list. Clearly we have a content gap around racing car drivers, but I won't be 
doing anything about it.

This reinforces the point Leila makes about personalising the recommendations. 
I think it's more important to target the right people even if the list you 
present to them isn't overly sophisticated. The right person will be able to 
mentally filter a list of things vaguely associated with their topic interests. 
As Leila says, there's probably less benefit in targeting new users to write 
new articles. But I've started over 4000 articles and I bet 90% are WikiProject 
Queensland. Show me any list of wanted Queensland topics and I'll probably be 
willing to write about *many* of them (but not all). Similarly if you look at 
the categories of the articles I write, the category Queensland Heritage 
Register will come up a lot (probably 1/3 of my articles are about heritage 
properties). Probably another 1/3 are articles about Queensland 
towns/suburbs/localities. I think looking at the categories/projects of the 
articles people write is a very strong indicator of interest areas. And the 
more articles they write, the more sure you can be that they are confident 
about starting new articles (a lot of people are not willing to start new 
articles but will happily contribute to a stub -- probably had a past bad 
experience with article creation) and the more you can be sure about their 
areas of interest.

With the exception of redirects and disambiguation pages, I would think anyone 
who has started many articles is likely to have easily-inferred topic space 
interests. For that matter, a lot of people (myself included) talk about their 
interest areas on their user page, so key words in user pages that fuzzy-match 
to project names or category names may be another indicator.

However, some of the content gaps on Wikipedia exist because we don't have 
contributors who are interested in the topic. Given that there is a known 
difference between the topics that women generally write about compared to men, 
it's clear that a lack of diversity in editors is likely to lead to content 
gaps. I would suspect the same is true about other personal characteristics. As 
an Australian, I am more likely to write about Australia than, say, Greenland, 
but I did holiday there last year, so actually I have written a little about 
Greenland and uploaded some photos, but that's just a "blip" in my contribution 
profile (and I don't think I started any new articles about Greenland). If we 
have a content gap about Greenland, maybe we don't have enough Greenlanders to 
fill it? I think we can't address content gaps unless we also address 
contributor gaps. This in turn may result in devolving responsibility for 
things like notability and verifiability down to the Project level. For 
example, it is often commented that Indigenous Australian topics are a content 
gap. The problem is a lack of sources. Indigenous Australians did not have a 
written language so oral sources are very important, but en.Wikipedia isn't 
keen on oral sources, so there's a content gap that's hard to fill. And I 
suspect we have very few Indigenous Australians writing for Wikipedia. 
Statistically 3% of our population self-identifies as Indigenous but they tend 
to have lower educational attainments which probably makes them less likely to 
be Wikipedia contributors who, based on the 2011 survey, have above average 
likelihood of having a university degree. 

So I think we have two flavours of content gap, those for which we have active 
contributors in the broader topic space who may be enticed to write about the 
missing topics (which is the problem being principally addressed by this area 
of research), and those where we do not have active contributors.

Kerry





___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mail

Re: [Wiki-research-l] San Francisco union rules

2018-01-24 Thread Kerry Raymond
It's not just USA. We have similar issues in Australia too. Many "enterprise 
agreements" (as we call them) specifically prevent volunteers doing work that 
is part of the duties of a paid staff member. 

And of course any venue can impose rules on you as part of the contract of 
using their premises, whether there is any law or any union agreement 
underpinning it. They may just do it to protect themselves in relation to 
workplace safety. A volunteer may not know they have to cover a cable snaking 
across the floor with gaffer tape (for example). They may bring their own 
electrical cable without a recent "test tag" on it. Etc.

Kerry





Re: [Wiki-research-l] New policy about performing research on English WikipediaWiki-research-l Digest, Vol 149, Issue 1

2018-01-03 Thread Kerry Raymond
Interesting question. I'll bite :-)

From reading what I hope are the pertinent sections of the research paper (I 
didn't read it all), the research required new Wikipedia articles to be written on 
topics in chemistry and econometrics. These Wikipedia articles were not written 
by the researchers themselves but rather they recruited PhD students in those 
disciplines to write them. The topics were chosen by looking at lists of topics 
in the discipline based on text books and university syllabuses and then 
looking to see which topics did not yet have a Wikipedia article. The set of 
articles was then split into one group that was uploaded to Wikipedia and 
one group that was not (the control group). What happened after that was 
more-or-less business-as-usual on Wikipedia (although some of the PhD students 
remained engaged in order to get the article to satisfy the reviewers to get 
the article accepted - I assume they were uploaded via Articles for Creation). 
The research itself was to observe the impact of these new articles (compared 
to the topics that were not uploaded); this did not interfere with the 
articles, merely observed them.

In terms of the new policy of "not a laboratory", I don't see any reason to 
regard these uploaded articles as "disruptive" or "negatively impacting 
articles". The method by which the topics were chosen and the selection of PhD 
students in that discipline to write the articles appears similar to that used 
in most edit-a-thons. The topics chosen seem likely to be notable and the 
authors were presumably competent in that discipline so presumably the quality 
should be at least equal to that of most new articles. It is unclear whether 
the PhD students were paid to write the articles but, even if so, there seems 
to be no conflict of interest, as the researchers had no "agenda" other than 
to produce a typical Wikipedia article on each topic. I guess the only argument 
for disruption might be that uploading a large number of articles in the same 
discipline at around the same time could have generated a higher than normal 
workload for those competent to review them, but then an edit-a-thon may have 
had the same impact. 

So my take is that it may have been courteous (under the new policy) to discuss 
the project at the Village Pump, but I think the researchers would have been 
operating within the policies even if they did not. The fact that many of the 
articles survived in some form (some were merged) suggests there was a benefit 
to Wikipedia from the research.

But a similar project that chose its topics more carelessly, used people with 
inferior discipline knowledge to write the articles, or remained actively 
engaged with the articles (e.g. gatekeeping/ownership) could well have been 
disruptive. So there probably would be benefit in having a conversation on the 
Village Pump (or wherever; I'm not convinced the Village Pump is the right 
place) to establish exactly how certain aspects of such a project should be 
conducted to avoid disruption and negative effects on articles. Simply put, if 
the researchers are not active Wikipedians (by which I mean more than "I think 
I have mastered the syntax"), I am not convinced they are capable of judging 
what might be disruptive or negative.

Personally I think this list might be a better place than the Village Pump as 
we understand both research and Wikipedia while I am not convinced that the 
Village Pump understands research. But the reality is that many researchers 
will not know of the "not a laboratory" policy, so the first we may know about 
research is either the resultant publications or the screaming and yelling that 
arises from discovering the research being executed (possibly because of the 
disruption being created). How do we communicate this policy to researchers?!

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of James Salsman
Sent: Wednesday, 3 January 2018 10:30 PM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] New policy about performing research on English 
WikipediaWiki-research-l Digest, Vol 149, Issue 1

Hi Jonathan,

Can you please give a concrete example of what, for example, the 
http://ide.mit.edu/sites/default/files/publications/SSRN-id3039505.pdf
researchers would have had to do differently under this new policy?

Best regards,
Jim

> Date: Tue, 2 Jan 2018 15:29:03 -0800
> From: Jonathan Morgan 
> To: Wiki Research-l 
> Subject: [Wiki-research-l] New policy about performing research on 
> English Wikipedia
>
> Hi there wiki-research folks,
>
> This is just a heads-up that English Wikipedia has adopted a new 
> policy[1] about research on that project. The policy codifies some new 
> requirements for community notification and disclosure that 
> potentially apply to all research projects (regardless of the affiliation of 
> the researcher).
>
> You can read more about the policy on WP:NOT[1], but I've included the 
> majo

[Wiki-research-l] FW: Re: Editor participation rates in surveys

2017-10-31 Thread Kerry Raymond
I just received this response to my message to this wiki-research mailing list. 
Do we think it acceptable that people sign up to this list with this kind of 
demand for payment? I don’t.

 

Sure I can just delete the email, but I don’t believe this is acceptable 
behaviour when this person has presumably willingly signed up to this list and 
should therefore be willing to receive legitimate messages sent to it by other 
list members.

 

Kerry

 

From: REMOVED BY KERRY
Sent: Wednesday, 1 November 2017 11:15 AM
To: kerry.raym...@gmail.com
Subject: Re: Re: [Wiki-research-l] Editor participation rates in surveys

Hello!

I use a new email filtering service called BitBounce to better filter my spam. 
To deliver your email to my inbox, please click the link below and pay the 
small Bitcoin fee. Thank you!

You can sign up or get more info about BitBounce by clicking here.

To deliver your email:

We've never met. I'll pay your fee.

I know you. Add me to your whitelist.

Email that pays.

BitBounce is a product by:
Turing Cloud
BitBounce.com
Redwood City, CA

BitBounce integrates with:
CoinBase
CoinBase.com
San Francisco, CA


Re: [Wiki-research-l] Editor participation rates in surveys

2017-10-31 Thread Kerry Raymond
Frankly that sounds like quite a high participation rate to me. I'd be 
surprised if a random call on English Wikipedians would have produced that 
level of response. But, as Jonathan has already said, participation rate is 
going to depend on a lot of variables.

I don't think anyone has conducted the "meta-research" on the willingness of 
Wikipedians to be surveyed and in what circumstances. Indeed, I doubt it can be 
studied. Even if you had access to every study ever done on Wikipedians, it is 
unlikely that each study occurred independently of the others. Almost certainly, 
some of the studies started with enquiries to other researchers about how best 
to solicit participation, so I would suspect that many studies adopted methods 
of recruitment influenced by the experience of previous studies. This means 
that surveys may be recruiting based on greatest likelihood of finding 
respondents rather than whether those respondents are a representative sample 
within the desired cohort of subjects. And, let's face it, for a researcher 
who has to produce a result to get a PhD, tenure, promotion, or their next 
grant, a large number of non-representative respondents at least gives you 
enough data to draw some conclusions, whereas a very small number of 
representative respondents might not. "We surveyed 2 people and they said ..." 
(ouch!). Let's never forget that we don't do research just to make the world a 
better place.

Indeed, when it comes to Wikipedians, I don't think we know what a 
representative sample should look like in any case. Even the WMF's own editor 
surveys had relatively low participation rates (5,000 for the 2011 editor 
survey), which is a drop in the bucket of the millions of registered user 
accounts (and the almost unknowable number of editors contributing anonymously). 
In contrast, when you do a random population study or a study within the staff of an 
organisation, you do generally have other data (e.g. census, HR records) to 
tell you how representative your sample is on a number of the standard 
demographic variables. No wonder we like to study university students so much 
(known demographics and such a convenient sample!).

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Juliana Bastos Marques
Sent: Wednesday, 1 November 2017 7:38 AM
To: Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Editor participation rates in surveys

Thanks for your reply, Jonathan. I was wondering if anybody has ever conducted 
a systematic research on the variables that you listed.

I had a sample of the 200 editors with the most edits (over 30 days in 
Sept/Oct) on Portuguese Wikipedia, and 11 of these participated. I could have 
adopted other 
criteria - for instance, only 34 of these are admins, 2 are bots -, but for my 
purposes I just wanted a small sample with an objective selection. Indeed, the 
participation rate of 5,5% was expected, but I was wondering if there are any 
studies that can corroborate this.

Thanks,
Juliana

On Tue, Oct 31, 2017 at 5:27 PM, Jonathan Morgan 
wrote:

> Hi Juliana,
>
> Can you give a little more info about what you're looking for, and a 
> little context about why you're asking?
>
> I don't know of any research that has specifically asked whether there 
> is a difference in response rate per target group. Anecdotally (I've 
> run a lot of editor surveys), I can say that in my experience:
>
>- very new editors often don't respond to surveys at a high rate,
>probably because they're less committed to/invested in Wikipedia and/or
>they have already lost interest (or stopped participating for other
>reasons) by the time they get the survey
>- how you deliver the survey matters a lot: for example, direct email
>vs. talkpage message vs. newsletter/mailing list message vs. 
> invitation at
>a live f2f event
>- the topic and goal of your survey matters a lot: if it's something
>that people care about, they're more likely to respond. If people 
> feel that
>it's important or personally useful to tell you what they know or 
> what they
>think, they're more likely to respond. If you're asking for very 
> personal
>information, or information that is not clearly relevant to your stated
>goals, they're often less likely to respond.
>- who you are and why you're asking matters a lot: do the editors trust
>you? do they have preconceived notions (correct or not) about who 
> you are,
>what the data will be used for, how it will be stored and published, how
>privacy and anonymity will be ensured (if applicable)... these all 
> matter a
>whole lot.
>- in general, smaller-scale surveys targeted at a very specific group
>and which are clearly relevant to the expertise and goals of that group,
>and follow scientific best practices for open and ethical research, 
> seem to
>work pretty well (with all the above caveats)
>

Re: [Wiki-research-l] [Announcement] Voice and exit in a voluntary work environment

2017-07-20 Thread Kerry Raymond
Leila,

I am wondering if you can explain the project title "Voice and exit in a 
voluntary work environment". I don't quite see the connection to the project as 
proposed:

https://meta.wikimedia.org/wiki/Research:Voice_and_exit_in_a_voluntary_work_environment

On reading the project page, I see two almost separate items. One is the 
intent to survey all new users about their demographics. The second is to form 
newbie teams of women with similar interests, established via "20 questions". 

Regarding the demographics of new users: is this intended to occur when they 
create a new account (as opposed to editing as an IP)? If so, will it be 
optional? I guess my concern is that people will back off from signing up, 
either because they don't want to reveal the information or because the 
process has just become too heavyweight. From a privacy perspective (I presume 
there will be a privacy statement), will the demographic survey remain linked 
to the user name? From the point of view of the science, it would be good for 
tracking purposes if it were, but it's also a possible reason why people won't 
answer your questions if it is (or, more to the point, if they think it is). 
I know myself that when 
organisations approach me for demographic information (anonymously or linked to 
my username or real world identity), my reaction to such requests tends to 
depend on how much I care about them (and how much I trust them). If I am very 
involved in an organisation, I am generally happy to provide data that assists 
them in the stated purpose because I want them to be successful. When I am 
marginally engaged (the case with many a website that requires a signup), I am 
unlikely to provide demographic information in general and almost certainly not 
at the point of signup.

I assume the link between the two parts of the project is that some/all of 
those new users whose demographic profile reveals they are women will then be 
approached to form teams based on the 20 questions. Will that occur before 
their first edit? I'm just thinking of the person sitting down to fix a 
spelling error who must go through signup, a demographic survey, an invitation 
to join a team, and possibly 20 questions before we let them do the edit they 
came to do. I guess I am fearful that the experiment will drive women away if it is 
all too up-front heavy relative to the task they came to do. Not in the 
interests of diversity.

Also, the word "organic" was mentioned. Not all new users are organic. Anyone 
who is signing up for a training class, edit-a-thon, university class exercise 
etc is NOT organic. Can I ask that when there is a research intervention, 
reasonable steps are taken to ensure that non-organic new users are not caught 
up in it. That means having some way to bypass the intervention and informing 
the course instructors (course instructor is a user right) well in advance so 
they can ensure their groups are bypassed. Ditto any scheduled 
events/edit-a-thons; mine are 
published on the Wikimedia Australia website. When you have 2 hours to teach 
Wikipedia (the typical time slot I get from organisations) and you have a 
prepared set of PPT slides, you want the Wikipedia interface to follow the 
sequence you are expecting. Trainees are confused by buttons being relabelled 
differently from the PPT slides, etc. And it's worse if it happens to only part 
of the group, as they think they did something wrong. Anything that slows 
things up means you don't get finished in two hours and the training has 
failed its goals. I 
got caught by the A/B testing of Visual Editor by new users. At that time, I 
had never seen or used the Visual Editor and a proportion of my training class 
were being shown it. It was a disaster and I nearly gave up training after 
that, it was just so embarrassing. I did not know it was happening. Nor did I 
have any way to get those users back into the source editor (which I was 
teaching at that time). While I think the VE is a good thing for Wikipedia, 
that was NOT the way to experiment with it. Also with events, because of the 
limit on signups per day from the same IP address, it is common to ask people 
to sign up in advance, for which you provide information on the process. So the 
bypass of the intervention needs to be available for the signups occurring 
before the event, so I don't think it is sufficient to just provide an "on the 
day" signup solution. It has to work for people doing it at their own desks 
days ahead. Given that the vast majority of participants in my groups are 
women, I don't think it’s in the interests of diversity to give them a bad 
experience by being inadvertently caught up in an experiment.

Moving on to the newbie teams, how is this going to work? How will they 
communicate?

Will you tell them about the Visual Editor which is NOT enabled by default for 
new users? As someone who has delivered training on both editors, the VE is an 
absolute winner for new users, particular women. I could not do Wikipedia edit 
t

Re: [Wiki-research-l] link rot

2017-06-26 Thread Kerry Raymond
It's worth commenting that link rot occurs in a variety of ways.

The obvious way is that the URL is broken and error 404 is returned to the browser.
Or, rather than sending a 404 to the browser, the site redirects you to a page 
that says "Page not found" without an error 404.
Or you are redirected to a search page which does not find what you want.
Or you are redirected to a general search page from which you may or may not 
find the page you were after at a new URL.
Or the URL has been replaced by a specialised search which will give you what 
you want, but not in a way that you can use for citing or archiving. (A lot of 
sites seem increasingly to be hiding content by returning it as search results 
that you cannot archive.)
Or the URL works but the content on the page is not what was expected 
(different topic), which occurs with sites that number (and then re-number) 
their web pages, or when cybersquatters buy an expired domain name.
Or the URL works and continues to be about the topic expected, but no longer 
says anything to back up the claim in the Wikipedia article because the content 
has changed since.
Or the URL works and the content NEVER said what the Wikipedia article claims 
(contributor error or deliberate misleading).

And there may be more variations on the theme that I have forgotten about.
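For what it's worth, the first few variations above could be roughly triaged in an automated pass before handing the hard cases to a human. A minimal sketch (function and category names are illustrative, not the Internet Archive Bot's actual logic; the last two variations can't be told apart without the article's history, so they are lumped together):

```python
def classify_link(status, final_url, page_text, expected_phrase):
    """Rough triage of a fetched reference URL into a link-rot category.

    status: HTTP status code; final_url: URL after redirects;
    page_text: fetched body; expected_phrase: text the citation should support.
    """
    text = page_text.lower()
    if status == 404:
        return "hard-404"                # plain broken URL
    if "page not found" in text:
        return "soft-404"                # site hides the error behind a 200
    if "search" in final_url:
        return "redirected-to-search"    # content hidden behind a search page
    if expected_phrase.lower() not in text:
        # either the content changed, or it never supported the claim;
        # only a human (plus article history) can distinguish these
        return "content-drift-or-never-supported"
    return "ok"
```

Even this toy version shows why a human stays in the loop: the "ok" and "content-drift" buckets depend on knowing which claim the citation was meant to support, which (as discussed below) is often unclear from the article itself.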

Obviously these variations have to be detected in different ways. And for 
archive sites, it is often impossible to recognise in an automated way that a 
lot of these have occurred. It can be really tedious to wade through dozens of 
archived snapshots of a webpage finding "Page not found" pages in your search 
for the "most recent really-what-I-wanted content". This is a problem for the 
Internet Archive Bot.

So you often need a human to say "hey, it's broken" at which point the Internet 
Archive Bot may try to fix it. Because the bot writers know that the bot can be 
fooled into "rescuing" a dead link with an archived page that doesn't actually 
contain the original content, they put those very long messages on the Talk 
pages asking people to check the rescued citation. I don't know about other people, 
but when the Internet Archive Bot was released, it deluged my watchlist and I 
simply stopped checking its work (I could never have kept up). Now its volume 
has reduced but I'm now trained to ignore it. I think it does a better job at 
archiving external links than rescuing (but given the variations above, this is 
not to be wondered at).

At the end of the day, most deadlinks need a human in the loop for recovery. 
And it's a huge task and a tedious one. But I do dabble in it from time to time 
for claims that seem particularly "bold" or on articles that I care a little 
bit more about. So let me talk about the process.

One of the problems is that, for URLs that I did not add myself, I can see the 
deadlink citation and I may have located what I think is a replacement page 
(whether on the original website or from an archive or whatever), say with a 
similar-ish title appearing to talk about the topic of the Wikipedia article, 
but my problem is that I cannot tell from the article how much of the content 
preceding the citation (or, in the case of bullet lists, tables, etc., following 
the citation) is intended to be supported by the citation. So I don't really 
know if some particular claim is supposed to be supported by the nearest 
citation or whether it may be supported by another citation that has drifted a 
long way away. I've emailed at some length previously about this problem of 
being unable to relate chunks of texts in articles to citations and the 
citation rot that occurs as the article grows and the citations drift into the 
wrong text (or just get deleted because a subsequent editor can't see where 
they fit into the narrative or can't be bothered to see). So, not quite knowing 
what information was supposed to be supported by this citation, it is genuinely 
hard to say if the new URL I have found is or isn't an adequate replacement. 
Am I doing more harm by replacing it when I am not totally confident, or should 
I leave it for someone else to decide (assuming someone else will even try)? I 
often try to fix a deadlink citation but back away because I just don't know if 
I am doing the right thing or not.

To try to get around the "citation rot" issues, if I am highly motivated that 
day, I use WikiBlame to try to locate the version of the article in the History 
where the citation was added. This gives me the best chance to know what 
information it was intended to support. So then I go and look in Internet 
Archive and find the URL has been archived, but the first archived version is 
AFTER the date of the version of the Wikipedia article that added the citation. 
Is this a problem? Generally I take the risk and go for it if the info seems to 
be consistent. At the end of 

Re: [Wiki-research-l] [Analytics] Wikipedia Detox: Scaling up our understanding of harassment on Wikipedia

2017-06-24 Thread Kerry Raymond
No right to be offended? To say to someone "you don't have the right to be 
offended" seems pretty offensive in itself. It seems to imply that their 
cultural norms are somehow inferior or unacceptable. 

With the global reach of Wikipedia, there are obviously many points of view on 
what is or isn't offensive in what circumstances. Offence may not be intended 
at first, but, if after a person is told their behaviour is offensive and they 
persist with that behaviour, I think it is reasonable to assume that they 
intend to offend. Which is why the data showing there is a group of experienced 
users involved in numerous personal attacks demands some human investigation of 
their behaviour.

Similarly for a person offended, if there is a genuinely innocent 
interpretation of something they found offensive and that is explained to them 
(perhaps by third parties), I think they need to accept that no offence 
was intended on that occasion. Obviously we need a bit of give and take. But I 
think there have to be limits on the repeated behaviour (either in giving the 
offence or taking the offence).

Kerry



Re: [Wiki-research-l] [Analytics] Wikipedia Detox: Scaling up our understanding of harassment on Wikipedia

2017-06-22 Thread Kerry Raymond
I agree you can probably never pin down these terms to everyone's satisfaction. 
But, at the end of the day, is the real issue here the definition of harassment 
or is it the issue of people leaving Wikipedia because of unpleasant 
interactions with other people or perhaps retaliating in some inappropriate 
way. Harassment may not even be occurring on a Talk page. If someone stalks you 
on-wiki and reverts each of your edits, you are probably being harassed without 
a word being said on Talk.

This is the problem. Two people can see the same set of events or the same 
commentary from very different points of view. The question of "harassment" 
isn't completely decidable in the real world for the same reasons. But if we 
train the algorithms based on human assessments (provided that a wide range of 
people were making those assessments), we do have something useful to work with 
to begin to test hypotheses in the lab before taking real-world action.

For example, I find it very interesting that a small group of experienced users 
appear responsible for a lot of apparently obvious personal attacks. It does 
indeed suggest that these people think themselves unstoppable, whether that is 
because they believe themselves "unblockable" or because they feel safe in the 
knowledge that their less-experienced victim is unlikely to know how to 
complain. Or perhaps they are just bantering among themselves, like a bunch of 
mates at the pub? But it certainly seems to suggest that there is a way to start 
identifying potential problem users for a human-based investigation.

But does the "community" really care enough about harassment to investigate them? 
Would it really take action against experienced users who engaged in 
harassment? Past events suggest not. 

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Pine W
Sent: Thursday, 22 June 2017 10:04 AM
To: A mailing list for the Analytics Team at WMF and everybody who has an 
interest in Wikipedia and analytics. ; Wiki 
Research-l 
Subject: Re: [Wiki-research-l] [Analytics] Wikipedia Detox: Scaling up our 
understanding of harassment on Wikipedia

I'm glad that work on detecting and addressing harassment are moving forward.

At the same time, I'd appreciate getting a more precise understanding of how 
WMF is defining the word "harassment". There are legal definitions and 
dictionary definitions, but I don't think that there is One Definition to Rule 
Them All. I'm hoping that WMF will be careful to distinguish debate and freedom 
to express opinions from harassment; we may disagree with minority or fringe 
views (even views that are offensive to some) but that doesn't necessarily mean 
that we should use policy and admin tools instead of persuasion and other tools 
(such as content policies about verifiability and notability) to address them 
(and in some cases Wikipedia may not be a good place for these discussions). 
Other distinctions include (1) the distinction between a personal attack and 
harassment ( 
https://blog.wikimedia.org/2017/02/07/scaling-understanding-of-harassment/
appears to have conflated the two definitions, while English Wikipedia policy 
makes distinctions between them), and (2) the distinction between a personal 
attack and an evidence-based critique.

Also note that definitions of what constitutes an attack may vary between 
languages; for example an expression which sounds insulting to someone in one 
place, culture, or language may mean something very different or relatively 
benign in a different place, culture, or language. I had an experience myself 
when I made a statement to someone which from my perspective was a statement of 
fact, and the other party took it as an insult. I don't apologize for what I 
said since from my perspective it was valid, and the other party has not 
apologized for their reaction, but the point is that defining what constitutes 
a personal attack or harassment can be a very subjective business and I'm not 
sure to what extent I would trust an AI to evaluate what constitutes a personal 
attack or harassment in a wide range of contexts. I get the impression that WMF 
intends to flag potentially problematic edits for admins to review, which I 
think could be a good thing, but I hope that there is great care being invested 
in how the AI is being trained to define personal attacks and harassment, and I 
wouldn't necessarily want admins to be encouraged to substitute the opinion of 
an AI for their own.

I understand the desire to tone down some of the more heated discourse around 
Wikipedia for the sake of improving our user population statistics, and at the 
same time I'm hoping that we can continue to have very strong support for 
freedom of expression and differences of opinion. This is a difficult balancing 
act. I think that moving the needle a bit in the direction of more civility 
would be a good thing, but I get the impression that there ar

Re: [Wiki-research-l] Research about WikiProject Recommendation

2017-06-20 Thread Kerry Raymond
 contributors are source-edit people. Our newer people will be a mix of 
source-edit and Visual Editor people. A Visual Editor user cannot write on a 
Project Page or any Talk page with the Visual Editor because the Visual Editor 
has not been enabled for those name spaces (it’s one of those “it was good 
enough for me” issues). You can get around this for an individual Project page 
or Talk page by adding the template {{VEFriendly}} at the top of the page; see 
it in action on my User Talk page:

 

https://en.wikipedia.org/wiki/User_talk:Kerry_Raymond

 

Aside. I notice this template does not appear on the top of 

 

https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research
https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Research

 

Is there any reason why not? BTW, who are the “project leaders” in that project?

 

But back to my point, if you have an invitee who appears to be a VE user, you 
probably want to connect them to a VE-Friendly mentor (they don’t have to be an 
active VE user themselves, but ought to have some understanding of how the VE 
User “sees” an article and how to give advice in “VE Speak” not source-editor 
notation). If the first piece of advice the new VE recruit gets is to “change 
to the source editor”, it’s not exactly welcoming. So you probably want to ask 
your mentors about the source-vs-visual editor issue. You probably need each 
project to have a mentor who is OK with VE or else you may be recruiting 
someone you can’t match with a mentor. I think all mentors will be fluent in 
source editor but fewer will be comfortable with VE.

 

Now it may be that others on this list see WikiProjects in a different way to 
me because the WikiProjects they are involved with operate in different ways 
and have a different culture. So hopefully people will chime in with other 
perspectives.

 

Kerry

 

 

From: Bowen Yu [mailto:yuxxx...@umn.edu] 
Sent: Wednesday, 21 June 2017 12:00 PM
To: kerry.raym...@gmail.com; Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Research about WikiProject Recommendation

 

Thanks for your thoughts, Kerry and Jonathan!

 

Here is some response for your comments.

 

1. Regarding the targeted projects. Yes, we will definitely focus on content 
projects only for the reasons you mentioned. "Women in Red" is kind of special, 
and I don't think we will include it for now. Sorry for not being clear about 
this.

 

2. Regarding the invitation tactics. We do think there is a spectrum here, from 
providing templates to giving total freedom to the recruiters. Also, we are not 
sure if some projects have their own templates for recruiting new members, but 
for now, we will provide general guidance for constructing the recruiting 
message. For instance, as mentioned on the meta-page, we will encourage project 
organizers to write personalized welcome messages, make specific task requests, 
or provide resources to get started. Hopefully this does the job - getting the 
right tone while still keeping it under control.

 

3. Yes, regarding evaluating whether the invitation is taken - thanks for 
listing all those reasonable possibilities. Editors listing themselves on the 
project page might be one approach, but as Kerry mentioned, it might not always 
work. Instead, we can see whether they make any edits on the project (talk) 
pages as a sign of getting involved, or, more loosely, whether they keep editing 
project-related articles. We will also provide short survey questions for each 
recommended editor so that organizers/project leaders can evaluate the 
recommendation quality. We expect the project leaders to self-identify when we 
post a recruiting message on the targeted projects to look for volunteer 
participants for our study (we will explicitly mention looking for "project 
leaders" or use similar descriptions).

 

On Tue, Jun 20, 2017 at 4:11 PM, Kerry Raymond <kerry.raym...@gmail.com> wrote:

There are pros and cons.

Having a standard invitation makes for better research as the project outcomes 
are more comparable but perhaps worse for recruitment.

Perhaps the WikiProjects involved could write the invitation for the person to 
participate, hopefully they can get the tone right. They might want to 
particularly encourage (or discourage) people with specific skills or 
interests, e.g. "We are particularly interested in expanding our articles on 
Pacific Island 17th century wrestling. We are in desperate need of people who 
can develop templates. Our project prides itself on fully cited articles." But 
then differences in the invitation may lead to differences in the uptake. 
Better recruitment, but worse research.

And of course what are the variables being measured for the outcome ...

Re: [Wiki-research-l] Wiki-research-l Digest, Vol 142, Issue 13

2017-06-20 Thread Kerry Raymond
Thanks, Zach, for the explanation of these things. I realise the goal of 
WikiEd was not to be a research project but to achieve other goals in relation 
to Wikipedia, at which it seems to be very successful judging from your data.

 

I suspect part of my confusion relates to the terms “instructor”, “faculty”, 
and “program staff” (noting that terminology in North American universities is 
very different from that in Australian universities). Can you unpick these for me? Are
the “instructors” academic staff of a university or college, or, as I 
interpreted it, people who would probably identify as Wikipedians who are 
volunteering to work on the WikiEd program in their local area? I assume 
“faculty” here means “academic staff member” (in my world, “faculty” is an 
organisational unit composed of academic staff and non-academic staff). Who are 
the “program staff”?

 

Thanks

 

Kerry

 

 

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Research about WikiProject Recommendation

2017-06-20 Thread Kerry Raymond
There are pros and cons.

Having a standard invitation makes for better research as the project outcomes 
are more comparable but perhaps worse for recruitment.

Perhaps the WikiProjects involved could write the invitation for the person to 
participate, hopefully they can get the tone right. They might want to 
particularly encourage (or discourage) people with specific skills or 
interests, e.g. "We are particularly interested in expanding our articles on 
Pacific Island 17th century wrestling. We are in desperate need of people who 
can develop templates. Our project prides itself on fully cited articles." But 
then differences in the invitation may lead to differences in the uptake. 
Better recruitment, but worse research.

And of course what are the variables being measured for the outcome:
* number of people invited to each WikiProject (presumably easy enough)
* number of people who take up the invitation - how do we determine this? 
listing themselves on the Project page under Participants (yikes, I am active 
in many projects where I haven't done that), increasing their level of editing 
on articles associated with that project, increased activity on the project 
Talk page? Opinion of project leaders (do we have project leaders)? 
Self-identifying as such when asked by researchers?
* level of activity wrt the project at various periods after the invitation 
is accepted (when is it accepted? See above)
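Those uptake variables boil down to a before/after comparison around the invitation date. A minimal sketch in Python, assuming the invitee's project-related edit timestamps have already been collected (how to define and collect "project-related edits" is exactly the open question in the list above):

```python
from datetime import datetime, timedelta

def uptake_signal(edit_times, invited_at, window_days=30):
    """Compare an invitee's project-related edit counts in equal-length
    windows before and after the invitation. `edit_times` is a
    hypothetical input: datetimes of the user's edits to project-tagged
    articles or the project (talk) pages, however those are gathered."""
    window = timedelta(days=window_days)
    # Count edits in [invited_at - window, invited_at) ...
    before = sum(1 for t in edit_times if invited_at - window <= t < invited_at)
    # ... and in [invited_at, invited_at + window).
    after = sum(1 for t in edit_times if invited_at <= t < invited_at + window)
    return {"edits_before": before, "edits_after": after,
            "increased": after > before}
```

This only captures the "increased level of editing" signal; self-listing under Participants or Talk-page activity would need separate checks.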

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Jonathan Cardy
Sent: Tuesday, 20 June 2017 8:02 PM
To: Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Research about WikiProject Recommendation

Hi Bowen,

If you are going to promote wikiprojects by recommendation then you need to 
test different styles of recommendation. Taking what may still be the two 
biggest wikiprojects, MILHIST and professional wrestling, what worked as an 
invitation for either might be quite different from what would work for Opera 
or chemistry. Tone of voice is important when you are seeking to entice 
volunteers.

You also need to allow for the effect of different existing recruitment 
programs. These tend to be subtle, but they will vary, and that variation could 
mask your project. The most obvious recruitment is via wikiproject tagging of 
articles, and that isn't necessarily done by people who are active in the 
project concerned.

Regards

Jonathan


> On 20 Jun 2017, at 07:35, Bowen Yu  wrote:
> 
> Hi all,
> 
> We are preparing to conduct a study about WikiProject recommendations. 
> The goals of our study are (1) to understand the effectiveness of 
> different recommendation algorithms on recruiting new members to 
> WikiProjects, and
> (2) to evaluate the effectiveness of this intervention on engaging and 
> retaining Wikipedia newcomers.
> 
> In this study, we will recommend related editors to the organizers of 
> WikiProjects, and request them to approach and recruit the editors. We 
> will measure the actions and reactions of the organizers and editors 
> for evaluation. More details about our study can be found here on this 
> meta-page 
> .
> 
> While planning the experimental design, we thought to gather more 
> thoughts and suggestions from the community since this study would 
> involve the efforts of some Wikipedians, so we wanted to open it up. 
> Also, if you know of existing work or study in this area, please let us know. 
> Thanks!
> 
> Sincerely,
> Bowen



Re: [Wiki-research-l] Research about WikiProject Recommendation

2017-06-20 Thread Kerry Raymond
Looking at the list of WikiProjects you pointed at, they seem to be a mixture 
of what I would call "process" projects (e.g. Articles for Creation, Deletion 
Sorting) vs "content" projects (e.g. Military History, Television) vs a third 
group like "Women in Red" (which is part process, part content).

Generally the "content" projects will tag Talk pages with their WikiProject 
Banner. But "process" projects don't seem to always do this. For example, I 
don't think Women in Red has a project banner generally, although I think they 
do tag articles that arise from specific Edit-a-thons. Some of the process 
projects seem to use hidden categories for their work.

I would suggest only working with content projects initially. Content projects 
are more similar to one another in how they operate compared to process 
projects, and I think it is easier to judge if a user is showing an interest in 
a content project than in the process project because of standard use of 
content project banners on articles. So I think you can probably get a better 
understanding if the referral mechanism is working or not with content 
projects, whereas I think process projects have a lot of variability in them 
that may make it difficult to work out if you are seeing success or not. 

And at the end of the day, as an encyclopedia, we live or die on our content. 
Processes are (or at least should be) supportive of good content development 
but are a second-order effect.

I can certainly see some issues arising from pointing newcomers at process 
projects as they are unlikely to be aware of the processes at that stage. And 
indeed some process projects do not accept new editors (think of Articles for 
Creation and new page patrolling). I'd see this as a second project if the 
content project referral mechanism seems to be working.

Anyhow, that's my 10c!

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Bowen Yu
Sent: Tuesday, 20 June 2017 4:35 PM
To: Research into Wikimedia content and communities 

Subject: [Wiki-research-l] Research about WikiProject Recommendation

Hi all,

We are preparing to conduct a study about WikiProject recommendations. The 
goals of our study are (1) to understand the effectiveness of different 
recommendation algorithms on recruiting new members to WikiProjects, and
(2) to evaluate the effectiveness of this intervention on engaging and 
retaining Wikipedia newcomers.

In this study, we will recommend related editors to the organizers of 
WikiProjects, and request them to approach and recruit the editors. We will 
measure the actions and reactions of the organizers and editors for evaluation. 
More details about our study can be found here on this meta-page 
.

While planning the experimental design, we thought to gather more thoughts and 
suggestions from the community since this study would involve the efforts of 
some Wikipedians, so we wanted to open it up. Also, if you know of existing 
work or study in this area, please let us know. Thanks!

Sincerely,
Bowen


Re: [Wiki-research-l] Student Learning Outcomes using Wikipedia-based assignments

2017-06-19 Thread Kerry Raymond
This is very interesting data. 

One observation I would make is that like many education experiments,  it does 
not control for (what I call) the "highly motivated researcher effect". What 
I've learned from a lifetime of "new ways to teach" is that the standard 
experiment is to parachute in a highly motivated researcher into the classroom 
to introduce the new method, collect data showing improved learning, and then 
advocate for the new method to be rolled out more widely. However, rolling out 
more widely involves taking regular teachers (good, bad, and in-between) to 
learn and apply a new approach, and techniques often fail in the face of 
teacher lack of enthusiasm to learn anything new, complaints it makes more 
demands on teachers to use the new method, etc. In this report it says "the 
program staff provide Wikipedia training and expertise so the faculty do not 
need to have any experience editing", which is a big red flag to me. It would be 
interesting to see the results in an experiment where you first train the 
faculty and then the faculty carry out the engagement with students. And then 
see the results in 3 years time when it's a case of "business as usual" rather 
than "the new thing".

As a general comment, students like the variety of someone new in their 
classroom. Students do tend to learn more from "real world" assignments than 
"lab" assignments because the real world is more complex. However, staff and 
students are often reluctant to have real world assignments significantly 
influence end-of-term marks/grades because of the uncontrollable variables in 
the real world assignment that make it difficult to assess the relative 
achievement of the students. I would expect editing Wikipedia articles to 
suffer from this problem as each student will be working on different 
article(s) of different starting size and quality and with different levels of 
involvement and monitoring by other Wikipedians. It was not clear to me from 
the report if students were being assessed on this Wikipedia assignment and how 
important it was to their overall mark/grade. 

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Zach McDowell
Sent: Tuesday, 20 June 2017 8:30 AM
To: wiki-research-l@lists.wikimedia.org
Subject: [Wiki-research-l] Student Learning Outcomes using Wikipedia-based 
assignments

Hi Everyone,

For the last year I've been working on a fairly large (social science) research 
project studying student learning outcomes using Wikipedia based assignments 
with the Wiki Education Foundation. This was a mixed-methods study designed to 
address a variety of research questions and provide open data for researchers 
to dig through, analyze, and utilize in whatever way they deem fit.

Today I am happy to announce that the research report, the data, the codebooks, 
and many other supporting documents have been released under an open license.

The research report mostly summarizes the preliminary analysis (there were a 
LOT of questions) of some of the qualitative and quantitative data, but it is 
also meant to help understand the larger scope of the research project as well. 
Although this is just a preliminary report, I am working on a few journal 
publications with this data, so this should lead to more than the report (on my 
end at least).

If you are interested in student learning, new users, information literacy, or 
skills transfer, I hope this report and data set finds you well.

Blog post by LiAnna Davis on WMF Blog:
https://blog.wikimedia.org/2017/06/19/wikipedia-information-literacy-study/

Full data set (zip file):
https://github.com/WikiEducationFoundation/research

Research report (commons):
https://commons.wikimedia.org/wiki/File:Student_Learning_Outcomes_using_Wikipedia-based_Assignments_Fall_2016_Research_Report.pdf


best,

Zach


Zachary J. McDowell, PhD
www.zachmcdowell.com


Re: [Wiki-research-l] Fwd: [Design] Design in the Era of the Algorithm

2017-06-19 Thread Kerry Raymond
You could display the confidence or click through to the reasoning. Then the 
user can better understand the quality of the answer.

Sent from my iPad

> On 17 Jun 2017, at 5:08 am, Pine W  wrote:
> 
> Perhaps of interest.
> 
> Pine
> 
> 
> -- Forwarded message --
> From: Chris Koerner 
> Date: Fri, Jun 16, 2017 at 8:31 AM
> Subject: [Design] Design in the Era of the Algorithm
> To: des...@lists.wikimedia.org
> 
> 
> Josh Clark on design principles for addressing flaws in machine learning.
> (via waxy.org)
> 
> "The answer machines have an overconfidence problem. It’s not only a
> data-science problem that the algorithm returns bad conclusions. It’s a
> problem of presentation: the interface suggests that there’s one true
> answer, offering it up with a confidence that is unjustified.
> 
> So this is a design problem, too. The presentation fails to set appropriate
> expectations or context, and instead presents a bad answer with
> matter-of-fact assurance. As we learn to present machine-originated
> content, we face a very hard question: how might we add some productive
> humility to these interfaces to temper their overconfidence?
> 
> I have ideas."
> 
> https://bigmedium.com/speaking/design-in-the-era-of-the-algorithm.html
> 
> Yours,
> Chris Koerner
> Community Liaison - Discovery
> Wikimedia Foundation
> 
> ___
> Design mailing list
> des...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/design


Re: [Wiki-research-l] Finding what is said about a topic in other articles

2017-06-17 Thread Kerry Raymond
Thanks, Eran!

It seems to work very well indeed. The only thing standing between me and 
total happiness is the SAVE button. The tool keeps telling me I have 
successfully saved things, but I can't work out where they were saved to :-) Any 
clues you'd like to offer?

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Eran Rosenthal
Sent: Sunday, 18 June 2017 2:58 AM
To: Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Finding what is said about a topic in other 
articles

I wrote a related script with a very similar purpose:
Sometimes users write new articles and forget to add links to the new article 
from related articles (in the worst case this results in orphan articles).
The script aims to find related articles (e.g. using a search for the current 
title), and then suggests the specific context where the article is mentioned, 
so the user can select whether to add a link.

In [[Special:MyPage/common.js]] you can add the following snippet:

mw.loader.load('//he.wikipedia.org/w/index.php?title=User:ערן/quickLinker.js&action=raw&ctype=text/javascript&smaxage=21600&maxage=86400
');
// [[User:ערן/quickLinker.js]]

Then go to an article that you would like to find related articles for, and 
press "Add links" in the sidebar (under Tools), which will open an OOUI dialog 
listing articles where the current page is mentioned, and the specific context 
with a suggestion for a link.
If you find a suggested link suitable, press "save", or press "skip" to continue 
to the next relevant page.






On Wed, Jun 14, 2017 at 8:11 PM, Chris Koerner 
wrote:

> Kerry,
> What an interesting idea. I created a task in Phabricator, Wikimedia's 
> took for tracking bug and feature requests. I'll bug some of the 
> search folks to see if they have any suggestions. In the task I shared 
> a very clunky way of doing this that is not 100% what you're looking 
> for, but something! :)
>
> https://phabricator.wikimedia.org/T167899
>
> Yours,
> Chris Koerner
> Community Liaison - Discovery
> Wikimedia Foundation


Re: [Wiki-research-l] Survey of Welsh Wicipedia's readership

2017-06-16 Thread Kerry Raymond
Now this question may reveal my ignorance about the Welsh language and/or the 
Welsh Wikipedia and/or my own monolingualism (I don't read any other language 
well enough to attempt to read a non-English Wikipedia), but English 
Wikipedia says that "Most Welsh-speaking people in Wales also speak English" 
and Welsh Wikipedia has 91K articles to English Wikipedia's 5M+.

Given the above, I am somewhat curious why the survey did not ask about the 
ability to speak English and the extent of usage of English Wikipedia and what 
factors made them choose to read one or the other. Given that most of the 
respondents are probably bi-lingual (to some degree) as readers, it would seem 
that they are probably reading English Wikipedia to some extent given its far 
wider coverage. So when and why do they choose to read Welsh Wikipedia? My 
guess is that it may be for reasons related to the Welsh language itself:

* as first preference because their Welsh reading is better than their English 
reading (preferable)
* as first preference because despite being equally or more fluent in English 
because they want to support the Welsh language (patriotism)
* as first preference because despite being equally or more fluent in English 
they want to maintain or improve their Welsh language skills by reading in that 
language (learning)

or because of the content:

* because the topic of interest is only on Welsh Wikipedia (necessity)
* because the topic of interest is on both Welsh and English Wikipedia and so 
reading both is likely to provide more information than reading just one 
(comprehensive)
* as a second preference because the topic was not adequately covered on 
English Wikipedia and it is hoped it is better covered in Welsh Wikipedia 
(unsatisfied)

Are there other reasons?

Off-hand, does anyone have a sense of the coverage of Welsh Wikipedia? To what 
extent is it providing articles on topics not on English Wikipedia or much 
better covered than on English Wikipedia, vs providing articles on similar 
topics to English Wikipedia (possibly shorter) but in Welsh? That is, what do 
the goals of the writers of Welsh Wikipedia appear to be? Does it lean towards 
providing specific Welsh content or providing generic content in Welsh? 

I realise you can ask the same questions about any other language Wikipedia vs 
English Wikipedia, but for many other languages, there will be many people who 
speak the other language far more fluently than English (or may not speak 
English at all). But this seems to be less likely to be the case for 
Welsh-vs-English.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Richard Nevell
Sent: Friday, 16 June 2017 11:40 PM
To: wiki-research-l@lists.wikimedia.org
Subject: [Wiki-research-l] Survey of Welsh Wicipedia's readership

Hello,

Earlier this year Wikimedia UK conducted a survey of Welsh Wicipedia's readers. 
We wanted to learn more about their demographics and why they chose to read it. 
Around 1,001 people filled in the survey, and the results are available on 
Meta-Wiki in English and in Welsh.

Richard Nevell
--
Richard Nevell
Project Coordinator
Wikimedia UK - sign up to our newsletter 
+44 (0) 20 7065 0921

Wikimedia UK is a Company Limited by Guarantee registered in England and
Wales, Registered No. 6741827. Registered Charity No.1144513. Registered
Office 4th Floor, Development House, 56-64 Leonard Street, London EC2A 4LT.
United Kingdom. Wikimedia UK is the UK chapter of a global Wikimedia
movement. The Wikimedia projects are run by the Wikimedia Foundation (who
operate Wikipedia, amongst other projects).

*Wikimedia UK is an independent non-profit charity with no legal control
over Wikipedia nor responsibility for its contents.*


Re: [Wiki-research-l] Finding what is said about a topic in other articles

2017-06-14 Thread Kerry Raymond
Ahh ... it's a whole new meaning of transclusion ...

I am tempted to say "well, just work with the original wikitext and don’t 
resolve the templates" but I guess the problem here is that all templates 
aren't equal. Links in an infobox are much more likely to be relevant to *this* 
article than links in a navbox are and who knows about arbitrary templates more 
generally. I saw a stat in passing the other day that said around 50% of 
Wikipedia articles have navboxes, and I confess to having added a few navboxes 
even in the past few days. As a reader I like them, but they are a pain for 
anyone using "What links here".

Indeed, just using

insource:"Chapel Hill, Queensland"

*without* the square brackets does a jolly fine job of identifying the 
articles that mention the article [[Chapel Hill, Queensland]] or just the topic, 
and provides a snippet (not a great one, but it does give some context to the 
link).

It works because it sees the links that are used as parameters in the infobox 
(whether or not they are wrapped in square brackets) but can't see the ones 
embedded in the definition of the navboxes. Plus you get mentions as well as 
links. Sweet! If one could have a filter that eliminated the "mutually linking" 
articles (X links to Y and Y links to X) it would be close to nailing it! Of 
course it works better for longer article titles unlikely to occur in other 
circumstances. I wouldn't bother to try it for [[Food]] but then I am looking 
to a tool to populate stubs which probably eliminates "common name" articles. 

Kerry

-Original Message-
From: Nick Wilson (Quiddity) [mailto:nwil...@wikimedia.org] 
Sent: Wednesday, 14 June 2017 3:34 PM
To: Kerry Raymond ; Research into Wikimedia content 
and communities 
Cc: Nicholas Moreau 
Subject: Re: [Wiki-research-l] Finding what is said about a topic in other 
articles

On Tue, Jun 13, 2017 at 6:08 PM, Kerry Raymond  wrote:
> Indeed, the “Notable residents” section is one that would definitely benefit 
> from this tool. Is it just me or is there something actually broken with 
> “What links here?”. I try to suppress the transclusions (usually coming from 
> navboxes) but they are still displayed no matter whether I say to “Show/Hide 
> Transclusions” but a search of the article reveals there is no other link 
> present.
>
>

That existing feature works by hiding/showing where *the page itself* is 
transcluded *into*. E.g.
https://en.wikipedia.org/wiki/Special:WhatLinksHere/Template:WikiFauna
vs 
https://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Template:WikiFauna&hidetrans=1


Making it work differently for incoming links that are coming from a template, 
is a long-standing (and complicated to implement)
feature-request: https://phabricator.wikimedia.org/T14396

However, I see this comment by Izno suggests a partial (manual) workaround, 
using an "insource:/\[\[FOO/" search.
https://phabricator.wikimedia.org/T14396#3246134
e.g. 
https://en.wikipedia.org/w/index.php?title=Special:Search&profile=all&search=insource%3A%2F\[\[Wikipedia%3AWikiGremlin%2F&fulltext=1
versus https://en.wikipedia.org/wiki/Special:WhatLinksHere/Wikipedia:WikiGremlin
(I'm not sure why Izno's example also includes the "linksto:FOO"
string, but it appears to be redundant)

--
Quiddity




Re: [Wiki-research-l] Finding what is said about a topic in other articles

2017-06-13 Thread Kerry Raymond
Indeed, the “Notable residents” section is one that would definitely benefit 
from this tool. Is it just me or is there something actually broken with “What 
links here?”. I try to suppress the transclusions (usually coming from 
navboxes) but they are still displayed no matter whether I say to “Show/Hide 
Transclusions” but a search of the article reveals there is no other link 
present.

 

Kerry

 

 

From: Nicholas Moreau [mailto:nicholasmor...@gmail.com] 
Sent: Wednesday, 14 June 2017 9:55 AM
To: kerry.raym...@gmail.com; Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Finding what is said about a topic in other 
articles

 

Hi,

I would *love* this tool as well. Being a frequent editor of "List of people 
from" articles, it would save me oodles of time to be able to pass by false 
positives in What links here. (A third of the links to a community are gold, 
the rest are template transclusions and people competing in a tourney at X 
community.)

Nick

On Tue, Jun 13, 2017 at 6:45 PM Kerry Raymond <kerry.raym...@gmail.com> wrote:

Is there a tool already (or "how hard would it be?") which would show the
user what is said about article X in other articles. It seems to me that
there are a lot of easy content additions that might be found that way and
used to flesh out stubs and other shorter articles. What is motivating this
is that I often find "what links here" points to some
surprising articles which can reveal new insights into a topic. I often
write about places. Often I think "oh, this one's nothing special" and
suddenly "what links here" reveals some interesting events that occurred
there. Discovery of a famous fossil or a big role in World War II or the
birthplace of someone quite famous. So I am wondering if there is a way to
automate this process a bit by quickly drilling down to the relevant chunk
of the article content rather than having to read/search the whole thing.



That is, if I was writing the article [[Bang Bang Jump Up]], I would want a
list along the lines of:



* From article [[Winston Churchill]] within section "After the Second
World War": On 23 July 1944 at [[Bang Bang Jump Up]], he met [[Harry
Truman]] to discuss the establishment of the [[United Nations]].



(False news alert: These world leaders did not meet at Bang Bang Jump Up,
but let's pretend they did.)



That is, a list of the articles with the sentence/para containing the link
or +/- N chars before or after the link, whatever's feasible to create an
intelligible snippet without having to read the whole article.
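The "±N chars around the link" version of that snippet is easy to sketch over raw wikitext. A hypothetical helper (real link syntax has more corner cases than this regex covers, e.g. nested templates):

```python
import re

def link_snippets(wikitext, target, context=120):
    """Return `context` characters of surrounding text for each
    [[target]] or [[target|label]] link in a page's wikitext -- a
    minimal sketch of the snippet idea described above."""
    pattern = re.compile(r"\[\[" + re.escape(target) + r"(\|[^\]]*)?\]\]")
    snippets = []
    for m in pattern.finditer(wikitext):
        start = max(0, m.start() - context)
        end = min(len(wikitext), m.end() + context)
        snippets.append(wikitext[start:end])
    return snippets
```

Restricting this to the article body (skipping navboxes, citations, and other templates) and ranking the results is where the real work of the proposed tool would lie.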



I am assuming here that article X is linked from Y (I'm not considering text
mentions). Of course, the success of the tool is its ability to pick what
might be most relevant. Nobody wants to wade through a list of irrelevant
mentions. So I would want to stick to links occurring in the prose of the
article body rather than navbox transclusions, links in citations, templates
and so forth. I also think that ordering the list by some "likely to be most
useful" metric would be beneficial (or ideally the ability of the user to
fiddle with those choices at run-time). Now until one has such a tool to get
experience with, it's hard to know what might constitute more "relevant".
But some metrics might be:



* The relative importance of the topics. I suspect if a more
important topic is mentioning a less important topic, it might be more
relevant. Winston Churchill is more important than Bang Bang Jump Up.

* The relative quality of the articles. I suspect if a high quality
article is mentioning a low quality article, it might be more relevant.
Winston Churchill is a higher quality article than Bang Bang Jump Up.

* Being tagged by the same WikiProject (or not within the same
WikiProject). Not sure which would likely be more relevant, but it might be
interesting to explore. It's unlikely Winston Churchill and Bang Bang Jump
Up are in the same WikiProject.

* The other article is not already linked in this article. That is,
if Bang Bang Jump Up already links to Winston Churchill, then probably this
is less likely to be "new information" for the Bang Bang Jump Up article.



Anyhow, do we have a tool that does something along these lines? If not, is
there a student project here? :)



Kerry



[Wiki-research-l] Finding what is said about a topic in other articles

2017-06-13 Thread Kerry Raymond
Is there a tool already (or "how hard would it be?") which would show the
user what is said about article X in other articles. It seems to me that
there are a lot of easy content additions that might be found that way and
used to flesh out stubs and other shorter articles. What motivates this is
that "what links here" often points to some surprising articles which can
reveal new insights into a topic. I often write about places. I may think
"oh, this one's nothing special", and then "what links here" reveals some
interesting events that occurred there. Discovery of a famous fossil, a big
role in World War II, or the
birthplace of someone quite famous. So I am wondering if there is a way to
automate this process a bit by quickly drilling down to the relevant chunk
of the article content rather than having to read/search the whole thing.

 

That is, if I were writing the article [[Bang Bang Jump Up]], I would want a
list along the lines of:

 

*From article [[Winston Churchill]] within section "After the Second
World War" : On 23 July 1944 at [[Bang Bang Jump Up]], he met [[Harry
Truman]] to discuss the establishment of the [[United Nations]].

 

(False news alert: These world leaders did not meet at Bang Bang Jump Up,
but let's pretend they did.)

 

That is, a list of the articles with the sentence/para containing the link
or +/- N chars before or after the link, whatever's feasible to create an
intelligible snippet without having to read the whole article.
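For concreteness, here is a rough Python sketch of the snippet extraction I have in mind. It is deliberately naive: real link matching would also need to handle redirects, piped-label variants, first-letter case insensitivity and section links, none of which this toy handles.

```python
import re

def link_snippets(wikitext, target, window=120):
    """Return +/- window characters of context around each [[target]] link."""
    # Matches [[Target]] and [[Target|label]]; redirects and case
    # variants are ignored in this sketch.
    pattern = re.compile(r"\[\[" + re.escape(target) + r"(?:\|[^\]]*)?\]\]")
    snippets = []
    for m in pattern.finditer(wikitext):
        start = max(0, m.start() - window)
        end = min(len(wikitext), m.end() + window)
        snippets.append(wikitext[start:end])
    return snippets

text = ("On 23 July 1944 at [[Bang Bang Jump Up]], he met [[Harry Truman]] "
        "to discuss the establishment of the [[United Nations]].")
print(link_snippets(text, "Bang Bang Jump Up", window=40))
```

Run over the wikitext of every article returned by "what links here", something like this would produce exactly the kind of list sketched above.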

 

I am assuming here that article X is linked from Y (I'm not considering
plain-text mentions). Of course, the success of the tool depends on its
ability to pick what might be most relevant. Nobody wants to wade through a
list of irrelevant
mentions. So I would want to stick to links occurring in the prose of the
article body rather than navbox transclusions, links in citations, templates
and so forth. I also think that ordering the list by some "likely to be most
useful" metric would be beneficial (or ideally the ability of the user to
fiddle with those choices at run-time). Now, until one has such a tool to gain
experience with, it's hard to know what makes one mention more "relevant" than
another.
But some metrics might be:

 

*The relative importance of the topics. I suspect if a more
important topic is mentioning a less important topic, it might be more
relevant. Winston Churchill is more important than Bang Bang Jump Up.

*The relative quality of the articles. I suspect if a high quality
article is mentioning a low quality article, it might be more relevant.
Winston Churchill is a higher quality article than Bang Bang Jump Up.

*Being tagged by the same WikiProject (or not within the same
WikiProject). Not sure which would likely be more relevant but it might be
interesting to explore. It's unlikely Winston Churchill and Bang Bang Jump
Up are in the same WikiProject.

*The other article is not already linked in this article. That is,
if Bang Bang Jump Up already links to Winston Churchill, then probably this
is less likely to be "new information" for the Bang Bang Jump Up article.
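To make the ordering idea concrete, here is a toy scoring function combining those four metrics. The field names, scales and weights are entirely made up; the point is only that each metric becomes a tunable term the user could fiddle with at run-time.

```python
def relevance_score(mention, weights=None):
    """Rank a candidate mention of article X found in article Y.

    Fields are illustrative: importance/quality on a 0-5 scale (e.g.
    mapped from Low..Top and Stub..FA), plus flags for a shared
    WikiProject and an existing back-link from X to Y.
    """
    w = weights or {"importance": 1.0, "quality": 1.0,
                    "same_project": 0.5, "already_linked": 2.0}
    score = 0.0
    # A more important / higher quality source article => more relevant.
    score += w["importance"] * (mention["y_importance"] - mention["x_importance"])
    score += w["quality"] * (mention["y_quality"] - mention["x_quality"])
    # Shared WikiProject: direction of effect unknown, so exposed as a weight.
    if mention["same_project"]:
        score += w["same_project"]
    # If X already links back to Y, the mention is less likely to be new info.
    if mention["already_linked"]:
        score -= w["already_linked"]
    return score

mentions = [
    {"y_importance": 5, "x_importance": 1, "y_quality": 5, "x_quality": 1,
     "same_project": False, "already_linked": False},  # e.g. Churchill -> BBJU
    {"y_importance": 1, "x_importance": 1, "y_quality": 1, "x_quality": 1,
     "same_project": True, "already_linked": True},
]
ranked = sorted(mentions, key=relevance_score, reverse=True)
```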

 

Anyhow, do we have a tool that does something along these lines? If not, is
there a student project here? :)

 

Kerry

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Video demos of upcoming changes to edit review / RC patrol

2017-06-01 Thread Kerry Raymond
Actually I don't use the on-wiki watchlist. I have them emailed to me. This 
means I can process every one of them and not miss any. I send these emails all 
to a particular folder and, when I process them, I do take advantage of the 
search capabilities of Gmail to group the notification emails and process those 
together. For example, when I notice a bot or user is making a large number of 
similar unproblematic changes to a large group of articles, I will often filter 
on that user and delete all their notification messages rather than check each 
article. However generally the emails do not contain enough information to 
filter in some of the ways being mentioned here. If that info was in the 
notification email, one could achieve similar filtering on email messages. So I 
would love to see a bit more info added to the emails for that purpose.

Kerry

Sent from my iPad

> On 2 Jun 2017, at 9:30 am, Nick Wilson (Quiddity)  
> wrote:
> 
> Hi Kerry,
> The short answer is yes! the Collaboration Team is working now to extend the 
> new user interface so that it includes all the existing features on the 
> Recent Changes page, Watchlist and a few related pages—along with some new 
> tools users are requesting. We’re doing user testing right now of this 
> extended functionality (which includes things like Namespace filters, Tag 
> filters, User filters and, possibly, a Category filter). 
> When we have it all working the way it should, we plan to bring the new UI 
> and tools to Watchlist. This should happen in the next few months.
> 
> However, it is also already possible to add a "On [my] watchlist" filter or 
> highlight, to the results on the recent changes page.
> E.g. 
> https://en.wikipedia.org/wiki/Special:RecentChanges?hidebots=1&hidecategorization=1&hideWikibase=1&watchlist=watched
> or 
> https://meta.wikimedia.org/wiki/Special:RecentChanges?hidebots=1&hidecategorization=1&hideWikibase=1&watchlist=watched
> 
> Further details are in the main documentation at
> https://www.mediawiki.org/wiki/Special:MyLanguage/Edit_Review_Improvements
> and pages linked in the side-navbox.
> 
> Feedback appreciated at
> https://www.mediawiki.org/wiki/Talk:Edit_Review_Improvements/New_filters_for_edit_review
> 
> Cheers,
> 
>> On Sun, May 28, 2017 at 8:00 PM, Kerry Raymond  
>> wrote:
>> I only watched the first video but I can see it is a useful addition to 
>> managing a large number of recent changes. Is there any plan to offer a 
>> similar service with watchlists?
>> 
>> -Original Message-
>> From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] 
>> On Behalf Of Pine W
>> Sent: Monday, 29 May 2017 7:42 AM
>> To: wikitec...@lists.wikimedia.org; Wikimedia Mailing List 
>> ; Wiki Research-l 
>> 
>> Subject: [Wiki-research-l] Video demos of upcoming changes to edit review / 
>> RC patrol
>> 
>> I'd like to highlight two videos (some people may have already seen these) 
>> that demo upcoming changes to edit review / RC patrol that take advantage of 
>> ORES. I feel that the changes look promising, and I hope that RC 
>> patrollers, Teahouse hosts, newbie adopters, and others will find that the 
>> changes make their work easier. I also hope for improved retention of 
>> good-faith contributors.
>> 
>> 0. A succinct overview by Joe Matazzoni (WMF):
>> https://commons.wikimedia.org/w/index.php?title=File%3ANew-feature_demo%E2%80%94smart_Recent_Changes_filtering_with_ORES.webm
>> 
>> 1. A more extensive overview, also by Joe, including valuable context, from 
>> the WMF Metrics Meeting for May 2017:
>> https://www.youtube.com/watch?v=rAGwQdLyFb4 between 15:00 and 28:15.
>> 
>> Pine
>> ___
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>> 
>> 
>> ___
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> 
> 
> 
> -- 
> Nick Wilson (Quiddity)
> Community Liaison, WMF
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] research trying to influence real-world outcomes by editing Wikipedia

2017-05-29 Thread Kerry Raymond
I can understand the hypothesis that longer school articles would attract 
more enrolments, but I am a bit bemused by the medical hypothesis that longer 
articles about a disease would cause more people to have it, or at least to 
be diagnosed with it. What exactly is the medical hypothesis here? Is it 
relating to treatment articles or drug articles?

As for the ethics, if the information added to an article (school or medical) 
seeks to be accurate and satisfies the normal requirements (citations, NPOV, 
NOR, COI, etc), so what? Does it matter if it's done by a research project or 
done by anybody else? Do we know who did every edit on those articles currently 
or why?

It's pretty clearly an ethical problem to add incorrect information. I can see 
a possible ethical issue if one article was updated with good quality 
contributions and another was done in a deliberately sloppy way, to test a 
difference.

Kerry

Sent from my iPad

> On 29 May 2017, at 6:00 pm, Leila Zia  wrote:
> 
>> On Mon, May 29, 2017 at 12:52 AM, James Salsman  wrote:
>> 
>> 
>> Are there any ethical guidelines concerning whether this is
>> reasonable? Should there be?
>> 
> 
> ​How about contacting the authors directly and asking them if they have
> considered the potential ethical challenges of extending the research to
> the two areas they've mentioned in the paper? Their response may be as
> simple as: sure, and we are aware of it. If they're not aware of it, your
> note can help them think about it.
> 
> Best,
> Leila​
> 
> 
> 
>> 
>> ___
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>> 
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Video demos of upcoming changes to edit review / RC patrol

2017-05-28 Thread Kerry Raymond
I only watched the first video but I can see it is a useful addition to managing 
a large number of recent changes. Is there any plan to offer a similar service 
with watchlists? 

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Pine W
Sent: Monday, 29 May 2017 7:42 AM
To: wikitec...@lists.wikimedia.org; Wikimedia Mailing List 
; Wiki Research-l 

Subject: [Wiki-research-l] Video demos of upcoming changes to edit review / RC 
patrol

I'd like to highlight two videos (some people may have already seen these) that 
demo upcoming changes to edit review / RC patrol that take advantage of ORES. I 
feel that the changes look promising, and I hope that RC patrollers, 
Teahouse hosts, newbie adopters, and others will find that the changes make 
their work easier. I also hope for improved retention of good-faith 
contributors.

0. A succinct overview by Joe Matazzoni (WMF):
https://commons.wikimedia.org/w/index.php?title=File%3ANew-feature_demo%E2%80%94smart_Recent_Changes_filtering_with_ORES.webm

1. A more extensive overview, also by Joe, including valuable context, from the 
WMF Metrics Meeting for May 2017:
https://www.youtube.com/watch?v=rAGwQdLyFb4 between 15:00 and 28:15.

Pine
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Historical Good Articles

2017-05-18 Thread Kerry Raymond
I think this has the data you want

https://en.wikipedia.org/wiki/Wikipedia:WikiProject_History/Assessment

Sent from my iPad

> On 19 May 2017, at 1:09 pm, Haifeng Zhang  wrote:
> 
> Hi, folks,
> 
> Is there a way to find all historical Wikipedia Good Articles (GAs)?
> 
> I checked the following page, which seems only include the current GAs.
> 
> https://en.wikipedia.org/wiki/Wikipedia:Good_articles
> 
> 
> Thanks,
> 
> Haifeng Zhang
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Citation Project - Comments Welcome!

2017-05-03 Thread Kerry Raymond
 (verifiable good article), the 
text-to-cite mapping must be embedded in the article and almost all of the text 
is “covered” (in the mathematical sense) by the mapping. Indeed, the extent of 
coverage could be a verifiability metric.
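As a sketch of that metric, assuming the hypothetical text-to-cite mapping existed as character ranges:

```python
def citation_coverage(text_length, cite_spans):
    """Fraction of an article's text covered by text-to-cite mappings.

    cite_spans: (start, end) character ranges each citation claims to
    support -- a hypothetical data model, since no such mapping exists
    in MediaWiki today.
    """
    covered = set()
    for start, end in cite_spans:
        covered.update(range(start, end))  # overlaps counted once
    return len(covered) / text_length

coverage = citation_coverage(100, [(0, 40), (30, 70)])  # overlapping spans -> 0.7
```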

 

OK, maybe what I am proposing is not the way to go, but I think we ought to be 
thinking about this issue of cite rot, because I think it’s a real problem. I 
suspect it’s already out there but we don’t notice it because we *see* lots of 
inline citations and assume all is well.

 

Kerry

 

From: Andrea Forte [mailto:andrea.fo...@gmail.com] 
Sent: Wednesday, 3 May 2017 11:46 PM
To: kerry.raym...@gmail.com
Cc: Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Citation Project - Comments Welcome!

 

 

...and YES, detecting when a reference has changed but the adjacent text has 
not is something that will be detectable with the dataset we aim to produce. 
That's a great idea!

 

On Tue, May 2, 2017 at 7:59 AM, Kerry Raymond <kerry.raym...@gmail.com> wrote:

Just a couple of thoughts that cross my mind ...

If people use the {{cite book}} etc templates, it will be relatively easy to 
work out what the components of the citation are. However if people roll their 
own, e.g.

[http://someurl This And That], Blah Blah 2000

you may have some difficulty working out what is what. I've just been through a 
tedious exercise of updating a set of URLs using AWB over some thousands of 
articles and some of the ways people roll their own citations were quite 
remarkable (and often quite unhelpful). It may be that you can't extract much 
from such citations. However, the good news is that if they have a URL in them, 
it will probably be in plain-sight.

Whereas there are a number of templates that I regularly use for citation like 
{{cite QHR}} (currently 1234 transclusions) and {{cite QPN}} (currently 2738  
transclusions) and {{Census 2011 AUS}} (4400 transclusions) all of which 
generate their URLs. I'm not sure how you will deal with these in terms of 
extracting URLs.

But whatever the limitations, it will be a useful dataset to answer some 
interesting questions.

One phenomenon I often see is new users updating information (e.g. changing the 
population of a town) while leaving behind the old citation for the previous 
value. So it superficially looks like the new information is cited to a 
reliable source when in fact it isn't. I've often wished we could automatically 
detect and raise a "warning" when the "text being supported" by the citation 
changes yet the citation does not. The problem, of course, is that we only know 
where the citation appears in the text and that we presume it is in support for 
"some earlier" text (without being clear exactly where it is). And if an 
article is reorganised, it may well result in the citation "drifting away" from 
the text it supports or even that it is in support of text that has been 
deleted. So I think it is important to know what text preceded the citation at 
the time the citation first appears in the article history as it may be useful 
to compare it against the text that *now* appears before it. It is a great pity 
that (in these digital times) we have not developed a citation model where you 
select chunks of text and link your citation to them, so that the relationship 
between the text and the citation is more apparent.

Kerry


-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Andrea Forte
Sent: Tuesday, 2 May 2017 5:18 AM
To: Research into Wikimedia content and communities 
<Wiki-research-l@lists.wikimedia.org>
Subject: [Wiki-research-l] Citation Project - Comments Welcome!

Hi all,


One of my PhD students, Meen Chul Kim, is a data scientist with experience in 
bibliometrics and we will be working on some citation-related research together 
with Aaron and Dario in the coming months. Our main goal in the short term is 
to develop an enhanced citation dataset that will allow for future analyses of 
citation data associated with article quality, lifecycle, editing trends, etc.


The project page is here:
https://meta.wikimedia.org/wiki/Research:Understanding_the_context_of_citations_in_Wikipedia


The project is just getting started so this is a great time to offer feedback 
and suggestions, especially for features of citations that we should mine as a 
first step, since this will affect what the dataset can be used for in the 
future.


Looking forward to seeing some of you at WikiCite!!

Andrea




--
 :: Andrea Forte
 :: Associate Professor
 :: College of Computing and Informatics, Drexel University
 :: http://www.andreaforte.net

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org 

Re: [Wiki-research-l] Citation Project - Comments Welcome!

2017-05-02 Thread Kerry Raymond
Just a couple of thoughts that cross my mind ...

If people use the {{cite book}} etc templates, it will be relatively easy to 
work out what the components of the citation are. However if people roll their 
own, e.g.

[http://someurl This And That], Blah Blah 2000

you may have some difficulty working out what is what. I've just been through a 
tedious exercise of updating a set of URLs using AWB over some thousands of 
articles and some of the ways people roll their own citations were quite 
remarkable (and often quite unhelpful). It may be that you can't extract much 
from such citations. However, the good news is that if they have a URL in them, 
it will probably be in plain-sight.
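A minimal sketch of that plain-sight extraction, just a bare-URL regex. It deliberately says nothing about template-generated URLs like {{cite QHR}}, which never appear in the wikitext at all:

```python
import re

# Good enough to pull plain-sight URLs out of hand-rolled citations
# like "[http://someurl This And That], Blah Blah 2000"; stops at
# whitespace and wikitext delimiters.
URL_RE = re.compile(r"https?://[^\s\]|}<>\"]+")

def extract_urls(wikitext):
    return URL_RE.findall(wikitext)

print(extract_urls("[http://someurl This And That], Blah Blah 2000"))
```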

Whereas there are a number of templates that I regularly use for citation like 
{{cite QHR}} (currently 1234 transclusions) and {{cite QPN}} (currently 2738  
transclusions) and {{Census 2011 AUS}} (4400 transclusions) all of which 
generate their URLs. I'm not sure how you will deal with these in terms of 
extracting URLs.

But whatever the limitations, it will be a useful dataset to answer some 
interesting questions.

One phenomenon I often see is new users updating information (e.g. changing the 
population of a town) while leaving behind the old citation for the previous 
value. So it superficially looks like the new information is cited to a 
reliable source when in fact it isn't. I've often wished we could automatically 
detect and raise a "warning" when the "text being supported" by the citation 
changes yet the citation does not. The problem, of course, is that we only know 
where the citation appears in the text and that we presume it is in support for 
"some earlier" text (without being clear exactly where it is). And if an 
article is reorganised, it may well result in the citation "drifting away" from 
the text it supports or even that it is in support of text that has been 
deleted. So I think it is important to know what text preceded the citation at 
the time the citation first appears in the article history as it may be useful 
to compare it against the text that *now* appears before it. It is a great pity 
that (in these digital times) we have not developed a citation model where you 
select chunks of text and link your citation to them, so that the relationship 
between the text and the citation is more apparent.
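In the absence of such a model, one crude approximation would be to compare the text preceding a citation when it first appeared against the text preceding it now, e.g. with difflib. The window size and any flagging threshold are guesses that would need tuning against real edit histories:

```python
import difflib

def citation_drift(text_then, text_now, window=200):
    """Similarity of the `window` chars preceding a citation at its first
    appearance vs. now; a low ratio suggests the supported text changed
    while the citation did not."""
    return difflib.SequenceMatcher(
        None, text_then[-window:], text_now[-window:]).ratio()

then = "The town had a population of 1,234 at the 2011 census."
now = "The town had a population of 4,321 at the 2016 census."
similarity = citation_drift(then, now)
```

A ratio below some cut-off (0.8, say) could raise the "warning" described above for human review.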

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Andrea Forte
Sent: Tuesday, 2 May 2017 5:18 AM
To: Research into Wikimedia content and communities 

Subject: [Wiki-research-l] Citation Project - Comments Welcome!

Hi all,


One of my PhD students, Meen Chul Kim, is a data scientist with experience in 
bibliometrics and we will be working on some citation-related research together 
with Aaron and Dario in the coming months. Our main goal in the short term is 
to develop an enhanced citation dataset that will allow for future analyses of 
citation data associated with article quality, lifecycle, editing trends, etc.


The project page is here:
https://meta.wikimedia.org/wiki/Research:Understanding_the_context_of_citations_in_Wikipedia


The project is just getting started so this is a great time to offer feedback 
and suggestions, especially for features of citations that we should mine as a 
first step, since this will affect what the dataset can be used for in the 
future.


Looking forward to seeing some of you at WikiCite!!

Andrea




--
 :: Andrea Forte
 :: Associate Professor
 :: College of Computing and Informatics, Drexel University
 :: http://www.andreaforte.net
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Project exploring automated classification of article importance

2017-04-27 Thread Kerry Raymond
Yes, under-categorised and/or under-tagged articles could probably be detected 
by inbound/outbound link analysis and presented as candidates to the relevant 
WikiProjects for categorising and tagging. So long as you didn’t deliver up too 
many false positives, people would probably still deal with them 
by a best-efforts categorisation or tagging, or at least pass them off to a more 
relevant project based on their human intelligence.

 

On a related theme, outgoing link analysis could be used to draw orphan 
articles to the attention of likely WikiProjects.
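A toy sketch of the idea: suggest WikiProjects for an untagged or orphan article by majority vote over the projects tagging the articles it links to (or, symmetrically, the articles linking to it). All names and data here are illustrative:

```python
from collections import Counter

def suggest_projects(article_links, project_of, top_n=3):
    """Suggest WikiProjects for an untagged article by majority vote of
    the projects tagging its linked articles.

    project_of maps article title -> set of WikiProject names.
    """
    votes = Counter()
    for title in article_links:
        for proj in project_of.get(title, ()):
            votes[proj] += 1
    return [p for p, _ in votes.most_common(top_n)]

project_of = {
    "Brisbane": {"WikiProject Australia"},
    "Queensland": {"WikiProject Australia"},
    "World War II": {"WikiProject Military history"},
}
print(suggest_projects(["Brisbane", "Queensland", "World War II"], project_of))
```

Thresholding on the vote share would be one way to keep the false-positive rate tolerable.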

Kerry

 

From: Stuart A. Yeates [mailto:syea...@gmail.com] 
Sent: Friday, 28 April 2017 10:59 AM
To: kerry.raym...@gmail.com; Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Project exploring automated classification of 
article importance

 

Following up Kerry's comments: far more useful to our encyclopedia building 
project would not be a global importance assessor, but an assessor of which 
wikiprojects a page is likely to be of interest to. There are hundreds of 
thousands of en.wiki pages which are not tagged properly to their wikiprojects 
and are thus effectively invisible to the community of editors who care about 
them.

This is a classic example of statistical classification, so it shouldn't be too 
technically difficult...

 

cheers

stuart




--
...let us be heard from red core to black sky

 

On 28 April 2017 at 12:28, Kerry Raymond <kerry.raym...@gmail.com> wrote:

I observe (and am unsurprised) that WikiProject Australia also rates the 
Pavlova article as High importance, which ties into Stuart's 
comments about graphs and subgraphs. If there are relationships between 
WikiProjects, there is probably some correlation about importance of articles 
as seen by those projects. As it happens, WikiProject Australia and WikiProject 
New Zealand are related on Wikipedia only by both being within the category 
"WikiProject Countries projects" (along with every other national WikiProject), 
so this is an example where you cannot see the connection between these 
projects "on-wiki" but anyone who knows anything about the geography, history, 
and culture of the two countries will understand the close connection (e.g. 
ANZAC, sheep, pavlova, rugby union) but, as the project tagging will show, we 
do have our differences, e.g. Whitebait is a High Importance article for NZ but 
Oz doesn't even tag it (we don't share the NZ passion for these small fish). 
And perhaps more seriously, our two countries have different indigenous peoples 
so our project tagging around Maori (NZ) and Aboriginal and Torres Strait 
Islander (Oz) articles would usually be quite disjoint.

So if there are correlations between project tagging, it may be something 
exploitable in machine assessment of importance.

Kerry

-Original Message-
From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
Behalf Of Stuart A. 
Yeates
Sent: Friday, 28 April 2017 6:18 AM
To: Research into Wikimedia content and communities 
<wiki-research-l@lists.wikimedia.org>
Subject: Re: [Wiki-research-l] Project exploring automated classification of 
article importance

On en.wiki article importance is relative to some wikiproject. This is encoded 
in https://en.wikipedia.org/wiki/Template:WPBannerMeta which appears on 16% of 
all wikipedia pages via specialisations such as 
https://en.wikipedia.org/wiki/Template:WikiProject_New_Zealand

Within Wikiproject New Zealand, there are articles which we think are very 
important to us, which we would never argue are even marginally important on a 
global scale. Take for example
https://en.wikipedia.org/wiki/Pavlova_(food)

For the mathematically inclined, this is a classic case of graph and many 
subgraphs.

cheers
stuart


--
...let us be heard from red core to black sky

On 27 April 2017 at 21:44, Gerard Meijssen <gerard.meijs...@gmail.com>
wrote:

> Hoi,
> I have read the proposal and it leaves me wondering. Also the notion
> of importance is indeed neither easy nor obvious. I think the question
> what is most important is irrelevant depending on how you look at it.
> Subject can be irrelevant when you look at it from a personal
> perspective, looking at it from a particular perspective and indeed
> what seems relevant may become irrelevant or relevant over time. When
> you use metrics there will always be one way or another why it will be found 
> to be problematic.
>
> When you consider Wikipedia, the difference it makes with similar
> resources is that its long tail is so much longer and still it is easy
> and obvious to show how the English Wikipedia's long tail is not long
> enough [1]. When you are looking for links and relevance, Wikidata
> includes data on all Wi

Re: [Wiki-research-l] Project exploring automated classification of article importance

2017-04-27 Thread Kerry Raymond
see these ideas pop up in the discussion and be able to show how 
we're incorporating these into what we're doing and that they affect our 
results.

As I wrap up, I would like to challenge the assertion that initial importance 
ratings are "pretty accurate". I'm not sure we really know that.
They might be, but it might be because the vast majority of them are newly 
created stubs that get rated "low importance". More interesting are perhaps 
other types of articles, where I suspect that importance ratings are copied 
from one WikiProject template to another, and one could argue that they need 
updating. Our collaboration with WikiProject Medicine has resulted in updated 
ratings of a couple of hundred or so articles so far, although most of them 
were corrections that increase consistency in the ratings. As I continue 
working on this project I hope to expand our collaborations to other 
WikiProjects, and I'm looking forward to seeing how well we fare with those!


Citations:
West, R.; Weber, I.; and Castillo, C. 2012. Drawing a Data-driven Portrait of 
Wikipedia Editors. In Proc. of OpenSym/WikiSym, 3:1–3:10.

Lam, S. T. K.; Uduwage, A.; Dong, Z.; Sen, S.; Musicant, D. R.; Terveen, L.; 
and Riedl, J. 2011. WP:Clubhouse?: An Exploration of Wikipedia's Gender 
Imbalance. In Proc. of WikiSym, 1–10.

Warncke-Wang, M., Ranjan, V., Terveen, L., and Hecht, B. "Misalignment Between 
Supply and Demand of Quality Content in Peer Production Communities" in the 
proceedings of ICWSM 2015.

Dimitrov, D., Singer, P., Lemmerich, F., & Strohmaier, M. (2016, April).
Visual positions of links and clicks on wikipedia. In Proceedings of the 25th 
International Conference Companion on WWW (pp. 27-28).


Cheers,
Morten

On 25 April 2017 at 20:39, Kerry Raymond  wrote:

> Just a few musings on the issue of Importance and how to research it ...
>
> I agree it is intuitive that importance is likely to be linked to 
> pageviews and inbound links but, as the preliminary experiment showed, 
> it's probably not that simple.
>
> Pageviews tells us something about importance to readers of Wikipedia, 
> while inbound links tells us something about importance to writers of 
> Wikipedia, and I suspect that writers are not a proxy for readers as 
> the editor surveys suggest that Wikipedia writers are not typical of 
> broader society on at least two variables: gender and level of 
> education (might be others, I can't remember).
>
> But I think importance is a relative metric rather than an absolute one. I 
> think by taking the mean value of importance across a number of 
> WikiProjects in the preliminary experiment may have lost something 
> because it tried (through averaging) to look at importance 
> "generally". I would suspect conducting an experiment considering only 
> the importance ratings wrt to a single WikiProject would be more 
> likely to show correlation with pageviews (wrt to other articles in 
> that same WikiProject) and inbound links. And I think there are two 
> kinds of inbound links to be considered, those coming from other 
> articles within the same WikiProject and those coming from outside 
> that Wikiproject. I suspect different insights will be obtained by 
> looking at both types of inbound links separately rather than treating 
> them as an aggregate. I note also that WikiProjects are not entirely 
> independent of one another but have relationships between them. For 
> example, The WikiProject Australian Roads describes itself as an 
> "intersection" (ha ha!) of WikiProject Highways and WikiProject 
> Australia, so I expect that we would find greater correlation in importance 
> between related WikiProjects than between unrelated WikiProjects.
>
> When thinking about readers and pageviews, I think we have to ask 
> ourselves is there a difference between popularity and importance. Or 
> whether popularity *is* importance. I sense that, as a group of 
> educated people, those of us reading this research mailing list 
> probably do think there is a difference. Certainly if there is no 
> difference, then this research can stop now -- just judge importance 
> by  pageviews. Let's assume a difference then. When looking at 
> pageviews of an article, they are not always consistent over time. 
> Here are the pageviews for Drottninggatan
>
> https://tools.wmflabs.org/pageviews/?project=en.wikipedia.
> org&platform=all-access&agent=user&range=latest-90&pages=Drottninggata
> n
>
> Why so interesting on 8 April? A terrorist attack occurred there. This 
> spike in pageviews occurs all the time when some topic is in the news 
> (even peripherally as in this case where it is not the article about 
> the terrorist attack but about the street in which it occurred). Did 
> t

Re: [Wiki-research-l] Project exploring automated classification of article importance

2017-04-27 Thread Kerry Raymond
> > win out over popularity. If Wikipedia still exists in fifty or five
> > hundred years time and we are still using pasteurisation and indeed
> > still eating hydrocarbon based foods, then I suspect the pop group
> > you mention will be less frequently read about than the
> > pasteurisation process.
> >
> > In the meantime if we try to work it out at all it has to be
> > something of a judgement call, and one we will occasionally get
> > wrong. Any guesses as to which current branches of science will be as
> > forgotten in a century as phrenology is today?
> >
> > At an extreme the weekly top ten most viewed articles are a good
> > guide to what is trending in the popular cultures of India and the
> > USA. I'm assuming that most modern pop culture is inherently
> > ephemeral. Of course digital historians of future centuries may be
> > rolling on the floor laughing at this email, and the TV dramas
> > currently being filmed may still be widely studied and universally
> > known classics while our leading edge science lies buried in the
> > foundations of their science.
> >
> > Regards
> >
> > Jonathan
> >
> >
> > > On 26 Apr 2017, at 08:50, Jane Darnell  wrote:
> > >
> > > Yes I totally agree that "importance is a relative metric rather
> > > than absolute." I also agree that incoming links and pageviews are
> > > not accurate measurements of "importance" for all of the reasons
> > > you mention. However, we are still a project that is actively
> > > exploring the universe of knowledge, and leaning heavily on
> > > academia and other established sources we must "boldly go where no
> > > man has gone before" (and please feel free to insert "white,
> > > euro-centric" before the man part). So do you have any suggestions
> > > what we could measure going forward that would cough up some
> > > interesting stats to monitor? Pagewatching is useful, but
> > > problematic because these are only assigned at page-creation,
> > > while some marginal editor interest might be expanded to whole
> > > categories (speaking as someone who has thousands of pages
> > > watchlisted on multiple projects). I like your thoughts about
> > > looking for key articles such as those used as the "main" article
> > > for a category or as the title of a navbox. I am looking for
> > > similar usages of paintings as a way to find popular painters or
> > > paintings rather than just those paintings which have articles
> > > written about them (which are often written for totally random
> > > reasons such as theft/sale/wikiproject).
> > >
> > > On Wed, Apr 26, 2017 at 5:39 AM, Kerry Raymond
> > > <kerry.raym...@gmail.com> wrote:
> > >
> > >> Just a few musings on the issue of Importance and how to research
> > >> it ...
> > >>
> > >> I agree it is intuitive that importance is likely to be linked to
> > >> pageviews and inbound links but, as the preliminary experiment
> > >> showed, it's probably not that simple.
> > >>
> > >> Pageviews tells us something about importance to readers of
> > >> Wikipedia, while inbound links tells us something about importance
> > >> to writers of Wikipedia, and I suspect that writers are not a
> > >> proxy for readers as the editor surveys suggest that Wikipedia
> > >> writers are not typical of broader society on at least two
> > >> variables: gender and level of education (might be others, I
> > >> can't remember).
> > >>
> > >> But I think importance is a relative metric rather than absolute.
> > >> I think taking the mean value of importance across a number of
> > >> WikiProjects in the preliminary experiment may have lost something
> > >> because it tried (through averaging) to look at importance
> > >> "generally". I would suspect an experiment considering only the
> > >> importance ratings wrt a single WikiProject would be more likely
> > >> to show correlation with pageviews

Re: [Wiki-research-l] Project exploring automated classification of article importance

2017-04-26 Thread Kerry Raymond
I think you are reading my comments too negatively. I’m not saying to ignore 
pageviews or incoming links. I’m saying that a naïve look at their stats may 
not be as useful as some of the variations I mention. I think it is worth 
looking at pageviews relative to those articles in the same WikiProject. I 
think it is worth looking at inbound links but to consider two groups, those 
coming from the same WikiProject(s) and from other WikiProjects. I think the 
position of the incoming links within their source articles is also 
significant, either first sentence, first para, whole of lede, or 
absolute/relative position of the link in the article (e.g. 2000 bytes from 
start, or 40% from start).
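Kerry's positional metric for inbound links (bytes from the start of the source article, or percentage from the start) could be sketched as follows. This is an illustrative sketch, not anything from the thread: the helper name, the sample wikitext, and the `[[target` matching heuristic are all assumptions, and a real implementation would need proper wikitext parsing (piped links, templates, redirects).

```python
# Sketch (illustrative, not from the email): where does an inbound
# [[wikilink]] sit within its source article? Reports both the absolute
# byte offset and the relative position (fraction of article length).

def link_position(source_wikitext: str, target: str):
    """Return (absolute_offset_bytes, relative_position) of the first
    [[target...]] link in source_wikitext, or None if absent."""
    idx = source_wikitext.find("[[" + target)
    if idx == -1:
        return None
    total = len(source_wikitext.encode("utf-8"))
    offset = len(source_wikitext[:idx].encode("utf-8"))
    return offset, offset / total

# Toy article: the link appears in the first sentence.
text = "Lead sentence about [[Drottninggatan]] in Stockholm. " + "Body text. " * 50
pos = link_position(text, "Drottninggatan")
print(pos)  # offset 20 bytes, i.e. only a few percent into the article
```

A link found in the first couple of percent of an article (the lede) would then count as a stronger signal of importance than one buried 40% of the way down.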

 

The big difference between machine-assessment of article quality and article 
importance is that quality is a metric on the article but importance is a 
metric on the topic. Also, my informal observation is that article quality does 
improve and degrade over time and hence is much more dynamic than topic 
importance, which seems to me to be much more stable. So I think there is less 
scope for dramatically improving the situation by being able to determine topic 
importance than the benefits likely to be achieved from automated quality 
assessment, but there may be benefit if there are heuristics to spot the 
relatively few articles which do need their importance re-assessed due to 
“current events”. In which case “editor activity” may be a metric, particularly 
“editor activity” on the lede para or other more critical areas of the article.

 

I am not too worried about 22nd century. I think we should look more at the 
next decade. Who would have predicted the demise of Usenet? It seemed pretty 
sexy at the time, etc. Wikipedia, like many things, will pass. It’s not to say 
it will pass into oblivion but it may morph into something very different to 
what it is today. Being CC-BY-SA improves the chances that any successor can 
build on it, but maybe we should put into WMF’s constitution, “if WMF shuts 
down, we release the contents of the projects as CC0” (to increase the 
likelihood that the content has a future). Having had to shut down a number of 
research institutes when the funding ran out, I know the utter stupidity that 
occurs when they retain a skeleton staff to “sell off all our valuable IP”, 
which every closing-down institution seems to want to do; the result is that 
the IP gets wasted because it isn’t sold, or it’s sold to one of those 
companies who buy IP for tuppence on the off-chance they can engage in patent 
litigation (or other IP litigation) downstream. We waste so much IP with this 
kind of “make a buck” thinking. 

 

Kerry

 

From: Jane Darnell [mailto:jane...@gmail.com] 
Sent: Wednesday, 26 April 2017 5:51 PM
To: kerry.raym...@gmail.com; Research into Wikimedia content and communities 

Subject: Re: [Wiki-research-l] Project exploring automated classification of 
article importance

 

Yes I totally agree that "importance is a relative metric rather than 
absolute." I also agree that incoming links and pageviews are not accurate 
measurements of "importance" for all of the reasons you mention. However, we 
are still a project that is actively exploring the universe of knowledge, and 
leaning heavily on academia and other established sources we must "boldly go 
where no man has gone before" (and please feel free to insert "white, 
euro-centric" before the man part). So do you have any suggestions what we 
could measure going forward that would cough up some interesting stats to 
monitor? Pagewatching is useful, but problematic because these are only 
assigned at page-creation, while some marginal editor interest might be 
expanded to whole categories (speaking as someone who has thousands of pages 
watchlisted on multiple projects). I like your thoughts about looking for key 
articles such as those used as the "main" article for a category or as the 
title of a navbox. I am looking for similar usages of 
paintings as a way to find popular painters or paintings rather than just those 
paintings which have articles written about them (which are often written for 
totally random reasons such as theft/sale/wikiproject).

 

On Wed, Apr 26, 2017 at 5:39 AM, Kerry Raymond <kerry.raym...@gmail.com> wrote:

Just a few musings on the issue of Importance and how to research it ...

I agree it is intuitive that importance is likely to be linked to pageviews and 
inbound links but, as the preliminary experiment showed, it's probably not that 
simple.

Pageviews tells us something about importance to readers of Wikipedia, while 
inbound links tells us something about importance to writers of Wikipedia, and 
I suspect that writers are not a proxy for readers as the editor surveys 
suggest that Wikipedia writers are not typical of broader society on at least 
two variables: gender and level of 

Re: [Wiki-research-l] Project exploring automated classification of article importance

2017-04-25 Thread Kerry Raymond
Just a few musings on the issue of Importance and how to research it ...

I agree it is intuitive that importance is likely to be linked to pageviews and 
inbound links but, as the preliminary experiment showed, it's probably not that 
simple.

Pageviews tells us something about importance to readers of Wikipedia, while 
inbound links tells us something about importance to writers of Wikipedia, and 
I suspect that writers are not a proxy for readers as the editor surveys 
suggest that Wikipedia writers are not typical of broader society on at least 
two variables: gender and level of education (might be others, I can't 
remember).

But I think importance is a relative metric rather than absolute. I think 
taking the mean value of importance across a number of WikiProjects in the 
preliminary experiment may have lost something because it tried (through 
averaging) to look at importance "generally". I would suspect an experiment 
considering only the importance ratings wrt a single WikiProject would be more 
likely to show correlation with pageviews (wrt other articles in that same 
WikiProject) and inbound links. And I think there are two kinds of 
inbound links to be considered, those coming from other articles within the 
same WikiProject and those coming from outside that Wikiproject. I suspect 
different insights will be obtained by looking at both types of inbound links 
separately rather than treating them as an aggregate. I note also that 
WikiProjects are not entirely independent of one another but have relationships 
between them. For example, The WikiProject Australian Roads describes itself as 
an "intersection" (ha ha!) of WikiProject Highways and WikiProject Australia, 
so I expect that we would find greater correlation in importance between 
related WikiProjects than between unrelated WikiProjects.
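The proposed split of inbound links into same-project and cross-project groups is simple to compute once you have a project's membership list. A minimal sketch, with made-up article names and a hypothetical helper (real data would come from the pagelinks table or an API):

```python
# Sketch (illustrative names/data): partition the inbound links to an
# article into those from articles in the same WikiProject and those
# from outside it, so the two groups can be analysed separately.

def partition_inlinks(inlinks, project_members):
    same = [a for a in inlinks if a in project_members]
    other = [a for a in inlinks if a not in project_members]
    return same, other

# Toy membership set for WikiProject Australian Roads (hypothetical).
aus_roads = {"Bruce Highway", "Pacific Motorway", "Hume Highway"}
inlinks_to_bruce = ["Pacific Motorway", "Queensland", "Hume Highway", "Brisbane"]

same, other = partition_inlinks(inlinks_to_bruce, aus_roads)
print(same)   # ['Pacific Motorway', 'Hume Highway']
print(other)  # ['Queensland', 'Brisbane']
```

Comparing counts (or ratios) of the two groups per article would then test Kerry's hunch that they carry different signals about importance.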

When thinking about readers and pageviews, I think we have to ask ourselves 
whether there is a difference between popularity and importance, or whether 
popularity *is* importance. I sense that, as a group of educated people, those 
of us reading this research mailing list probably do think there is a 
difference. Certainly if there is no difference, then this research can stop 
now -- just judge importance by pageviews. Let's assume a difference then. The 
pageviews of an article are not always consistent over time. Here are the 
pageviews for Drottninggatan 

https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-90&pages=Drottninggatan

Why so interesting on 8 April? A terrorist attack occurred there. This spike in 
pageviews occurs all the time when some topic is in the news (even peripherally 
as in this case where it is not the article about the terrorist attack but 
about the street in which it occurred). Did the street become more "important"? 
I think it became more interesting but not more important. So I think we do 
have to be careful to understand that pageviews probably reflect interest 
rather than importance. I note that the Wikipedia article on The Chainsmokers 
(a music group with a number of songs in the current USA music charts) gets 
many more pageviews than the Wikipedia article on Pasteurization, but The 
Chainsmokers are not rated as being of high importance by the relevant 
WikiProjects while Pasteurization is very important in WikiProject Food and 
Drink. Since pasteurisation prevents a lot of deaths, I think we might agree 
that in the real world pasteurisation is more important than a music group 
regardless of what pageviews tell us.

https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-90&pages=The_Chainsmokers|Pasteurization
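The Drottninggatan example suggests one way to separate transient "interest" from stable importance: flag days whose views sit far above a robust baseline. A sketch using a median/MAD threshold on a synthetic daily series (the numbers below are invented, not real pageview data, and the cutoff `k` is an arbitrary assumption):

```python
# Sketch: detect news-driven "interest" spikes in a daily pageview
# series using a robust median/MAD rule; synthetic data, not real stats.
import statistics

def spike_days(views, k=5.0):
    """Return indices of days whose views exceed the median by more than
    k median-absolute-deviations."""
    med = statistics.median(views)
    mad = statistics.median(abs(v - med) for v in views) or 1.0
    return [i for i, v in enumerate(views) if (v - med) / mad > k]

# Invented series: quiet baseline, then a news event on day 5.
daily = [120, 130, 118, 125, 122, 9000, 4000, 600, 140, 128]
print(spike_days(daily))  # [5, 6, 7] -- the spike and its decay
```

An importance signal might then be the baseline (median) with spike days excluded, while the spikes themselves measure transient interest.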

Of course it matters for Wikipedia's success that our *popular* articles are 
of high quality, but I think we have to be cautious about pageviews being a 
proxy for importance.

When we look at Wikipedia writers' decisions in tagging the importance of 
articles to WikiProjects, what do we find? As we know, project tags are often 
placed on new articles (and often not subsequently reviewed). So while I find 
that quality tags are often out-of-date, the importance seems to be pretty 
accurate even on new stub articles. This is because it is the importance of 
the *topic* that is being assessed, which is independent of the Wikipedia 
article itself. Provided the article is clear enough about what it is about and 
why it matters (the traditional content of the first paragraph or two; failing 
to provide it will likely result in speedy deletion of the new article), 
assessment of the topic's importance can be made even at new stub 
level. This tells us that importance for Wikipedia writers is determined by 
something outside of Wikipedia (probably their real-world knowledge of that 
topic space -- one assumes that project taggers are quite interested in the 
topic space of that project). While article quality
