[Wiki-research-l] Re: [Research-wmf] Enquires for conducting a project related to Wikipedia

2023-11-10 Thread WereSpielChequers
Dear Hanxuan,

Thanks for your interest in researching Wikipedia.

A few observations from a Wikipedian here.

1 Don't assume that translations are only amended by other translators.
Once an article has been published on Wikipedia it is open for editing, and
those editors may not speak the language it was translated from or use
sources in that language. I don't speak a word of Chinese, but I know I
have edited articles on the English Wikipedia that were translated from Chinese.

2 You probably want to broaden the pool of editors you are looking at.
The latest stats show we have 70 active editors on the Chinese Wikipedia
resident in Australia. I don't know how many of them, if any, do a
significant amount of translation from English to Chinese, but even if it
were closer to 70 than 7, I suggest you look for a larger subset of
Wikipedians to research.
https://stats.wikimedia.org/#/zh.wikipedia.org/contributing/active-editors-by-country/normal|map|last-month|(activity-level)~5..99-edits|monthly

3 More broadly, only a small proportion of Wikipedians are likely to take
part in research, and if there are only a handful who meet a particular
criterion, then anonymity gets tricky.

4 I'm not in a position to be definitive here, but I'd assume that some
Chinese Wikipedians, even in a safe country like Australia, are going to be
wary of taking part in something that might compromise their anonymity.
Chinese Australians likely have relatives back in China whom they won't want
to get into trouble. You are probably better off looking at a different
language, or at least broadening your research to "editors of the
Chinese-language Wikipedia who live in English-speaking countries".

5 Happy to look at your question set, just email me a copy.

Jonathan

On Wed, 8 Nov 2023 at 16:50, Leila Zia  wrote:

> Dear Hanxuan,
>
> I did a quick pass on your meta page. Thank you for creating it.
> Unfortunately I will not have bandwidth to look into the survey and your
> page in more detail. However, to be very clear: this is not a blocker for
> your research. :) Others may decide to check out your survey or meta page
> and give you feedback. I do recommend that you keep an eye on the
> "Discussion" tab of your meta page as folks may leave comments there over
> time.
>
> Best,
> Leila
>
> --
> Leila Zia
> Head of Research
> Wikimedia Foundation
>
>
> On Sun, Oct 29, 2023 at 4:57 PM Hanxuan Sun 
> wrote:
>
> > Dear Leila and Zachary,
> >
> >
> >
> > I have set up my research project via Wikimedia:
> >
> https://meta.wikimedia.org/wiki/Research:A_Comparative_Mixed-Methods_Case_Study_to_Explore_the_Revision_of_English-Chinese_Translations_on_Wikipedia
> > based on your suggestions. Meanwhile, I have revised the questionnaires
> in
> > English version:
> > https://unsw.au1.qualtrics.com/jfe/form/SV_3fNVlLeMgM3BNzg and Chinese
> > version: https://unsw.au1.qualtrics.com/jfe/form/SV_0AjTai9lMOf48FE.
> >
> >
> >
> > Would you please check them at your earliest convenience? Thank you so much!
> >
> >
> >
> > Best,
> >
> > Hanxuan.
> >
> > *From:* Hanxuan Sun
> > *Sent:* Saturday, October 28, 2023 11:38 AM
> > *To:* Leila Zia ; Zachary Levonian  >
> > *Cc:* wiki-research-l@lists.wikimedia.org
> > *Subject:* RE: [Research-wmf] Enquires for conducting a project related
> > to Wikipedia
> >
> >
> >
> > Dear Leila and Zachary,
> >
> >
> >
> > Thank you so much for your time and detailed suggestions. Apologies for
> > not creating my project through the website page. I am working on it now
> > and hope it will work soon.
> >
> >
> >
> > For the data management section, all the private data will be stored for
> > the duration of the study on the UNSW Data Archive (RDMP ID: H0408583),
> to
> > which only the Chief Investigator, my supervisor Professor Stephen
> Doherty,
> > and the Student Investigator, Ms Hanxuan Sun, will have access. The
> Privacy
> > and Confidentiality part is described in Section 11 of the Human Research
> > Project Description, which is attached in this email. For the gender
> > question, I will revise it based on your suggestions or delete it as it
> is
> > not highly related to my research project. Then, I will revise all the
> > questionnaires after discussing with the Chinese Wikipedia Village Pump, to make sure
> > everything goes well. If you have any questions, please feel free to
> contact
> > me.
> >
> >
> >
> > Have a nice weekend!
> >
> > Hanxuan.
> >
> >
> >
> > *From:* Leila Zia 
> > *Sent:* Friday, October 27, 2023 10:18 AM
> > *To:* Hanxuan Sun 
> > *Cc:* wiki-research-l@lists.wikimedia.org
> > *Subject:* Re: [Research-wmf] Enquires for conducting a project related
> > to Wikipedia
> >
> >
> >
> >
> > [Moving research-wmf to Bcc.]
> >
> >
> >
> > Dear Hanxuan Sun.
> >
> >
> >
> > Thank you for reaching out.
> >
> >
> >
> > *Some tips for increasing the chances of success for your project*
> >
> >- *Reduce the chance of surprising existing 

[Wiki-research-l] Re: Generation gap widens between admins and other editors on the English Wikipedia.

2023-08-16 Thread WereSpielChequers
Probably the biggest change to the process came with the unbundling of
rollback in 2008; at least, that was when the biggest drop came in RFAs, and
"good vandalfighter" ceased to be sufficient to pass RFA. You also had to
show some contribution to building the pedia. We now have over six thousand
rollbackers and fewer than 900 admins, so I think that unbundling did make
it easier to get Rollback, though arguably Rollback itself is now a
redundant userright, as anyone can just opt in to tools like Twinkle.



I wasn't around in the early years; I started editing in 2007, towards the
end of the exponential growth era, and only started to pay attention to RFA
in 2008, though I have looked at quite a few earlier RFAs. I think that
the criteria haven't changed much in a decade - maybe there has been an
increase in the requirements for tenure and/or edits, in that someone
with 3,000 to 4,000 unautomated edits can expect a few opposes, as would
someone with between one and two years of active editing. What I can't explain
is why we appointed 121 new admins in 2009 but have averaged fewer than 20 new
admins a year for the last ten years. I really don't think that the de
facto criteria for adminship are very different now compared to 2009:

There are people who care about the deletion button and don't want someone
who will be too soft or too harsh with it.

There are people who care about the block button, including those who don't
want someone blocking the regulars who hasn't gone through the process of
building content.

There are people who think that all admins should be legally adult.

And there are those who want to stop certain long-term problem editors returning
in a new guise. One assumption made here is that the mask will slip if one
of those editors tries to make nice for an entire year in order to make
admin.


Given that the total size of the community is stable or slowly growing, I
don't see why so few candidates are coming forward for RFA.

WSC

On Wed, 16 Aug 2023 at 03:24, Samuel Klein  wrote:

> The iron law of gaps...
>
> On Tue, Aug 15, 2023 at 5:44 PM The Cunctator  wrote:
>
> > IMHO: The amount of jargon and legalistic booby traps to navigate now to
> > become an admin is gargantuan, and there isn't a strong investment in a
> > development ladder.
>
>
> Yes.  More generally, a shift towards a Nupedia model (elaborate seven-step
> processes, focus on quality, focus on knowing lots of precedent and not
> making mistakes, spending more time justifying actions than making them) is
> making sweeping, mopping, and bureaucracy generally more work, less fun,
> and more exclusionary.
>
> Perhaps asking everyone to adopt someone new, or sticking "provisional"
> tags on a family of palette-swap roles that are Really Truly NBD,
> We Mean It This Time, would help stave off
> the iron law in a repeatable way.
>
> SJ
> ___
> Wiki-research-l mailing list -- wiki-research-l@lists.wikimedia.org
> To unsubscribe send an email to wiki-research-l-le...@lists.wikimedia.org
>


[Wiki-research-l] Generation gap widens between admins and other editors on the English Wikipedia.

2023-08-15 Thread WereSpielChequers
Hi,

Thirteen years after I wrote about the emergence of a Wikigeneration gap
between Wikipedia's admins and its editors, I have revisited the topic,
recalculated the gap and published a new signpost article.

https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-08-15/Special_report

To the surprise, I expect, of few people, the admin cadre is still overwhelmingly
drawn from editors who started editing in Wikipedia's exponential growth
phase (2001-2007), half of them from 2003-2005.

WSC


[Wiki-research-l] Re: Call for participation of Wikipedians for “Ask a Wikipedian” Session

2022-03-23 Thread WereSpielChequers
Hi,

That date doesn't work for me, but if you have another session after the
6th of May, I may well be able to join you.

Regards

Jonathan

On Wed, 23 Mar 2022 at 14:01, Lucie Kaffee  wrote:

> Hello everyone!
>
> We are currently organising a workshop with the title “Wiki-M3L: Wikipedia
> and Multi-Modal & Multi-Lingual Research” at ICLR [1], in which we bring
> together research working on topics around Wikipedia, with a focus on
> multilingual projects as well as multi-modality (e.g., text and images). In
> this workshop, we want to foster collaboration between researchers and the
> Wikimedia community, so we allocated a session for researchers to exchange
> with Wikimedians. Therefore we are looking for participants, who are
> interested in joining us at ICLR for the workshop and would like to
> exchange with and answer some questions of researchers working on
> Wikipedia. The workshop will happen virtually on 29th of April 2022, and
> the session would take around 30 minutes from 14:15 CET. Please reach out
> to us if you are interested in participating!
>
> Cheers,
>
> Lucie and Tiziano
>
> [1] https://meta.wikimedia.org/wiki/Wiki-M3L
>
> --
> Lucie-Aimée Kaffee


[Wiki-research-l] Re: How to access deleted Wikipedia articles

2021-11-08 Thread WereSpielChequers
Just to add a little further complexity.

Lots of articles, and deleted articles, are about people. Names are often not
unique, and just because one person with a particular name has had an
article about them deleted, it does not mean that there won't be a notable
person of the same name.

For example I was once asked to restore a particular deleted article so
that someone could look at the deleted version before creating an article
on a professor who they assured me was very notable and they had plenty of
sources for. I had a look at the deleted article, and told them I doubted
there was anything there worth restoring, and to go ahead with the article
on the professor. I also added that I didn't know if the deleted article
was about the same person or a different person of the same name, but if
they found that their professor had been a pro skateboarder in his teens, I
suggested they give that its own section, and not make that his main claim
to notability or have it dominate the lede. In another instance, I resolved
an edit war over whether an article should be about either of two people
of the same name by deleting the article, restoring all the versions that
were about person A and moving them to a new, clearer name, then restoring
the other revisions and moving them to a page with a name that made it
clear they were about person B. Finally, I turned the original
battleground article into a disambiguation page that listed both people.
That would be a rare situation compared to redirects, but I hope it gives
you an idea of the complexity of Wikipedia article names over time.

When there are multiple topics with the same name the default should be
that the primary one gets the name with the secondary topics getting longer
names and a mention in a disambiguation page. There are people who get very
concerned as to which, if any, article should be primary, and while sometimes
that is as obvious as Dallas, Scotland v Dallas, Texas, other times it
can be contentious and even change over time. I can remember heated
arguments about Perth, Scotland v Perth, Australia, and I dread to think how
the Mercury, Atlas and Apollo decisions were made.

TL;DR: names of articles don't just go through a process of deletion.

WSC

On Fri, 5 Nov 2021 at 18:30, Adam Wight  wrote:

> Going back to your original question,
>
> > which articles are no longer on Wikipedia
>
> This is easy enough to query in bulk:
>
>
> https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids&titles=Zayn%20Malik|NonexistentPage|Draft:Kajl%C3%A2ngvoj
> 
>
> The first page exists; of the other two, one never existed and one was
> deleted. Both missing pages have a "missing" key in the response
> data, which you can rely on for determining whether the articles exist.
>
> It sounds like this is what you needed, and maybe the inconsistencies were
> due to non-Latin character encoding issues?  Let me know if I
> misunderstood, and you also need to know whether the page used to exist but
> was deleted.
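For anyone scripting this check, a minimal sketch of parsing such a response might look like the following. The function name and sample data are illustrative only, and it assumes the `formatversion=2` response shape, in which non-existent pages carry a "missing" key:

```python
def missing_titles(api_response):
    """Return the set of requested titles that do not currently exist.

    Assumes a MediaWiki action=query response with formatversion=2,
    where query.pages is a list of dicts and non-existent pages carry
    a "missing" key. Note this cannot distinguish "deleted" from
    "never existed" -- for that you need list=logevents.
    """
    pages = api_response.get("query", {}).get("pages", [])
    return {p["title"] for p in pages if p.get("missing", False)}


# Illustrative response for three titles, two of which do not exist.
sample = {
    "query": {
        "pages": [
            {"pageid": 123, "title": "Zayn Malik",
             "revisions": [{"revid": 456, "parentid": 455}]},
            {"title": "NonexistentPage", "missing": True},
            {"title": "Draft:Kajlângvoj", "missing": True},
        ]
    }
}

print(sorted(missing_titles(sample)))
# → ['Draft:Kajlângvoj', 'NonexistentPage']
```

If the wiki is queried without `formatversion=2`, the older response shape (pages keyed by id, missing pages given negative ids) would need slightly different handling.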
>
> Regards,
> Adam W.
>
> On Fri, Nov 5, 2021 at 7:12 PM D Z  wrote:
>
> > I am still unclear on how to know for sure that an article was
> > deleted.  It seems like the only way is to tell through the comments. For
> > example, this call:
> >
> >
> https://en.wikipedia.org/w/api.php?action=query&list=logevents&leaction=delete/delete&letitle=Zayn%20Malik
> > shows the comment "[[Wikipedia:Articles for deletion/Louis Tomlinson]]"
> > which I have noticed to exist for other articles that were successfully
> > deleted, but the article "Zayn Malik" exists. The  most recent event has
> > the comment
> > "[[WP:CSD#G6|G6]]: Deleted to make way for move" which would imply the
> > other deletions weren't successful but the article still exists.
> >
> > Thanks,
> >
> > Doris
> >
> > On Thu, Nov 4, 2021 at 3:20 AM Adam Wight 
> wrote:
> >
> > > On 11/4/21 8:09 AM, D Z wrote:
> > >
> > > > Hi Adam,
> > > >
> > > > Thanks for your reply. The qitem api returns missing for this article
> > but
> > > > the article exists:
> > > >
> > > >
> > >
> >
> https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&sites=eswiki&titles=Playas%20de%20L%C2%B4Atalaya%20y%20Focar%C3%B3n&normalize=1
> > > >
> > > > The Wikipedia page link
> > > > 
> > is
> > > > here.
> > >
> > > It seems that the issue is the apostrophe after "L", in the wikidata
> > > query it is "´" and the wikipedia link above uses "'".  Maybe something
> > > in your query script is normalizing the fancy apostrophe to a simple
> > > one?  I would check for proper UTF-8 handling.
> > >
> > > > Would you know if there is a way to input article revision ID or
> pageid
> > > > instead of source title for the logevents API? The strings seem to be
> > > > problematic at times.
> > >
> > > This was prescient :-).  But I don't see any record of the article
> being
> > > deleted, 

[Wiki-research-l] Re: How to access deleted Wikipedia articles

2021-10-27 Thread WereSpielChequers
Hi Doris,

If you look at the links for some examples you may find that much of the
difference is articles that are redirected. Sometimes redirected and merged.

Non notable fictional characters are usually redirected to the article on
the film or book they come from. If you look at the history of the redirect
you may well find an article.

WSC

On Wed, 27 Oct 2021 at 15:50, Adam Wight  wrote:

> The "logevents" API should return the same data as Special:Log. For
> example,
>
>
> https://en.wikipedia.org/w/api.php?action=query&list=logevents&letitle=Category:Recipients%20of%20the%20Order%20of%20the%20Tower%20and%20Sword
>
> This can be filtered further to just delete events, and so on.
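As a sketch of the kind of post-processing being discussed, one might separate genuine deletions from technical ones such as G6 "deleted to make way for move". The helper name and sample data below are illustrative, and the code assumes the `formatversion=2` JSON shape of a `list=logevents` response:

```python
def deletion_events(logevents_response):
    """Filter a list=logevents response down to page deletions,
    skipping technical deletions made to clear space for a page move.
    Assumes each event has "type", "action" and "comment" fields.
    """
    events = logevents_response.get("query", {}).get("logevents", [])
    return [
        e for e in events
        if e.get("type") == "delete" and e.get("action") == "delete"
        and "make way for move" not in e.get("comment", "")
    ]


# Illustrative response: one AfD deletion, one G6 technical deletion,
# and one unrelated move event.
sample = {
    "query": {
        "logevents": [
            {"type": "delete", "action": "delete",
             "comment": "[[Wikipedia:Articles for deletion/Louis Tomlinson]]"},
            {"type": "delete", "action": "delete",
             "comment": "[[WP:CSD#G6|G6]]: Deleted to make way for move"},
            {"type": "move", "action": "move", "comment": "requested move"},
        ]
    }
}

print(len(deletion_events(sample)))  # → 1
```

Matching on the comment text is of course fragile; deletion comments are free-form, so any real analysis would need a broader set of patterns.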
>
> But if you only want to know whether an article exists or not, "missing"
> should be accurate.  Can you share some example URLs for which the page
> exists, but the API returns "missing"?
>
> Kind regards,
> Adam W.
>
> On 10/27/21 3:40 AM, D Z wrote:
> > Hello All,
> >
> > I am doing research investigating the role of machine translation in
> > Wikipedia articles. I am having trouble with how to know if an article
> has
> > been deleted from Wikipedia. Specifically, I am getting a list of
> articles
> > from the cxtranslation list and I would like to know which articles are
> no
> > longer on Wikipedia. I see that there is the deletion log form
> >  but is there an API
> or
> > some way to access something like this form so I could check if a mass
> > amount of articles have been deleted?
> >
> > I have used the MediaWiki API to
> get
> > articles and the API returns missing for some articles, but this does not
> > seem to be fully accurate for determining if an article has been deleted
> > because the API has returned 'missing' for articles that do exist.
> >
> > To summarize, my main question is: given an article language edition and
> > article title, or an article pageid, is there an API to check if the
> > article has been deleted?
> >
> > Any help would be greatly appreciated!
> >
> > Thanks,
> >
> > Doris Zhou


[Wiki-research-l] Re: Negative views of Wikipedia in schools [was: Re: Wiki-research-l Digest, Vol 193, Issue 5

2021-09-16 Thread WereSpielChequers
Dear Mathieu,

This comes up frequently in outreach events, especially to academia.

The first point to get across is that Wikipedia is a General Interest
encyclopaedia, a tertiary source compiled from primary and secondary
sources. Anyone studying a subject at university is expected to have much
more than a general interest in that subject. It isn't new that some
students have to be encouraged to read the reading list.

The second is that Wikipedia has been improving in quality for some
time, and some people who assessed its quality in the very early years
might find themselves pleasantly surprised if they take another look at it.
Some of the studies still cited about Wikipedia are as old as 2008, and a
study from 2008 is likely to be based on data from 2007. Four fifths of all
the edits to the English Language Wikipedia have been since March 2008.

This last is especially true for people who made up their mind about
Wikipedia in the very earliest years when the priority was to achieve
quantity and inline citations were rare.

Jonathan


On Wed, 15 Sept 2021 at 09:25, Mathieu O'Neil 
wrote:

> Hi everyone
>
> Apologies if this has been covered previously on the list. I was inspired
> to write by the reference in the post below to the Wiki Ed Program.
>
> I am about to launch with an education scholar colleague a funded research
> project aiming to develop fact-checking techniques with Y5, Y6 and Y7
> schoolchildren in three Canberra schools (Australian Capital Territory). We
> are basing our approach to fact-checking on concepts developed by education
> scholars in the US such as "civic online reasoning" and "lateral reading":
> look away from the (potentially dubious) content; check the source. The
> easiest and most effective way to "check the source" is to look at a
> Wikipedia entry and check the reference list.
>
> In parallel, I am convening a first-year communication course on media
> literacy at the University of Canberra with 140+ students. A couple of
> weeks ago we did a group activity on Wikipedia, where students were asked
> to review and discuss a Wiki Ed Program / Wikimedia brochure ("Instructor
> Basics: How to use Wikipedia as a teaching tool") which clearly outlines
> editorial and behavioral policies such as NPOV, Reliable Sources, Assume
> Good Faith, etc.
>
> We then asked whether any prior assumptions had been challenged. It became
> clear that when they were in high-school, these students had been
> forcefully and repeatedly instructed by their teachers to NEVER use
> Wikipedia ("unreliable"). After completing the activity, students
> overwhelmingly expressed amazement about the existence of quality controls
> on Wikipedia and said their opinion of its reliability had changed.
>
> We also have anecdotal evidence that primary and secondary school teachers
> hold similar negative opinions about WP.
>
> It would be helpful for us to find out if this negative image is specific
> to the Canberra education system, or has been encountered elsewhere. To
> that end, I would very much appreciate it if anyone could point me to any
> studies or projects which explore this issue, or who could share their
> experiences of how teachers perceive Wikipedia.
>
> If you want to get in touch off-list I usually respond quickest to email
> sent at my primary address: mathieu.on...@canberra.edu.au
>
> Many thanks!
> Mathieu
>
>
>
> 
> From: wiki-research-l-requ...@lists.wikimedia.org <
> wiki-research-l-requ...@lists.wikimedia.org>
> Sent: Tuesday, September 14, 2021 22:01
> To: wiki-research-l@lists.wikimedia.org <
> wiki-research-l@lists.wikimedia.org>
> Subject: Wiki-research-l Digest, Vol 193, Issue 5
>
> Send Wiki-research-l mailing list submissions to
> wiki-research-l@lists.wikimedia.org
>
> To subscribe or unsubscribe, please visit
>
> https://lists.wikimedia.org/postorius/lists/wiki-research-l.lists.wikimedia.org/
>
> You can reach the person managing the list at
> wiki-research-l-ow...@lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wiki-research-l digest..."
>
> Today's Topics:
>
>1. Re: [Wikimedia Research Showcase] September 15, 2021: Socialization
> on Wikipedia
>   (Janna Layton)
>
>
> --
>
> Message: 1
> Date: Mon, 13 Sep 2021 12:44:20 -0700
> From: Janna Layton 
> Subject: [Wiki-research-l] Re: [Wikimedia Research Showcase] September
> 15, 2021: Socialization on Wikipedia
> To: analyt...@lists.wikimedia.org,
> 

[Wiki-research-l] Re: Edit Summary Stats / Research?

2021-08-03 Thread WereSpielChequers
Dear Isaac,

I'm not aware of any research on this. But there are a couple of common
assumptions that you could check as part of any research.


   1. One of the reasons why any suggestion that we make edit summaries
   compulsory gets rejected is that, as long as they are optional, blank edit
   summaries are a great way to identify vandals.
   2. There is also a certain amount of "sneaky vandalism", denoted by edits
   that get reverted and whose perpetrators get warned for vandalism
   or blocked as a "vandalism only account".
   3. Though we admins have the technology to blank people's edit summaries,
   it is very rarely used.




Regards
Jonathan

On Tue, 3 Aug 2021 at 16:20, Isaac Johnson  wrote:

> Does anyone know of any research or statistics around edit summary
>  usage on Wikipedia? All
> I
> could find in a quick scan was some statistics from 2010 (
> https://meta.wikimedia.org/wiki/Usage_of_edit_summary_on_Wikipedia). I'm
> curious if anyone has more updated statistics, or, even better: a more
> thorough analysis of how edit summaries are used by editors -- i.e. how
> complete they are, to what degree they represent the "what" vs. the "why",
> how often they are misleading, etc.
>
> Best,
> Isaac
>
> --
> Isaac Johnson (he/him/his) -- Research Scientist -- Wikimedia Foundation


Re: [Wiki-research-l] effects of vandalism and abuse on editors and readers

2021-01-20 Thread WereSpielChequers
Hi Aaron,

I would be very interested in that, in particular re flagged revisions as
used on the German language Wikipedia (DE) and, I think, some other wikis. DE
has been one of the shrinking communities, though that could of course be
unconnected if it is more related to the ratio of tablet to PC
users (Wikipedia being near uneditable on the mobile platform). If the
Portuguese and potentially other wikis are going to drop IP editing, then
that is also likely to have an effect on vandalism that would be worth
researching.

WSC

On Tue, 19 Jan 2021 at 19:43, Aaron Halfaker 
wrote:

> +1 WSC.   When I thought about replicating it, I expected to see a dramatic
> decline in the impact of vandalism with the advent of counter-vandalism
> tools and abuse filter.
>
> It would be interesting to see that on a cross-wiki basis as different
> wikis employ different strategies (or seemingly none at all) for
> counter-vandalism over time.
>
> On Tue, Jan 19, 2021 at 10:58 AM WereSpielChequers <
> werespielchequ...@gmail.com> wrote:
>
> > Hi Aaron,
> >
> > That was an interesting read and a bit of a time capsule. 2002-2006 is a
> > bit before I started editing Wikipedia. Before many of the tools such as
> > huggle that give vandalfighters such an advantage over vandals, I think
> > before the era of bot reversion of vandalism when vandalism had to be
> > reverted by humans rather than computers, and certainly before the edit
> > filters that prevent much, possibly most vandalism from even being saved.
> > It also seems to predate the whole panoply of page protection that stops
> > vandals even editing many common vandalism targets (they do say that
> every
> > single article is available for anyone to edit).
> >
> > It would be interesting to see a study now when recent changes patrollers
> > boast of the times they have got to some vandalism faster than Cluebot.
> >
> > I know there were predictions in the early years that eventually the
> tidal
> > wave of vandalism would overwhelm the defenders of the wiki, that study
> > seems to have been part of that. I wonder if anyone in 2004 predicted
> that
> > we would get to the current situation where adolescent vandalism has
> turned
> > out to be so predictable that dealing with it has been mostly automated
> and
> > now we are more worried about spam than vandalism.
> >
> > WSC
> >
> > On Mon, 18 Jan 2021 at 23:52, Aaron Halfaker 
> > wrote:
> >
> > > See page 7 of Priedhorsky, R., Chen, J., Lam, S. T. K., Panciera, K.,
> > > Terveen, L., & Riedl, J. (2007, November). Creating, destroying, and
> > > restoring value in Wikipedia. In *Proceedings of the 2007 international
> > ACM
> > > conference on Supporting group work* (pp. 259-268).
> > > http://reidster.net/pubs/group282-priedhorsky.pdf
> > >
> > > They discuss the probability of a page view of Wikipedia containing
> > > vandalism rising over time.  I wanted to replicate this analysis and
> > extend
> > > it past 2007 but I never got the chance.  I think the methodology is
> > really
> > > interesting though.
> > >
> > > It doesn't directly answer the question but it does get at the *impact*
> > of
> > > vandalism.
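The exposure idea described in the cited paper (what fraction of page views landed on a vandalised revision) could be sketched roughly as below. This is only an illustration of the approach under simplified assumptions, not Priedhorsky et al.'s actual implementation, and all names are hypothetical:

```python
def damaged_view_fraction(view_times, damage_intervals):
    """Estimate the share of page views that saw vandalised content.

    `view_times` are view timestamps; `damage_intervals` are
    (vandalised_at, reverted_at) pairs during which the vandalised
    revision was the live one. A view "saw" vandalism if it falls
    inside any such half-open interval.
    """
    if not view_times:
        return 0.0
    damaged = sum(
        1 for t in view_times
        if any(start <= t < end for start, end in damage_intervals)
    )
    return damaged / len(view_times)


# Four views; vandalism live between t=10 and t=20, so two views hit it.
views = [5, 12, 18, 25]
print(damaged_view_fraction(views, [(10, 20)]))  # → 0.5
```

Extending the analysis past 2007, as suggested above, would mainly be a matter of feeding in newer revision histories and pageview data.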
> > >
> > > On Mon, Jan 18, 2021 at 12:13 PM Isaac Johnson 
> > > wrote:
> > >
> > > > To WSC's point about the difficulty of detecting such behavior or
> > > surveying
> > > > at a point in which it would still be salient, I'd add that in
> general
> > we
> > > > have a large gap in our knowledge about why people choose to stop
> > editing
> > > > because almost all of our survey mechanisms depend on existing
> > logged-in
> > > > usage of the wikis. This is a challenge with many other websites too
> > but
> > > > it's generally easier to find and survey who, for instance, has left
> > > > Facebook (example
> > > > <
> > > >
> > >
> >
> http://socialmedia.soc.northwestern.edu/wp-content/uploads/2013/05/CHI2013-FBLL.pdf
> > > > >)
> > > > by collecting a random sample of people than it is to find and survey
> > > > someone who was a former editor of Wikipedia. There were surveys that
> > did
> > > > ask about major barriers to editing (which presumably contribute to
> > > > burnout) such as the 2012 survey:
> > > >
> > > >
> > >
> >
> https://upload.wikimedia.org/wikipedia/commons/8/81/Editor_Survey_2012_-_Wikipedia_editing_experience.pdf#

Re: [Wiki-research-l] effects of vandalism and abuse on editors and readers

2021-01-19 Thread WereSpielChequers
ity_Insights_2020_Report/Thriving_Movement#Safe_and_Secure_Spaces
> >   - 2015 Harassment Survey:
> >   https://meta.wikimedia.org/wiki/Research:Harassment_survey_2015
> >- The body of work around barriers to newcomers might have some good
> >insights too -- e.g.,
> >
> >
> https://www-users.cs.umn.edu/~halfaker/publications/The_Rise_and_Decline/
> >
> >
> > On Sun, Jan 17, 2021 at 5:44 AM WereSpielChequers <
> > werespielchequ...@gmail.com> wrote:
> >
> > > Hi Amir,
> > >


Re: [Wiki-research-l] effects of vandalism and abuse on editors and readers

2021-01-19 Thread WereSpielChequers
I'm sure there has been a survey of former editors done using the "email
this user" function (as I remember it, one of the more common responses was "I
haven't left yet"). However, this would not be a great way to survey people about
harassment, as harassed people are more likely to close an email account or
disable the email feature.

As for how many readers saw vandalism in the era before edit filters etc.,
it didn't need many readers to see it, and only a few of those to remove
it, for this to be an important way to recruit editors. We have such a huge
imbalance between readers and editors that even if only 1% of readers saw
vandalism and only 1% of those fixed it, that would still be an extra
hundred editors for every million readers.
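The back-of-the-envelope sum is easy to verify; the 1% figures are of course
purely illustrative assumptions, not measured rates:

```python
readers = 1_000_000
share_seeing = 0.01   # assumption: 1% of readers encounter vandalism
share_fixing = 0.01   # assumption: 1% of those who see it fix it
new_editors = readers * share_seeing * share_fixing
print(int(new_editors))  # prints 100
```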

WSC

On Mon, 18 Jan 2021 at 20:13, Isaac Johnson  wrote:

> To WSC's point about the difficulty of detecting such behavior or surveying
> at a point in which it would still be salient, I'd add that in general we
> have a large gap in our knowledge about why people choose to stop editing
> because almost all of our survey mechanisms depend on existing logged-in
> usage of the wikis. This is a challenge with many other websites too but
> it's generally easier to find and survey people who have, for instance,
> left Facebook (example:
> http://socialmedia.soc.northwestern.edu/wp-content/uploads/2013/05/CHI2013-FBLL.pdf)
> by collecting a random sample of people than it is to find and survey
> someone who was a former editor of Wikipedia. There were surveys that did
> ask about major barriers to editing (which presumably contribute to
> burnout) such as the 2012 survey:
>
> https://upload.wikimedia.org/wikipedia/commons/8/81/Editor_Survey_2012_-_Wikipedia_editing_experience.pdf#page=17
> (see the editor survey category
> <https://meta.wikimedia.org/wiki/Category:Editor_surveys> if you're
> looking
> for others)
>
> Some things that come to mind though:
>
>- I suspect very few readers see vandalism in their daily browsing (as a
>very frequent, long-term reader of English Wikipedia, I have trouble
>recalling encountering any clear vandalism in the course of normal
>reading). That said, I do suspect that most people have seen plenty of
>stories of outlandish vandalism to Wikipedia -- some legitimate but many
>more about vandalism that literally lasted minutes -- that may lead to
>lower trust. Whether or not lower trust in Wikipedia leads to lower
>readership is a separate question though. Jonathan Morgan ran some
> recent
>surveys on reader trust and what factors affected it that might be
>relevant:
>
> https://meta.wikimedia.org/wiki/Research:The_role_of_citations_in_how_readers_evaluate_Wikipedia_articles#Second_round_survey
>- Specifically in the context of harassment and gender equity:
>   - Harassment as barrier:
>
> https://meta.wikimedia.org/wiki/Gender_equity_report_2018/Barriers_to_equity
>   - Edit summaries in particular as harassment:
>   https://www.elizabethwhittaker.net/wmf-internship (more details
>   <
> https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#July_2019>
>   )
>   - Annual Community Insights Reports often have a section on this --
>   e.g.,
>
> https://meta.wikimedia.org/wiki/Community_Insights/Community_Insights_2020_Report/Thriving_Movement#Safe_and_Secure_Spaces
>   - 2015 Harassment Survey:
>   https://meta.wikimedia.org/wiki/Research:Harassment_survey_2015
>- The body of work around barriers to newcomers might have some good
>insights too -- e.g.,
>
> https://www-users.cs.umn.edu/~halfaker/publications/The_Rise_and_Decline/
>
>
> On Sun, Jan 17, 2021 at 5:44 AM WereSpielChequers <
> werespielchequ...@gmail.com> wrote:
>

Re: [Wiki-research-l] effects of vandalism and abuse on editors and readers

2021-01-17 Thread WereSpielChequers
Hi Amir,

This is one of those areas of research where we really need the annual
editor survey. I think it ran once after the 2009/10 Strategy process, and
I don't know if the best questions got included.

But the best time to ask editors what prompted them to start editing has
to be fairly soon after they started, as memories fade. I once went back to
my early edits, and the edit I remembered starting me editing barely made it
into my first 50.

There is a longstanding theory that a lot of new editors start or started
to fix some vandalism that they saw, and that this group went into steep
decline a decade ago with the rise of Cluebot and other antivandalism tools
that work faster than a newbie could. But without an annual survey to ask
editors what prompted them to edit you are going to struggle to research
this. Of course you could look at the early logged in edits of
active/prolific wikipedians, but if it is true that many/most Wikipedians
start with some IP edits, the earliest edits of many Wikipedians won't be
available.

Abuse, one assumes, has a differential effect on the targets of abuse,
disproportionately women, gay people and ethnic minorities. But I'd be inclined
to look at material targeted at their user and user talk pages rather than
article talk pages and edit summaries, though an email survey of former editors
would be useful.

My suspicion is that when we revert, block, and maybe even revdel or
oversight abuse, we assume that fixes the problem; if we want to tackle
abuse, we need more edit filters to prevent such abuse from going live.

WSC

On Sat, 16 Jan 2021 at 15:16, Amir E. Aharoni 
wrote:

> Hi,
>
> Is there any research about the effect of vandalism in wiki content pages
> on readers, experienced editors, and new and potential editors?
>
> And of abuse in discussion pages and edit summaries on experienced editors
> and new and potential editors?
>
> Intuitively and anecdotally one could think of the following:
> 1. Vandalism in content pages (articles) wastes editors' and patrollers'
> time. This (probably) doesn't require proof (or does it?). But some people
> say it also causes some experienced editors to burn out and leave. Is there
> any data about it, beyond intuition?
>
> 2. Does vandalism *measurably* affect the perception of the wikis'
> reliability? (This may be wildly different in different languages and
> wikis.)
>
> 3. Abusive language on discussion pages and edit summaries affects editors,
> and may cause them to reduce their editing, to stop editing about certain
> topics, or to leave the wiki entirely. Is this effect measurable? How does
> it differ for various groups by gender, age, religion, country,
> professional and educational background, seniority at the wiki, etc.?
>
> Thanks! :)
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> ‪“We're living in pieces,
> I want to live in peace.” – T. Moore‬
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>


Re: [Wiki-research-l] How to quantifying "effort" or "time spent" put into articles?

2020-10-21 Thread WereSpielChequers
Johan makes an important point about the adding of references. I'd just add
that offline references generally take more time than online ones. That
time might be time you'd have spent anyway, whether you subscribe to a
particular magazine or would have read that book to keep up with your area
of expertise. But it is generally more time consuming than adding content
with an online cite.

At the other end of the scale, edits marked as AWB, as most of mine are, are
edits where you usually only see the paragraph that you are about to
change. Hence AWB edits often run to several per minute. You can
safely assume that when someone saves over 60 edits in an hour they are
averaging less than a minute on each of them. But someone saving an edit every
couple of hours may be coming in from the garden during each rain shower,
or they may be working solidly on Wikipedia through that time, and you can
at best put an estimate on that.
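For what it's worth, the edit-session heuristic discussed in this thread
(consecutive edits with gaps under an hour count as one working session) can
be sketched roughly as follows. The one-hour threshold, and the zero time
credited to a single-edit session, are assumptions; the lone-edit case is
exactly the open question raised downthread, and one common workaround is to
add a fixed per-session allowance:

```python
from datetime import datetime, timedelta

def session_hours(timestamps, gap=timedelta(hours=1)):
    """Estimate labour hours from an editor's edit timestamps:
    edits separated by less than `gap` belong to one session, and
    each session contributes the span between its first and last
    edit (so a lone edit contributes zero)."""
    ts = sorted(timestamps)
    total = timedelta()
    start = prev = ts[0]
    for t in ts[1:]:
        if t - prev >= gap:      # gap too long: close the session
            total += prev - start
            start = t
        prev = t
    total += prev - start        # close the final session
    return total.total_seconds() / 3600

edits = [datetime(2020, 10, 20, 9, 0), datetime(2020, 10, 20, 9, 20),
         datetime(2020, 10, 20, 9, 50), datetime(2020, 10, 20, 14, 0)]
print(session_hours(edits))  # one 50-minute session plus a lone edit, ~0.83 hours
```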

Jonathan

On Tue, 20 Oct 2020 at 20:37, Johan Jönsson  wrote:

> A few comments from an editing perspective, in case anything here is
> useful:
>
> I think Levenshtein distance might be a useful concept here, given the
> indication that I've read through and made some sort of decision around a
> whole article or a significant part of an article – both for additions and
> subtractions.
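The Levenshtein suggestion above is cheap to prototype; here is a minimal
dynamic-programming sketch (shown over characters, though for whole articles
tokenising the wikitext into words first is far more practical):

```python
def levenshtein(a, b):
    """Classic edit distance between two sequences: the minimum
    number of insertions, deletions and substitutions turning a
    into b, computed row by row in O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```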
>
> When it comes to article content, the most important signifier of effort
> spent on an edit beyond text length that comes to mind is whether a new ref
> tag is added. If I'm referencing something, there's a fair chance that I've
> not only identified a shortage or deficiency, but potentially spent time
> both finding a source and reading through it to be able to reference it,
> even if it results in a short sentence.
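The "new ref tag" signifier is also easy to approximate. A crude sketch:
count `<ref` openings before and after the edit. This deliberately ignores
named-ref reuse, `{{sfn}}`-style citation templates and refs moved between
pages, so treat it as a rough signal only:

```python
import re

REF = re.compile(r"<ref[\s>]", re.IGNORECASE)  # matches <ref> and <ref name=...>

def refs_added(old_wikitext, new_wikitext):
    """Rough signal for 'this edit added a reference': the change
    in the number of <ref ...> openings across the edit."""
    return len(REF.findall(new_wikitext)) - len(REF.findall(old_wikitext))

old = "Foo is a bar."
new = 'Foo is a bar.<ref>Smith 2019</ref>'
print(refs_added(old, new))  # prints 1
```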
>
> In some languages, translations of other Wikipedia articles are common;
> there might be a big difference between adding the same type of content
> translated from another language version and writing it from scratch.
>
> //Johan Jönsson
> --
>
> Den tis 20 okt. 2020 kl 20:32 skrev Nate E TeBlunthuis :
>
> > Greetings!
> >
> > Quantifying effort is obviously a fraught prospect, but Geiger and
> > Halfaker [1] used edit sessions defined as consecutive edits by an editor
> > without a gap longer than an hour to quantify the total number of labor
> > hours spent on Wikipedia.  I'm familiar with other papers that use this
> > approach to measure things like editor experience.
> >
> > I'm curious about the amount of effort put into each particular article.
> > Edit sessions seem like a good approach, but there are some problems:
> >
> >   *   How much time does an edit session of length 1 take?
> >   *   Should article edit sessions be consecutive in the same article?
> >   *   What if someone makes an edit to related article in the middle of
> > their session?
> >
> > I wonder what folks here think about alternatives for quantifying effort
> > to an article like
> >
> >   1.  Number of wikitext characters added/removed
> >   2.  Levenshtein (edit) distance (of characters or tokens)
> >   3.  Simply the number of edits
> >
> > Thanks for your help!
> >
> > [1] Geiger, R. S., & Halfaker, A. (2013). Using edit sessions to measure
> > participation in Wikipedia. Proceedings of the 2013 Conference on
> Computer
> > Supported Cooperative Work, 861–870.
> > http://dl.acm.org/citation.cfm?id=2441873
> >


Re: [Wiki-research-l] WikiHist.html: English Wikipedia's Full Revision History in HTML Format

2020-09-11 Thread WereSpielChequers
I wouldn't use the phrase "Wikipedia’s deliberate policy of permanently
deleting the
entire history of deleted pages". Quite a few "deleted" pages do actually
get restored, and depending on the deletion process it can be quite easy to
get much deleted content back. Especially if someone volunteers to
reference an unreferenced page or a budding footballer actually gets to
play at professional or international level, or indeed a political
candidate is elected. Almost all "deleted" content still exists and could
be restored by a volunteer admin in the right circumstances. However,
Wikipedia's deletion processes are more than a little complex: many
articles have incomplete histories because admins have revision-deleted
particular revisions that include copyright violations and/or some really
libellous stuff. Some of the really nasty stuff gets "oversighted" - those
revisions are not even visible to administrators.

There is also the issue that some of the earliest material is not
available. Stats on admin actions only go back to December 2004, and while
there is some content from before then, I am not sure if all the material
deleted before then is available.

Regards

WSC

On Fri, 11 Sep 2020 at 10:22, Federico Leva (Nemo) 
wrote:

> Robert West, 11/09/20 11:29:
> > local instances of MediaWiki,
> > enhanced with the capacity of correct historical macro expansion.
>
> Interesting. I see this doesn't include deleted templates. Have you
> considered using historical dumps?
>
> «We emphasize that the limitation of deleted pages, tem- plates, and
> modules is not introduced by our parsing process. Rather, it is
> inherited from Wikipedia’s deliberate policy of permanently deleting the
> entire history of deleted pages.»
>
> A relevant task is
> https://phabricator.wikimedia.org/T2851
>
> See also the various discussions about Memento, like
> https://phabricator.wikimedia.org/T164654
>
> Federico
>


Re: [Wiki-research-l] Wiki-research-l Digest, Vol 179, Issue 7

2020-07-13 Thread WereSpielChequers
Dear MacKenzie,

One other thing re those AFD logs, AFD is only one of our deletion
processes, and possibly not the most important one for your purposes.

AFD is a seven-day process for articles whose deletion is expected to be
contentious. We have other processes that are supposed to be for less
contentious subjects; I suspect that many, possibly most, of the articles on
academics that get deleted go via the speedy deletion process, in
particular under code A7, "no credible assertion of importance or
significance". I have seen people argue that merely being a university
professor is not a credible assertion of importance or significance, and,
less contentiously (in Wikipedia terms), if the only claim of significance
in an article is that someone is an assistant professor, then I would not
be surprised if the article was deleted per A7.

One problem with articles being deleted per A7 is that unless you can look
at the article you usually don't know whether the article was about the
academic you were interested in or the adolescent "pro skateboarder" of the
same name. Prod and BLPProd will be similar, though I doubt many articles
about academics will have been deleted per BLPProd as that is for
completely unsourced biographies of living people.

Regards

WereSpielChequers

On Sun, 12 Jul 2020 at 01:32, Mackenzie Lemieux 
wrote:

> Thank you everyone for your comments and suggestions on the topic of
> gaining access to deleted articles! I will reach out to
> le...@wikimedia.org
> to inquire about researcher status.
>
> I have one more question, do any of you know if there is a way to look at
> the entire history of this page?
>
> https://en.m.wikipedia.org/wiki/Wikipedia:WikiProject_Deletion_sorting/Academics_and_educators
> If
> I am unable to gain access to deleted articles, I figured I could try to
> parse this page for data on factors leading to article flagging for
> deletion, but I would need to go back in time longer than one month as the
> page currently only goes back to June 18th.
>
> let me know!
>
> Warmly,
> Mackenzie Lemieux
>
> On Sat, Jul 11, 2020 at 5:00 AM <
> wiki-research-l-requ...@lists.wikimedia.org>
> wrote:
>
> > Send Wiki-research-l mailing list submissions to
> > wiki-research-l@lists.wikimedia.org
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > or, via email, send a message with subject or body 'help' to
> > wiki-research-l-requ...@lists.wikimedia.org
> >
> > You can reach the person managing the list at
> > wiki-research-l-ow...@lists.wikimedia.org
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of Wiki-research-l digest..."
> >
> >
> > Today's Topics:
> >
> >1. Re: Requesting access to deleted pages forresearch purposes
> >   (WereSpielChequers)
> >2. CICM 2020,    July 26-31: Call for Online Participation
> >   (Serge Autexier)
> >
> >
> > --
> >
> > Message: 1
> > Date: Fri, 10 Jul 2020 16:05:07 +0100
> > From: WereSpielChequers 
> > To: Research into Wikimedia content and communities
> > 
> > Subject: Re: [Wiki-research-l] Requesting access to deleted pages for
> > research purposes
> > Message-ID:
> > <
> > caaanwp26z5c_yjsragonnd9ylh+l+ewyicfyuin-dd2_t37...@mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Hi Mackenie,
> >
> > You may be correct in either or both of your hypotheses, but you might
> also
> > want to check out two other related ones.
> >
> > 1 Some academic institutions may have an element of misogyny in their HR
> > policies, leading to such situations as an academic becoming notable for
> > their work to the point where they merit a Wikipedia article, before they
> > become a full professor.
> >
> > 2 In Wikipedia's drive to address the gender skew in our content, we may
> > have some editors creating articles on women who don't yet meet our
> > notability criteria. Such articles are of course highly likely to be
> > deleted.
> >
> > There is another way to approach this, check primary and secondary
> sources
> > to see how Wikipedia compares against them. For example, we have articles
> > on every female Fellow of the Royal Society, and we achieved that almost
> a
> > decade ago. I don't know if we yet have articles on all the blokes..  I
> > expect we have articles on every Nobel Prize Win

Re: [Wiki-research-l] Wiki-research-l Digest, Vol 179, Issue 7

2020-07-12 Thread WereSpielChequers
Hi Mackenzie,

I've looked at the logs and nothing has been deleted from the history of
that page.

But rather than wade through the history, you might want to go to the
archives. They are linked from the top of the page, and they go back to
2007, so they cover two thirds of the life of Wikipedia, and the era in
which four fifths of all the edits Wikipedia has ever had took place.

https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Deletion_sorting/Academics_and_educators/archive
and
https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Deletion_sorting/Academics_and_educators/archive_2
Regards

Jonathan

On Sun, 12 Jul 2020 at 01:32, Mackenzie Lemieux 
wrote:

> Thank you everyone for your comments and suggestions on the topic of
> gaining access to deleted articles! I will reach out to
> le...@wikimedia.org
> to inquire about researcher status.
>
> I have one more question, do any of you know if there is a way to look at
> the entire history of this page?
>
> https://en.m.wikipedia.org/wiki/Wikipedia:WikiProject_Deletion_sorting/Academics_and_educators
> If
> I am unable to gain access to deleted articles, I figured I could try to
> parse this page for data on factors leading to article flagging for
> deletion, but I would need to go back in time longer than one month as the
> page currently only goes back to June 18th.
>
> let me know!
>
> Warmly,
> Mackenzie Lemieux
>
Re: [Wiki-research-l] Requesting access to deleted pages for research purposes

2020-07-10 Thread WereSpielChequers
Hi Mackenzie,

You may be correct in either or both of your hypotheses, but you might also
want to check out two other related ones.

1 Some academic institutions may have an element of misogyny in their HR
policies, leading to such situations as an academic becoming notable for
their work to the point where they merit a Wikipedia article, before they
become a full professor.

2 In Wikipedia's drive to address the gender skew in our content, we may
have some editors creating articles on women who don't yet meet our
notability criteria. Such articles are of course highly likely to be
deleted.

There is another way to approach this: check primary and secondary sources
to see how Wikipedia compares against them. For example, we have articles
on every female Fellow of the Royal Society, and we achieved that almost a
decade ago. I don't know if we yet have articles on all the blokes. I
expect we have articles on every Nobel Prize winner by now, but there will
be less well known awards and lists of people in STEM.

One problem in looking at deletion discussions is that they don't always
say what the person is known for, and so you can have confusion between
multiple people of the same name. I was once asked to restore a deleted
article so that someone could look at what was there and see if they could
make a clearer case re the notability of that eminent diplomat. After
looking at the deleted article, I told them not to start from the deleted
bit, and if it was the same person, to emphasise their subsequent career as
a diplomat, rather than their adolescent career as a "pro skateboarder".
So in order to find the deleted articles on female scientists, you either
need a list of deleted female scientists, or to check a lot of other
articles to find which are about scientists.

Hope that's useful

WSC

On Fri, 10 Jul 2020 at 00:17, Stuart A. Yeates  wrote:

> I recently completed a project writing en.wiki articles for all female
> and indigenous professors in my country, .nz.
>
> I now write pronounless biographies, because there were a significant
> number whose gender wasn't apparent from their public persona. My
> guess is that women and LGBTIA+ minorities are incentivised to remove
> markers of their gender from their online presence to keep a lower
> profile to avoid the trolls and bigots.
>
> There were also a number who clearly appeared to be a certain
> ethnicity based on their staff photo, but where there were no reliable
> sources as to that ethnicity.
>
> I also had a one person ask for their article to be deleted. [If this
> is of interest I can send details to you directly, but I will not post
> their details to a public forum and ask you refrain from this also.]
>
> I look forward to reading your experimental design taking these
> factors into account.
>
> cheers
> stuart
> --
> ...let us be heard from red core to black sky
>
> On Fri, 10 Jul 2020 at 06:43, Mackenzie Lemieux
>  wrote:
> >
> > Dear Wiki Community,
> >
> > My name is Mackenzie Lemieux and I am a neuroscience researcher at the
> Salk
> > Institute for Biological Studies and I am interested in exploring biases
> on
> > Wikipedia.
> >
> > My research hypothesis is that gender or ethnicity mediate the rate of
> > flagging and deletion of pages for women in STEM.  I hope to
> > retrospectively analyze Wikipedia's deletion history, harvest the
> > biographical articles about scientists that have been created over the
> past
> > n years and then confirm the gender and ethnicity of a large sample.
> >
> > It appears that we can identify deleted pages with Wikipedia's deletion
> > log, but to actually see the page that was deleted we need to be members
> > of one of these Wikipedia user groups: Administrators, Oversighters,
> > Researchers, Checkusers.
> >
> > Does anyone have advice on how to obtain researcher status or is there
> > anyone willing to collaborate who has access to the data we need?
> >
> > Warmly,
> > Mackenzie Lemieux
> >
> >
> > --
> > Mackenzie Lemieux
> > mackenzie.lemi...@gmail.com
> > cell: 416-806-0041
> > 220 Gilmour Avenue
> > Toronto, Ontario
> > M6P 3B4


Re: [Wiki-research-l] Aspergers, ADHD and editors

2020-04-02 Thread WereSpielChequers
I can fully understand that Wikipedians might be reluctant to reveal this
sort of information, especially if they edit under their own name. But some
do, and there are currently 624 Wikipedians who have put themselves in the
category
https://en.wikipedia.org/w/index.php?title=Category:Wikipedians_with_Asperger_syndrome
OK that includes quite a few duplicates, but so do the other user
categories, most of which seem a lot smaller. By contrast there doesn't
seem to be an ADHD category.
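Membership counts like the 624 quoted above can be re-derived from the
MediaWiki API's categorymembers list. This sketch only builds the query URL;
actually fetching it and paging through large categories with cmcontinue is
left out, and the cmlimit of 500 assumes the standard per-request maximum for
unprivileged clients:

```python
from urllib.parse import urlencode

def categorymembers_url(category, limit=500):
    """Build a MediaWiki API query listing members of a category
    (the same data behind the user-category page linked above)."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)

print(categorymembers_url("Wikipedians_with_Asperger_syndrome"))
```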

Equally, I can remember several discussions where people have commented on
the need for things to be done a certain way to accommodate people on that
spectrum, while it is rare for similar comments to come up about other
issues - colour blindness and other vision issues occasionally, but after
the needs of people who use "text to speech", I wouldn't be surprised if
Aspergers was the second most commonly mentioned issue in the community.

Regards

Jonathan

On Thu, 2 Apr 2020 at 16:49, RhinosF1 -  wrote:

> Evening all,
>
> I hope everyone is doing well given the crazy world we’re living in.
>
> I was having a conversation with a few users on Discord today and we were
> wondering whether Wikimedians (or users of other similar sites would be
> fine) disproportionately fall into the category of having Aspergers, ADHD
> and other similar conditions.
>
> It would be even better if anyone knew what sort of areas these users were
> more likely to work in.
>
> Following a chat with Isaac in #wikimedia-research, I understand there
> isn’t much support for this kind of research as users may not want to
> reveal this information and there is no clear reason for collecting the
> information but if anyone knows of past research or has any information,
> that would be helpful.
>
> Stay Safe,
> RhinosF1
> --
> Thanks,
> Samuel


Re: [Wiki-research-l] New dataset of articles tagged by WikiProjects

2020-01-16 Thread WereSpielChequers
Hi Kerry,

I suspect it is likely to be different if you differentiate between
articles tagged by people involved in a particular WikiProject, articles
tagged into a WikiProject by newpage patrollers and other taggers, and
articles tagged into all their relevant wikiprojects.

The backlog of articles not allocated to any Wikiproject at all is usually
pretty small. But that doesn't mean that all articles are fully tagged for
Wikiprojects.

A few years ago I was involved in a major cleanup operation for our then
backlog of unsourced biographies of Living People. One of our tactics was
to create reports for each relevant wikiproject showing the unsourced
biographies that were relevant to them and encouraging them to help delete
or improve the articles in that report. At one point in the cleanup we
realised that only about half of the articles we were looking at were
tagged to any Wikiproject other than Biography. So a group of volunteers
went through (I think it was twenty thousand) unsourced biographies that were
only tagged to WikiProject Biography and tagged them to the relevant
wikiprojects; that usually meant at least one geographic project and one
occupational one. Some of the Wikiprojects were assiduous in improving all
the articles we found for them, Heavy Metal I remember being very
efficient. Others just trawled through and nominated the unnotables and
hoaxes for deletion, WikiProject Croatia was one of those.

With a large proportion of WikiProjects dormant at any one time, I rather
suspect that most of the tagging for WikiProjects is in effect a subset of
the categorisation process rather than a sign that someone interested in
the topic has tagged the article for their WikiProject.


Regards

Jonathan


On Wed, 15 Jan 2020 at 21:13, Kerry Raymond  wrote:

> Out of idle curiosity ...
>
> Are there significant numbers of articles NOT tagged by any WikiProject?
> In my experience on-wiki, any article (apart from ones recently created)
> are tagged by one or more WikiProjects.
>
> I guess the converse question is what articles are the most tagged by
> WikiProjects? I am often surprised at how many WikiProjects jump in to tag
> some article I have created (I am more likely to notice the tagging of
> articles I create because they automatically go on my watchlist).
>
> Kerry
>
> -Original Message-
> From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org]
> On Behalf Of Isaac Johnson
> Sent: Thursday, 16 January 2020 6:54 AM
> To: Research into Wikimedia content and communities <
> wiki-research-l@lists.wikimedia.org>
> Subject: [Wiki-research-l] New dataset of articles tagged by WikiProjects
>
> Hey Research Community,
> TL;DR New dataset:
>
> https://figshare.com/articles/Wikipedia_Articles_and_Associated_WikiProject_Templates/10248344
>
> More details:
>
> I wanted to notify everyone that we have published a dataset of the
> articles on English Wikipedia that have been tagged by WikiProjects [1]
> through templates on their associated talk pages. We are not planning to
> make this an ongoing release, but I have provided the script that I used to
> generate it in the Figshare item so that others might update / adjust to
> meet their needs.
>
> As anyone who has done research on WikiProjects knows, it can be
> complicated to determine what articles fit under a particular WikiProject's
> purview. The motivation for generating this dataset was to support our work
> in developing topic models for Wikipedia (see [2] for an overview), but we
> imagine that there are many other ways in which this dataset might be
> useful:
>
> * Previous work has examined how active WikiProjects are based on edits to
> their pages in the Wikipedia namespace. This dataset makes it much easier
> to identify which Wikiprojects are managing the most valuable articles on
> Wikipedia (in terms of quality or pageviews).
>
> * Many topic-level analyses of Wikipedia rely on the category network.
> Categories can be very messy and difficult to work with, but WikiProjects
> represent an alternative that often is simpler and still quite rich. For
> instance, this could be used for temporal analyses of article quality,
> demand, or distribution by topic.
>
> * While WikiProjects are English-only and therefore limited in their
> utility to other languages, we also provide the Wikidata ID and sitelinks
> -- i.e. titles for corresponding articles in other languages -- to allow
> for multilingual analyses. This could be used to compare gaps in coverage
> -- e.g., akin to past work that has used categories [3].
>
> The main challenge, besides processing time, is how to 1) effectively
> extract the WikiProject templates from talk pages, and, 2) consistently
> link them to a canonical WikiProject name and topic. For example, the
> canonical template for WikiProject Medicine is
> https://en.wikipedia.org/wiki/Template:WikiProject_Medicine but another
> one used is
> 

Re: [Wiki-research-l] gender balance of Wikipedia citations

2019-08-25 Thread WereSpielChequers
Hi Greg,

One of the major step changes in the early growth of the English Wikipedia
was when a bot called RamBot created stub articles on US places. I think
they were cited to the census. Others have created articles on rivers in
countries and various other topics by similar programmatic means. Nowadays
such article creation is unlikely to get consensus on the English
Wikipedia, but there are some languages which are very open to such
creations and have them by the million.

I'm not sure if the fastest updating of existing articles is automated or
just semiautomated. But looking at the bot requests page, it certainly
looks like some people are running such maintenance bots "updating GDP by
country" is a current bot request.
https://en.wikipedia.org/wiki/Wikipedia:Bot_requests.

I'm not sure how "the ease of a source for purposes of converting into a
table and generating a separate article for each row" relates to gender.
But I suspect "number of times cited in wikipedia" deserves less kudos than
"number of times cited in academia".

WSC

On Sun, 25 Aug 2019 at 05:22, Greg  wrote:

> Thanks again, Kerry. I am hoping that someone with access to more resources
> (knowledge, support, etc) than I have will look into this.
>
> A few more thoughts/questions:
>
> 1. The link to the citation dataset from the Medium article ("What are the
> ten most cited sources on Wikipedia? Let’s ask the data.") is broken.
> 2. As far as I can tell, every named author in the top ten most cited
> sources on Wikipedia is male. One piece is by a working group
> 3. This line from the Medium piece struck me: "Many of these publications
> have been cited by Wikipedians across large series of articles using
> powerful bots and automated tools."
>
> Are citations being added by bots? I'm not sure that I understand that line
> correctly.
>
> Greg
>
>
>
>
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Question on article creation policy

2019-08-14 Thread WereSpielChequers
Hi Haifeng,

IP editors were able to create new articles until December 2005.
https://en.wikipedia.org/wiki/History_of_Wikipedia

There was a sea change on the pedia in 2007, and the number of active
editors reached its highest peak then - even the 2015/2016 rally didn't get
back to that level. I don't think that we have consensus as to what changed
in 2007 or why. I suspect it was multifaceted, but one factor will turn
out to be the switch from the previous SOFIXIT culture to the current
SOTEMPLATEITFORHYPOTHETICALOTHERSTOFIX culture.

Jonathan



On Sun, 11 Aug 2019 at 16:26, Haifeng Zhang  wrote:

> Thanks a lot for providing all these information!
>
> Was there a major change in article creation policy in early 2007?
>
> Can anonymous users create new pages before then?
>
>
> Best,
>
> Haifeng Zhang
> 
> From: Wiki-research-l  on
> behalf of Su-Laine Brodsky 
> Sent: Saturday, August 10, 2019 2:44:24 AM
> To: Research into Wikimedia content and communities
> Subject: Re: [Wiki-research-l] Question on article creation policy
>
> Hi Haifeng,
>
> Re :  A more general question is: where to find information about policy
> changes, e.g., article creation, in Wikipedia?
>
> The Wikipedia Signpost usually covers major policy changes like this one (
> https://en.m.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost)
>
> As Kerry pointed out though, more subtle policy changes happen without
> much publicity. If changes are contentious enough, they might appear in an
> RfC or The Village Pump, so those are some other areas to look.
>
> Cheers,
> Su-Laine
>
> Sent from my iPhone
>
> > On Aug 9, 2019, at 11:48 AM, Haifeng Zhang 
> wrote:
> >
> > Dear folks,
> >
> > I'm checking the Article Creation page (
> https://en.wikipedia.org/wiki/Wikipedia:Article_creation), and it says:
> >
> >
> > The ability to create articles directly in mainspace is restricted<
> https://en.wikipedia.org/wiki/Wikipedia:ACPERM> to autoconfirmed users,
> though non-confirmed users and non-registered users can submit a proposed
> article through the Articles for Creation<
> https://en.wikipedia.org/wiki/Wikipedia:Articles_for_creation> process,
> where it will be reviewed and considered for publication.
> >
> >
> > Anyone knows when the restriction (e.g., registered and auto-confirmed)
> become effective? I tracked the past revisions of the page but found no
> clue. A more general question is: where to find information about policy
> changes, e.g., article creation, in Wikipedia?
> >
> >
> > Thanks,
> >
> > Haifeng
> >
> > ___
> > Wiki-research-l mailing list
> > Wiki-research-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Difference between vandal fighting with vs. without tools

2019-06-25 Thread WereSpielChequers
Most of the vandalism I deal with nowadays I pick up when I am typo fixing.
I rarely check the same typo as frequently as once a fortnight, so a lot of
the vandalism I find is from over a week ago. That means it has got past
several layers of defences, including the watchlisters (watchlists
default to 7 days).

But when in the past I have been active at recent changes I have homed in
on edits by editors with redlinked talkpages. If they made a good edit I'd
welcome them, if it was vandalism I'd warn them. Cluebot and users of
Huggle and Stiki are great at watching for edits by accounts and people who
have previously been warned, and if you are editing manually you are
wasting time trying to compete with them. But someone with a redlinked
talkpage is either a goodfaith editor, or a sufficiently sneaky vandal not
to be picked up by cluebot and the like.
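The triage heuristic above can be sketched as a simple filter over a feed of recent changes. This is a hypothetical illustration, not a real tool: the dictionary fields and the set of existing talk pages are assumptions standing in for whatever data a patrol script would actually fetch.

```python
def needs_human_review(recent_changes, existing_talk_pages):
    """Return edits whose authors have a redlinked (nonexistent) talk page.

    Cluebot and Huggle/STiki users already watch accounts that have been
    warned before, so a manual patroller gets the most value from the
    remainder: accounts nobody has ever left a message for, who are
    either good-faith newcomers to welcome or sneaky vandals to check.
    """
    flagged = []
    for change in recent_changes:
        talk_title = "User talk:" + change["user"]
        if talk_title not in existing_talk_pages:
            flagged.append(change)
    return flagged

changes = [
    {"user": "NewEditor1", "title": "Jazz"},
    {"user": "WarnedVandal", "title": "Jazz"},
]
# An existing talk page usually means the account has already been
# welcomed or warned, so other tools are likely covering it.
talk_pages = {"User talk:WarnedVandal"}
print([c["user"] for c in needs_human_review(changes, talk_pages)])
# prints ['NewEditor1']
```

In practice the talk-page set would come from the wiki itself (e.g. via the MediaWiki API), but the filtering logic is the same.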

On Mon, 24 Jun 2019 at 17:19, Haifeng Zhang  wrote:

> Hi all,
>
> This might be a known fact already.
>
> Does it take less time (on average) for an editor to identify a
> vandalistic edit when using counter-vandalism tools, e.g., Huggle or STiki?
> If so, what features of these tools support such decision?
>
>
> Thanks for your time,
>
> Haifeng Zhang
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Research on Edit Size

2019-06-22 Thread WereSpielChequers
Dear Haifeng Zhang,

If I were you, looking at this, I'd watch out for templates. Templates
particularly substituted ones involve a lot of bytes that someone hasn't
typed. I recently did an edit that involved me typing {{subst:Infobox
academic}}; you might be surprised how many bytes that generated. And how
many more key depressions that edit involved compared to my typical edit.
Similarly reversion can involve adding a lot of bytes, but on further
inspection you might simply be reverting a vandal who removed four
paragraphs of text that others had contributed.

You might also want to look at an editors edit rate per hour, and time
since their previous edit. If their previous edit was half an hour earlier
they might have been making a cup of tea, cutting the grass or taking a
phone call, or they might have spent half an hour on that edit. But if they
have made forty edits in that previous half hour then you are pretty safe
to assume that those edits on average represent less than a minute of work.
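The inter-edit-gap heuristic described here can be sketched as follows. A minimal sketch under stated assumptions: the 30-minute cap for "probably took a break" is an illustrative choice, not a community standard, and real timestamps would come from revision history.

```python
from datetime import datetime, timedelta

def estimate_effort(timestamps, cap=timedelta(minutes=30)):
    """Estimate average time spent per edit from gaps between edits.

    A gap longer than `cap` is treated as a break (a cup of tea,
    cutting the grass, a phone call) and capped, so one long pause
    doesn't inflate the average. The cap is an assumed parameter.
    """
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gaps.append(min(cur - prev, cap))
    if not gaps:
        return timedelta(0)
    return sum(gaps, timedelta(0)) / len(gaps)

# Forty edits in half an hour: well under a minute of work each, on average.
ts = [datetime(2019, 6, 7, 12, 0) + i * timedelta(seconds=45) for i in range(40)]
print(estimate_effort(ts))  # prints 0:00:45
```

The same function illustrates the ambiguity in the text: a single pair of edits half an hour apart yields an "effort" of 30 minutes whether the editor was working the whole time or making tea, which is exactly why the heuristic is only safe for rapid runs of edits.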

As well as what Kerry said, there are two things you might want to take
into consideration. Firstly those of us with experience of breaking news
stories quickly learn the hard way to save little and often, especially on
a topical subject. Take for example the article on Sarah Palin in the hours
after she was announced as John McCain's running mate. My memory was of
multiple concurrent edit wars and a tidal wave of vandalism, I went back
later and measured it as peaking at 25 edits per minute, I don't think we
even log the edits lost to edit conflicts, but in practice anyone clicking
the edit button at the top was going to get an edit conflict - your only
chance of getting an edit to save would have been to edit by section.

Secondly, over time editors pick up tools, some of which make a big
difference to edit rates. Edit summaries are a good indicator of this;
watch for words such as Twinkle, Hotcat, Huggle and AWB. I haven't used
Cat-a-lot on Wikipedia, but it is the reason why my edit count is higher on
Wikimedia commons, despite my spending rather more time on Wikipedia.

Regards

Jonathan



On Fri, 7 Jun 2019 at 22:44, Haifeng Zhang  wrote:

> Dear folks,
>
> Are there studies that have examined what might affect edit size (e.g., #
> of words add/delete/modify in each revision). I am especially interested in
> the impact of editor's tenure/experience.
>
> Thanks,
> Haifeng Zhang
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] User type context sensitivity to introduction sections.

2019-02-12 Thread WereSpielChequers
Dear Stuart,

The problem with notifying the article creators and templating the articles
is that the people who wrote that content are not necessarily the ones who
can rewrite it more clearly. And templating rarely solves problems, it
often just adds more clutter to a confused article.

AutoInforming the editor probably works for people who have linked to
disambiguation pages. But otherwise as Ziko has pointed out the solution is
better writing, and you don't get that by templating. You do get that
through correcting and fixing things, Wikipedians notice when people
improve our contributions, and many of us learn from that. I certainly
have. Having a hidden category of articles with overly high reading ages
would be a good move, and could attract the sort of Wikipedians who can fix
that issue.

Some time around 2007 Wikipedia shifted from a soFixIt culture to the
current less supportive SoTemplateItForHypotheticalOthers to fix culture.
The community then went into decline, and despite the 2015 rally, in some
ways we have a smaller more toxic community now than in 2007. One theory is
that the three phenomena, community size, toxicity and templating are quite
closely related.

Jonathan


On Sun, 10 Feb 2019 at 02:02, Stuart A. Yeates  wrote:

> I believe that the English language term you are looking for is
> https://en.wikipedia.org/wiki/Plain_English and the problem is that
> en.wiki policies already require plain english. The core of the issue
> is that writing in plain english is hard and currently there are few
> tools to support editors produce it.
>
> A decent reading level test applied by section and calculated using a
> javascript tool that fitted into the standard wiki framework for tools
> would be a very useful addition. The tool could annotate the article
> and for new articles notify the article creator.  Of course, we'd need
> supporting materials to aid editors learn plain english and so forth,
> but we have to start somewhere.
>
> cheers
> stuart
>
> --
> ...let us be heard from red core to black sky
>
> On Sun, 10 Feb 2019 at 11:22, Ziko van Dijk  wrote:
> >
> > Allow me to propose something different: Wikipedia needs better writing,
> > not technical solutions. And for different target groups, we need
> different
> > encyclopedias:
> > * for children
> > * for people with disabilities, such as
> > https://en.wikipedia.org/wiki/Leichte_Sprache
> > * for scholars, e.g. "Wikipedia scholar".
> > A different wiki for every target group can be arranged in the best
> > possible way for the target group.
> >
> > Kind regards
> > Ziko
> >
> >
> >
> >
> > Am Sa., 9. Feb. 2019 um 21:55 Uhr schrieb Aaron Gray <
> > aaronngray.li...@gmail.com>:
> >
> > > I am thinking maybe we could use subdomains for layperson, and for
> schools,
> > > and maybe universities to have specialized [approved] content also ?
> Just
> > > an idea given this possible mechanism.
> > >
> > > On Sat, 9 Feb 2019 at 20:15, Aaron Gray 
> > > wrote:
> > >
> > > > Thank you please keep suggestions and pragmatics coming in !
> > > >
> > > > I looked at this problem some time ago and the extra programming for
> what
> > > > I am proposing is quite minimal utilizing existing MediaWiki
> libraries
> > > and
> > > > adding extra code to support the tag structure with defaulting to
> make it
> > > > seamless to existing articles.
> > > >
> > > > I really think this would increase the usability and audience of
> > > > Wikipedia and also might possibly allow us to integrate content from
> > > other
> > > > Wikipedia projects.
> > > >
> > > > Regards,
> > > >
> > > > Aaron
> > > >
> > > >
> > > > On Sat, 9 Feb 2019 at 07:57, Amir E. Aharoni <
> > > amir.ahar...@mail.huji.ac.il>
> > > > wrote:
> > > >
> > > >> The suggestions that bring up the Simple English Wikipedia miss the
> fact
> > > >> that it only covers the English language, which most people don't
> know,
> > > >> and
> > > >> doesn't do almost anything for the many other languages of the
> world.
> > > (I'm
> > > >> saying "almost anything" because I know that there are people who
> prefer
> > > >> to
> > > >> translate articles from the Simple English Wikipedia, and this
> > > indirectly
> > > >> benefits other languages.)
> > > >>
> > > >> One thing about how Wikipedia works that practically no-one ever
> > > >> challenges
> > > >> is that every page title is associated with a page, and the page is
> > > always
> > > >> a single big blob of sections, section headings, templates and magic
> > > >> words.
> > > >>
> > > >> What if it was not a single blob?
> > > >>
> > > >> What if all the magic words, such as NOTOC, DISPLAYTITLE, and
> > > DEFAULTSORT
> > > >> moved to a separate metadata storage?
> > > >>
> > > >> More closely to this thread's topic, what if at least some sections
> that
> > > >> all or most pages have were stored separately, so that it would be
> > > >> possible
> > > >> to parse and render them semantically? The References section, for
> > > >> example,
> > > 

Re: [Wiki-research-l] ¿Model to automatically classify if one user is bot or not?

2019-01-25 Thread WereSpielChequers
The most recent of  IngredientSortBot's 764 edits was in 2007, so if that
wiki has a bot flagging system the bot flag would have likely been removed
in the last decade. But if 764 edits makes them significant on that wiki I
doubt that wiki ever introduced bot flagging.

You can make the assumption that editors with names ending in Bot are bots,
and on English language wikis you are pretty safe doing so. You would only
misclassify a handful of accounts: three of the 5,000 most active accounts on
the English Wikipedia are longstanding human accounts whose names end in bot,
created before the rule reserving such usernames for bots.

If you want to filter out edits that *do not represent human collaboration
or community actual status *then you might also want to filter out, or
better give a low weighting to edits flagged as "minor". That feature is
heavily used on wikipedia.
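The two filters suggested above - treating names ending in "bot" as bots, with an exception list for the known human accounts, and down-weighting minor edits - can be sketched together. The weights and the exception set are hypothetical assumptions for illustration, not values from any wiki's configuration.

```python
def looks_like_bot(username, known_human_exceptions=frozenset()):
    """Rule-of-thumb bot filter: usernames ending in 'bot' count as bots.

    This misfires on a few longstanding human accounts created before
    'bot' suffixes were reserved for bots, so callers can supply an
    exception set. The exception set here is hypothetical.
    """
    return username.lower().endswith("bot") and username not in known_human_exceptions

def edit_weight(edit, known_human_exceptions=frozenset()):
    """Down-weight edits that say little about human collaboration.

    Weights are illustrative assumptions: bot edits are dropped
    entirely and minor-flagged edits get a reduced weight.
    """
    if looks_like_bot(edit["user"], known_human_exceptions):
        return 0.0
    if edit.get("minor"):
        return 0.25
    return 1.0

edits = [
    {"user": "IngredientSortBot", "minor": False},
    {"user": "Abbot", "minor": False},  # a human name that happens to end in 'bot'
    {"user": "Kerry", "minor": True},
]
print([edit_weight(e, {"Abbot"}) for e in edits])  # prints [0.0, 1.0, 0.25]
```

This catches unflagged accounts like IngredientSortBot by name alone, which is exactly the case the thread is about; behavioural detection (edit rate, edit similarity) is a separate and much harder problem, as discussed below in the quoted reply.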

Jonathan



On Fri, 25 Jan 2019 at 21:08, ABEL SERRANO JUSTE  wrote:

> I want to remove bot users from my research since they inject a lot of
> noise on the data and do not represent human collaboration or community
> actual status. The aim of the model would be to detect actual (or
> mostly-behaving-as) bot users but not flagged as *'bot'* in the mediawiki
> *bot* group; just to get rid them off from my analysis in this way, and it
> would not meant to be used to label users within the mediawiki communities.
>
> I came up with this question since I was studying the wiki:
> https://cocktails.wikia.com and I found that, one of the most prolific
> users is "IngredientSortBot" which, besides its name, has a history of
> edits very characteristic for a bot user:
> https://cocktails.wikia.com/wiki/Special:Contributions/IngredientSortBot;
> but it's not included in any bot group and, because of that, it was
> included in my analysis and thus, biasing it.
>
> El sáb., 19 ene. 2019 a las 20:42, WereSpielChequers (<
> werespielchequ...@gmail.com>) escribió:
>
> > Aside from the sensitivities of this, and yes if there wasn't any doubt
> > calling an editor a bot is not something one should do lightly, it isn't
> an
> > easy thing to either define or identify. Doing bot edits from a non bot
> > account is a big deal on Wikipedia, I have seen an admin desysopped and
> > then blocked for this. Please be aware that labelling goodfaith non bot
> > editors as bots is unethical and liable to cause another clash between
> the
> > community and researchers..
> >
> > Edits per minute might at first glance look like a safe way to go, but
> then
> > you realise that some people will spend a long time manually building up
> to
> > a situation where they click a button and that completes dozens of edits
> > almost simultaneously.
> >
> > Type of edit and similarity of a series of edits might look like a good
> way
> > to go, but what you will have difficulty identifying is that the person
> who
> > seems to be making a series of edits without individual consideration may
> > be working their way through a list of possible edits and clicking save
> or
> > skip on each of them as a manual decision. Judging the results from the
> > edits saved without knowing what led up to saving those edits won't tell
> > you if an edit was a bot edit.
> >
> > What you can do is look for dormant accounts that are no longer flagged
> as
> > bots. On the English language Wikipedia we have a list of them at
> >
> >
> https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits/Unflagged_bots
> > other language versions may have similar lists and are likely to have the
> > same process of removing bot flags from bot accounts that retire.
> >
> > Regards
> >
> > Jonathan
> >
> > On Sat, 19 Jan 2019 at 10:24, ABEL SERRANO JUSTE 
> wrote:
> >
> > > Hello fellow wiki investigators!
> > >
> > > I have observed that, very often in wikis, users not in the bot groups
> > are
> > > actually behaving like bots. Since the mediawiki api doesn't restrict
> > > normal users to automatize tasks through its API, you might have a
> > "normal"
> > > user, actually doing bot things. I would like to identify those and
> > > consider them as bots.
> > >
> > > Is anyone aware if there's any implemented model already to classify
> > > whether an user is a bot or not?
> > >
> > > Thanks and nice weekend!
> > >
> > > --
> > > Saludos,
> > > Abel.
> > > ___
> > > Wiki-research-l mailing list
> > > W

Re: [Wiki-research-l] ¿Model to automatically classify if one user is bot or not?

2019-01-19 Thread WereSpielChequers
Aside from the sensitivities of this, and yes if there wasn't any doubt
calling an editor a bot is not something one should do lightly, it isn't an
easy thing to either define or identify. Doing bot edits from a non bot
account is a big deal on Wikipedia, I have seen an admin desysopped and
then blocked for this. Please be aware that labelling goodfaith non bot
editors as bots is unethical and liable to cause another clash between the
community and researchers..

Edits per minute might at first glance look like a safe way to go, but then
you realise that some people will spend a long time manually building up to
a situation where they click a button and that completes dozens of edits
almost simultaneously.

Type of edit and similarity of a series of edits might look like a good way
to go, but what you will have difficulty identifying is that the person who
seems to be making a series of edits without individual consideration may
be working their way through a list of possible edits and clicking save or
skip on each of them as a manual decision. Judging the results from the
edits saved without knowing what led up to saving those edits won't tell
you if an edit was a bot edit.

What you can do is look for dormant accounts that are no longer flagged as
bots. On the English language Wikipedia we have a list of them at
https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits/Unflagged_bots
other language versions may have similar lists and are likely to have the
same process of removing bot flags from bot accounts that retire.

Regards

Jonathan

On Sat, 19 Jan 2019 at 10:24, ABEL SERRANO JUSTE  wrote:

> Hello fellow wiki investigators!
>
> I have observed that, very often in wikis, users not in the bot groups are
> actually behaving like bots. Since the mediawiki api doesn't restrict
> normal users to automatize tasks through its API, you might have a "normal"
> user, actually doing bot things. I would like to identify those and
> consider them as bots.
>
> Is anyone aware if there's any implemented model already to classify
> whether an user is a bot or not?
>
> Thanks and nice weekend!
>
> --
> Saludos,
> Abel.
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Readers of Wikipedia

2018-12-16 Thread WereSpielChequers
I've long seen categorisation on wikipedia as a way to bring articles to
the attention of those who follow certain categories. During the cleanup of
unreferenced biographies a few years ago this was a useful adjunct, with
several wikiprojects cleaning up all the articles legitimately categorised
for them. Some of the other Wikiprojects did at least go through and prod
or speedy the non-notables and hoaxes in their areas.

I'm pretty sure it still operates that way, categorisation of an
uncategorised article sometimes brings it to the attention of people who
know the topic.

And of course where the article doesn't contain the words in the category,
categorisation then improves search.

If like me you are a glass third full person categories make a useful
contribution.


On Sat, 15 Dec 2018 at 22:21, Kerry Raymond  wrote:

> Pointy? I think you may misunderstand  my use of the term “hostage”. I
> don’t use it with the meaning of abducting people for ransom, but in the
> sense of “subject to things beyond our control”.
>
>
>
> I agree entirely that Wikipedia should serve its readers and to that end
> “To do” lists are compiled with the intention of giving adequate coverage
> of topics perceived to be needed. Yet, many of those “To do” lists are full
> of redlinks years later because we have volunteer contributors whose
> interests / expertise may not align with the perceived needs. Whereas if
> Wikipedia employed its writers, it could direct them to write articles
> about required topics. It would be a wonderful thing if we could harness
> the volunteer energy that goes into largely unproductive activities like
> endless category reorganisation (given studies show readers rarely look
> below the reference section and don’t see or use the categories) into
> writing content that is actually needed. But alas it is not so.
>
>
>
> Kerry
>
>
>
>
>
> From: Ziko van Dijk [mailto:zvand...@gmail.com]
> Sent: Sunday, 16 December 2018 3:32 AM
> To: Kerry Raymond ; Research into Wikimedia
> content and communities 
> Subject: Re: [Wiki-research-l] Readers of Wikipedia
>
>
>
> Hello,
>
> Thanks for the link and the comments, Leila!
>
>
>
> Am Fr., 14. Dez. 2018 um 00:44 Uhr schrieb Kerry Raymond <
> kerry.raym...@gmail.com  >:
>
> hostage to the interests of their contributors (unless they actively
> remove the material). That is, you get the topics that the contributors are
> willing and able to write, no matter what the intention might be.
>
>
>
> That's a very pointy expression: "Hostage to the interests of their
> contributors"! In fact, WP should serve recipients, but the reality is
> often different. We alreday saw that Article Feedback Tool as a means to
> find out what recipients think. I would be happy with a new, less ambitious
> approach, where we don't expect recipients to contribute to the improvement
> of content but just want to know their opinion.
>
>
>
> By the way, the distincion of large and short articles I have found in
> Collison's "Encyclopedias through the ages" (or similar) from 1966. It is
> not very prominent in there, but I have elaborated on the idea in 2015,
> with a distinction of definition articles, exposition articles, longer
> articles and dissertations.
>
>
>
> An encyclopedia with "short" articles - or a meaningful combination of the
> four types above - would fit well to the original concept of hypertext not
> being an actual set of texts (or nodes), but being an individual's specific
> learning strategy or reading path.
>
>
>
> Federico: remember, most of the oldest German texts (Old High German) deal
> with Biblical topics... :-)
>
>
>
> Kind regards
>
> Ziko
>
>
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Help us understand ORES and make good tradeoffs

2018-12-15 Thread WereSpielChequers
I agree that we very rarely misidentify vandalism.

Where there is a dichotomy between quality and openness is in our handling
of new unsourced content.
There are no easy solutions here, but I would acknowledge both that a
significant proportion of new unsourced content is good faith, and also
that those who revert much of it on sight are often doing the right thing.

One difficulty for the casual observer is how do you quickly tell the
difference between someone who knows a subject that you don't and is
rejecting an unsourced and implausible edit, as opposed to someone who is
as ignorant of a subject as you and is rejecting an unsourced edit from
someone who actually knows their stuff and was trying to improve wikipedia?

Jonathan

On Thu, 13 Dec 2018 at 16:34, Pine W  wrote:

> Hi Bowen, after reading your project proposal I have a few questions and
> concerns.
>
> You mention a perceived tension between protecting newcomers and protecting
> the quality of content. I am wondering whether that is a false dichotomy.
> In my experience, test edits and blatant vandalism usually look different
> from mistakes from good faith editors.
>
> There is a feature that allows users to adjust ORES-supported edit scoring
> in our watchlists and Recent Changes:
>
> https://www.mediawiki.org/wiki/Edit_Review_Improvements/New_filters_for_edit_review
> .
> Have you tested this feature? How would your research be useful for that
> feature's future development?
>
> I think that ORES is supposed to aid human judgment, not to substitute for
> human judgment. How certain are you that "ORES applications will play a
> role in drawing a line between acceptable freestyle edits and editing
> policies in standard."? There may well be some human patrollers who adjust
> their definitions for vandalism based on ORES recommendations, but I think
> that you would want to know to what extent ORES has that effect.
>
> I would also like to mention that Wikipedia policies and guidelines, like
> offline human laws and customs, may change over time, may have varying
> interpretations, and may have varying degrees of adherence among the
> populace.
>
> Thanks for your interest in studying ORES. I am glad that you are
> collaborating with Aaron.
>
>
>
> On Thu, Dec 13, 2018, 7:08 AM Bowen Yu  wrote:
>
> > Hello,
> >
> > ORES has been out and served for the Wikipedia community for a while, for
> > the purpose such as counter-vandalism. Having seen the wide usage and
> > effectiveness of ORES in the community, we'd like to continue working on
> > ORES development. We plan to improve and redesign ORES algorithms by
> > incorporating feedback from all the stakeholders involved in the entire
> ORES
> > ecosystem, such as ORES application developers, ORES application
> operators,
> > etc. We want to understand their concerns and values, and come up with
> > effective algorithmic designs that can balance trade-offs and mitigate
> > potential conflicts of interest (such as edit quality control vs.
> > newcomer protection) to further improve ORES performance.
> >
> > We will work with Aaron Halfaker and his team to make improvements on
> ORES
> > quality control models, and identify its limitations. Here is the project
> > proposal on Meta-Wiki
> > <
> >
> https://meta.wikimedia.org/wiki/Research:Applying_Value-Sensitive_Algorithm_Design_to_ORES
> > >.
> > If you are interested or have any thoughts, please feel free to reach out
> > to me. Thanks!
> --
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )


Re: [Wiki-research-l] Editors: research on transitions, learning over time, leaving

2017-03-20 Thread WereSpielChequers
Dear Jan,

It's a fascinating topic and one that interests me as well.

But you have to be careful with your assumptions, our data is almost always
based on user accounts, but we'd like to think we are looking at people.
Some of whom will have different accounts over time. Some of the
involvement will switch between projects - apparently half the founding
Wikidata community were previously active in the movement. Some will spend
periods of their volunteer time off wiki - many very active volunteers put
time in as Arbcom members, OTRS volunteers or chapter trustees.


Volunteers are very, very different from staff or even subscribers. Barely 16
years into the project we simply don't have the data to work out long-term
patterns of retention and reactivation, but the signs so far are that
Wikipedia is beginning to look like other volunteer organisations that
people have a multi-decade relationship with.

A few years ago the WMF did a survey of former editors, partly to learn why
they'd left. One of the most common responses was "I haven't left yet".

WSC

On 20 March 2017 at 09:34, Jan Dittrich  wrote:

> Hello,
>
> I am looking for research on how editors transition through various levels
> of involvement in their time as editors. The questions I ask myself are:
>
> - How many people come each month?
> - How many editors leave?
>
> …those are not too difficult to answer but…
>
> - How many people become more involved over time? E.g. How many each month
> come to a level where they are interested in handling many pages on the
> watchlist, learn the less obvious aspects of wiki culture etc.
>
> In my work as a designer I am often involved in features for intermediate
> and/or very involved users and I’m wondering if there are any ballpark
> estimates of how many people learn these features each month.
>
> Jan
>
> --
> Jan Dittrich
> UX Design/ User Research
>
> Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> Phone: +49 (0)30 219 158 26-0
> http://wikimedia.de
>
> Imagine a world, in which every single human being can freely share in the
> sum of all knowledge. That‘s our commitment.
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
> der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
> Körperschaften I Berlin, Steuernummer 27/029/42207.


Re: [Wiki-research-l] Time Between edits - difference between RevisionID and {{NUMBEROFEDITS}}

2017-02-05 Thread WereSpielChequers
Thanks Nemo,

Those stats work on live edits in article space, hence the figures of about
3.5 million a month in the UK as opposed to the 5 million revision IDs.

The ten million interval data is calculated from the rawest of raw data,
the actual revision IDs. They started at 1 in January 2002, so it is
possible that NUMBEROFEDITS is partly different due to the first year's
edits, but as the next million took 17 months I'm not expecting that the
first year was a million, let alone 100 million.

If {{NUMBEROFEDITS}} is just corrupt data then that's unfortunate, but if
it were, for example, incremented for every attempted save, successful or
otherwise, then the difference would give us a count of edit conflicts, so
it would be good to find out what it actually measures.
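For anyone who wants to reproduce the comparison: the siteinfo statistics call below is a standard MediaWiki API endpoint that returns the same cached edit count that {{NUMBEROFEDITS}} renders, but the helper names and the sample figures in the sketch are illustrative assumptions, not measured values.

```python
# Sketch: compare the highest revision ID on a wiki with the edit count
# reported by the site statistics (the cached figure behind {{NUMBEROFEDITS}}
# and Special:Statistics). Helper names and sample numbers are illustrative.
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def fetch_json(params: dict) -> dict:
    """GET a MediaWiki API request and decode the JSON response."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    with urllib.request.urlopen(f"{API}?{query}") as resp:
        return json.load(resp)

def reported_edit_count() -> int:
    # Same cached figure that {{NUMBEROFEDITS}} renders; may lag reality.
    data = fetch_json({"action": "query", "meta": "siteinfo",
                       "siprop": "statistics"})
    return data["query"]["statistics"]["edits"]

def revision_gap(latest_revid: int, reported_edits: int) -> tuple[int, float]:
    """Absolute gap between the two counters, and the gap's share of revision IDs."""
    gap = abs(latest_revid - reported_edits)
    return gap, gap / latest_revid

if __name__ == "__main__":
    # Illustrative figures only; fetch a real count with reported_edit_count().
    gap, share = revision_gap(latest_revid=900_000_000, reported_edits=800_000_000)
    print(f"gap: {gap:,} revisions ({share:.1%} of revision IDs)")
```

If the gap turned out to track attempted-but-unsaved edits, as speculated above, plotting it over time would be one way to test that hypothesis.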


On 25 January 2017 at 15:37, Federico Leva (Nemo) <nemow...@gmail.com>
wrote:

> Statistics of total (content) edit rate are also available on WikiStats at
> https://stats.wikimedia.org/EN/TablesDatabaseEdits.htm etc.
>
> Edit and revert trends charts tend to be more useful:
> https://stats.wikimedia.org/EN/PlotsPngEditHistoryTop.htm
>
> WereSpielChequers, 25/01/2017 16:10:
>
>> One area that perhaps someone on this list can explain is the difference
>> between number of edits as measured by revisionID and as measured by
>> NUMBEROFEDITS
>>
>
> {{NUMBEROFEDITS}} and other magic words, just like Special:Statistics,
> should be assumed to be cached and not necessarily current or correct. They
> shouldn't be relied upon for any serious usage, except perhaps after a
> successful run of initSiteStats.php --update .
>
> That said, both {{NUMBEROFEDITS}} and the total number of edits in
> Special:Statistics are supposed to count the number of revisions, included
> deleted ones.
> <https://phabricator.wikimedia.org/diffusion/MW/browse/master/includes/SiteStats.php;b843994408cd0b4d9f2676ae87225258e0497913$135>
> <https://phabricator.wikimedia.org/diffusion/MW/browse/master/includes/parser/Parser.php;b843994408cd0b4d9f2676ae87225258e0497913$2775>
>
> Nemo
>


[Wiki-research-l] Time Between edits - difference between RevisionID and {{NUMBEROFEDITS}}

2017-01-25 Thread WereSpielChequers
One of our longest running sets of stats on Wikipedia is the time between
ten million edits - we now have stats for this over a fifteen year period.
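The calculation behind that table is simple to state: for each multiple of ten million, find the timestamp of the first revision at or past that milestone, then take the interval since the previous milestone. A sketch, assuming we have (revision ID, timestamp) pairs sorted by revision ID; the function names and the synthetic data are mine, not the page's actual figures:

```python
# Sketch of the "Time Between Edits" calculation over (revid, timestamp) pairs.
from bisect import bisect_left
from datetime import datetime

def milestone_times(revisions, step=10_000_000):
    """revisions: list of (revid, datetime) sorted by revid.
    Returns [(milestone_revid, datetime the milestone was reached)]."""
    ids = [revid for revid, _ in revisions]
    out = []
    milestone = step
    while milestone <= ids[-1]:
        i = bisect_left(ids, milestone)  # first revision at or past the milestone
        out.append((milestone, revisions[i][1]))
        milestone += step
    return out

def intervals(milestones):
    """Days between consecutive milestones, keyed by the later milestone."""
    return [(later[0], (later[1] - earlier[1]).days)
            for earlier, later in zip(milestones, milestones[1:])]
```

Note that this counts revision IDs, not saved article edits, which is exactly why the discrepancy with {{NUMBEROFEDITS}} discussed in this thread matters.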

After emailing User:Katalaveno and getting their agreement I have moved
https://en.wikipedia.org/w/index.php?title=User:Katalaveno/TBE=no
to https://en.wikipedia.org/wiki/Wikipedia:Time_Between_Edits and am making
a few changes.

One area that perhaps someone on this list can explain is the difference
between the number of edits as measured by revision ID and as measured by
{{NUMBEROFEDITS}} - the difference is over a hundred million. That is too big a
number for it to be a measure of logged admin actions, unless deleting a
page increments the number of edits for each revision deleted. It might be
in the right ballpark to give a measure of edit conflicts; if so, it would
be very good to have a measure of something we had thought unmeasurable.

So I'm wondering if anyone on this list knows the difference between
{{NUMBEROFEDITS}} and revisionID.

WSC


Re: [Wiki-research-l] [Analytics] Identifying bots and bot edit decline

2016-10-11 Thread WereSpielChequers
On the English Wikipedia you can start with the current bots which should
all be in https://en.wikipedia.org/wiki/Category:All_Wikipedia_bots

There are also former bots listed at
https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits/Unflagged_bots
but we are unlikely to spot them and add them to that list unless they did
enough edits to make the list of 10,000 most active Wikipedians after they
were deflagged.
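When working from raw dump or API data rather than those curated lists, a common fallback is to combine the flagged-bot list with the name heuristic of usernames ending in "bot". A sketch; the heuristic is an assumption of convenience, not policy, and as noted in this thread it will both miss unflagged bots and occasionally flag humans:

```python
# Rough bot identification: an authoritative flagged-bot set (e.g. scraped from
# Category:All_Wikipedia_bots) plus the "name ends in bot" heuristic.
# The heuristic produces false positives and misses renamed/unflagged bots.
import re

BOT_NAME = re.compile(r"bot$", re.IGNORECASE)

def classify_editor(username: str, flagged_bots: set[str]) -> str:
    """Classify a username as 'flagged-bot', 'likely-bot', or 'human'."""
    if username in flagged_bots:
        return "flagged-bot"
    if BOT_NAME.search(username):
        return "likely-bot"  # name heuristic only; verify before relying on it
    return "human"
```

Any serious analysis should treat "likely-bot" as a candidate list to check by hand, not as a final classification.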

Good to hear some figures about the drop in bot editing because of the
interwiki links. I've recently taken over maintaining one of our more
venerable stats pages on Wikipedia, and I was looking for a figure on the
drop in editing due to the interwiki change.

Anecdotally I was told that the Wikidata community was half existing
Wikimedians moving to a new project and half new recruits to the community,
but I don't remember seeing detailed stats on that; it might make an
interesting PhD project for someone. As with so many other spinoffs, both
within the movement (to Wiktionary, the various languages of Wikipedia and
of course Wikimedia Commons) and to fansites on Wikia, there will of course
be some loss to the community they were spun off from.

The other, bigger and harder change to quantify is the amount of
vandalfighting, bot and manual that moved to the edit filters between 2009
and 2014. Because it was a gradual process as filters were tested and
refined it mainly looks like a general decline in editing.



On 11 October 2016 at 11:25, Taha Yasseri  wrote:

> Hi Fabian,
>
> We recently did the same exercise for this paper: Even Good Bots Fight
> .
> Have a look at the data collection, where we explained how we made a list
> of all bots.
> Also re edit statistics, see Fig S1.
>
> Happy to compare the lists and share data.
>
> Best,
> Taha
>
> On Tue, Oct 11, 2016 at 11:08 AM, Federico Leva (Nemo)  > wrote:
>
>> Wikistats knows about 8017 bot usernames according to
>> https://dumps.wikimedia.org/other/pagecounts-ez/wikistats/csv_wp_main.zip
>> (cut -f2 -d, StatisticsBots.csv | sort -u | wc -l ). Given active editors
>> tend to complain a lot if they get counted as bots, a comprehensive list
>> should probably be a superset of that one.
>>
>> Flöck, Fabian, 11/10/2016 11:15:
>>
>>> This is likely not news, so can someone enlighten me regarding what
>>> brought about that sharp decline of bot edits?
>>>
>>
>> The migration of interwiki links to Wikidata, which is very visible in
>> https://stats.wikimedia.org/EN/PlotsPngEditHistoryTop.htm .
>>
>> There was also some statistic by WMF on whether active users had
>> "migrated" to Wikidata from other projects, but I can't quickly find it
>> now; maybe it was around the time of http://infodisiac.com/blog/201
>> 4/03/wikimedia-editor-trends-broken-down-by-project/ .
>>
>> Nemo
>>
>>
>
>
>
> --
> ==New Paper==
> Editorial: At the Crossroads: Lessons and Challenges in Computational
> Social Science
> 
> Borge-Holthoefer J, Moreno Y and Yasseri T
> *Front. Phys*. 4:37 (2016).
> =
>
> Dr Taha Yasseri
> http://www.oii.ox.ac.uk/people/yasseri/
> Research Fellow in Computational Social Science, Oxford Internet
> Institute,
> Research Fellow in Humanities and Social Sciences, Wolfson College,
> University of Oxford,
> and
> Faculty Fellow, Alan Turing Institute for Data Science.
>
> Tel. +44-1865-287229
> 1 St. Giles
> Oxford OX1 3JS
> UK
>
>
>


Re: [Wiki-research-l] Thinking big: scaling up Wikimedia's contributor population by two orders of magnitude

2016-08-27 Thread WereSpielChequers
We already have hundreds of millions of users. A large proportion of people
who use the internet will use Wikipedia in a given month, they use it by
reading bits of it. Finding out what the barriers are for the thousands of
millions who don't use Wikipdia would be useful. No doubt there are some
who are aware of Wikipedia but didn't feel a need to consult an
encyclopaedia in the last month, and some who are not currently in the
market for an encyclopaedia because they are too young, too senile, or
locked up. But research into why people don't use Wikipedia would be
useful. Our mission is to make the sum of all knowledge available to all,
finding out how we get to the next 400 million people, and indeed what
proportion of humanity would use an encyclopaedia if it was available to
them would be a great use of research.

Of those hundreds of millions only a tiny proportion, perhaps 0.02% are
"active editors", and that on an absurdly generous definition of active (5
edits in one month).

Theory tells us that as quality continues to improve, those readers who
fix a typo or some vandalism when they see it have been editing less and
less frequently. We know that the edit filters have lost us many of the
vandals who used to be such an important part of the raw editing figures of
the site (it never ceases to amuse me that the threshold to count as an
active editor was exactly the number of edits the typical vandal needed to
get through four levels of warnings and then get blocked). We also know that
the rise of the smartphone and, to a lesser extent, the tablet has lost us
editors; to most tablet users and almost all smartphone users, Wikipedia is
a read-only website, not an interactive one. But it would be good to test
that as even the most obvious explanation is only a hypothesis until
someone has tested it, better still some sort of quantification of those
various issues would be very helpful.

How we replace typo fixing and vandalism reversion as entry level
activities to editing is one of the challenges of the community, any
research on that would be very useful.

On 27 August 2016 at 08:13, Pine W  wrote:

> Thinking big here: popular internationalized computer games can have 10+
> million unit sales. Some of the most popular online games have millions of
> monthly active users. I'm wondering if the research community, including
> Design Research, can envision a way for Wikimedia to scale up from 80,000
> active monthly users to 8,000,000 active monthly users.
>
> What would we need in order to stimulate and nourish this kind of growth?
>
> What can we learn from popular internationalized games about design that
> could benefit Wikimedia on a large scale?
>
> Pine
>
>


Re: [Wiki-research-l] [WikimediaMobile] Mobile Wikipedia, Commons, Wikidata, and Pokémon

2016-07-19 Thread WereSpielChequers
RichFarmbrough has been helping me out with lists of articles that have a
UK geocode but no image:
https://en.wikipedia.org/wiki/User:Rich_Farmbrough/temp138. I've been
testing image adding as a newbie exercise. Thanks to the Geograph, the UK is
much better covered on Commons than most other places: 0.1% of the world's
land area used to account for over 10% of Commons and still accounts for about 6%.

The same sort of lists could be created for other countries, but whereas in
the UK we have images on commons or can import them from the Geograph, for
most other countries this would be a prospect list for photographers. Of
course countries that lack FOP or have FOPNC will have lots of articles
about buildings that we can't photograph, but maybe we can filter out
articles about modern buildings in countries with restrictive FOP?



On 18 July 2016 at 15:07, Magnus Manske  wrote:

>
>
> On Fri, Jul 15, 2016 at 3:22 AM Pine W  wrote:
>
>> I was thinking along similar lines as Stuart, using OSM to navigate and
>> encouraging users to take photos of landmarks and other buildings where
>> that's permitted by FOP. Landmarks for which we have only small photos, old
>> photos (more than about 3 years), or no photos could be prioritized.
>>
>> *ahem* ;-)
> https://tools.wmflabs.org/wikishootme/
>
>
>


Re: [Wiki-research-l] Gender bias in GitHub (but not entirely what you expect)

2016-02-20 Thread WereSpielChequers
Hi Kerry, good point. I've often heard the FA crowd say that the featured 
article process has a higher proportion of women than is normal on Wikipedia. 
From my own experience there they are probably right.

Regards

Jonathan 


> On 19 Feb 2016, at 23:45, Kerry Raymond  wrote:
> 
> In IT development, it’s not unusual to find the women to be of a higher 
> standard of ability. They have to be to survive the filters in their 
> profession. It’s not uncommon to see new graduates ending up in roles 
> based on gender: men into development, women into help desk, tech writing, 
> testing etc. Why? “Girls are good with people” (help desk), “Girls have more 
> attention to detail” (testing) etc. Then, lacking a development role on their 
> CV, it makes it harder for them to get their next job in a development role. 
> You have to be good to survive that filtering.
>  
> So I can easily believe the average women on GitHub is of a higher standard 
> of ability than the average male. I suspect the same holds true about 
> Wikipedians. Does anyone actually have the 2011 editor survey data to compare 
> male vs female on other questions like age, level of education, etc. It would 
> be interesting to know how the male and female Wikipedians of 2011 are 
> statistically different in other ways.
>  
> Kerry
>  
> From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org] On 
> Behalf Of Flöck, Fabian
> Sent: Friday, 19 February 2016 9:42 PM
> To: Research into Wikimedia content and communities 
> 
> Subject: Re: [Wiki-research-l] Gender bias in GitHub (but not entirely what 
> you expect)
>  
> There are several issues with this study, some of which are pointed out here 
> in a useful summary: 
> http://slatestarcodex.com/2016/02/12/before-you-get-too-excited-about-that-github-study/
> . In particular, attributing the difference to gender rather than to other 
> attributes of the users that might merely be linked to gender (maybe the 
> women who join GitHub are just the very best/most professional women, 
> contribute only to specific types of code, etc.), apart from some other 
> open questions about methods, seems questionable to me. And I also share 
> the author’s criticism of “science journalism” and its propensity for 
> reporting catchy results.
>  
> Fabian
>  
>  
> On 11.02.2016, at 23:20, Laura Hale  wrote:
>  
> https://www.quora.com/Has-the-female-participation-on-Quora-changed-in-the-past-6-months-if-so-how/answer/Laura-Hale
>  is not peer reviewed (though if you want my data) but I'm the only person 
> inside the community looking at gender issue on Quora.
>  
> In the past six months, there has been a noticable shift in female 
> participation type on Quora, to the point where it surpassed that of men.  It 
> isn't necessarily translating towards higher female user rates but it is on 
> the participation side.
>  
> Sincerely,
> Laura Hale
>  
> On Thu, Feb 11, 2016 at 10:30 PM, Jonathan Morgan  
> wrote:
> Thought I'd pass this along. Haven't read the whole article yet, but it 
> sounds fascinating. 
>  
> TL;DR: Looks like contributions by women are accepted more often than those 
> by men, but only if the project leader doesn't know the pull request is 
> coming from a woman.
>  
> Excellent summary: 
> http://arstechnica.com/information-technology/2016/02/data-analysis-of-github-contributions-reveals-unexpected-gender-bias/
>  
> Preprint: https://peerj.com/preprints/1733v1/
>  
> Note: this work has not yet been peer-reviewed. 
>  
> J
>  
> --
> Jonathan T. Morgan
> Senior Design Researcher
> Wikimedia Foundation
> User:Jmorgan (WMF)
>  
> 
> 
> 
> 
>  
> --
> twitter: purplepopple
>  
> 
> 
> 
> Gruß, 
> Fabian
> 
> --
> Fabian Flöck
> Research Associate
> Computational Social Science department @GESIS
> Unter Sachsenhausen 6-8, 50667 Cologne, Germany
> Tel: + 49 (0) 221-47694-208
> fabian.flo...@gesis.org
>  
> www.gesis.org
> www.facebook.com/gesis.org
>  
>  
>  
>  
> 
>  


Re: [Wiki-research-l] Community policing, New Page Patrol, Articles for Creation, and editor retention

2016-01-28 Thread WereSpielChequers
That's one possibility. But if we have wikiprojects caught in a vicious
cycle at the same time as an overall stable or slowly growing
community, then either the active wikiprojects are better able to retain
existing editors and also convert more casual editors into regulars or
there is some other growth area to more than balance the local declines.

We need some research to test that model. It is also possible that some
people like ploughing their own furrow and being the undisturbed Wikipedia
expert in their own area. I'm pretty sure that one thing that drives some
people away is conflict, and not everyone enjoys the process of their work
being ruthlessly edited by others. We may also have a more complex community
that needs measurement over a longer period of time, it could be that
hundreds of the wikiprojects we now think of as inactive are merely dormant
and over a longer time period many of them will have intermittent
flourishes of activity as editors join them or reactivate.

If it turns out that dormant wikiprojects have as Pine puts it "low
stickyness" then perhaps it would make sense to declare loads of inactive
projects dormant and redirect them to parent projects. On the other hand, if
it turns out that simply keeping inactive wikiprojects around, waiting for
the next person who cares about the topic, means that when that person joins
the community they are more likely to stay, then it would make sense to keep
inactive wikiprojects.

In any event I suspect some reports, "unanswered newbie queries on
wikiproject talkpages" and "Wikiprojects with no watchlisters who are
currently active experienced editors" would probably be worthwhile.

WereSpielChequers



On 28 January 2016 at 23:05, Pine W <wiki.p...@gmail.com> wrote:

> I've been thinking about what David said. It seems to me that there's a
> vicious cycle of too few contributors --> languishing wikiprojects --> low
> stickiness for potential contributors who would otherwise be attracted to
> those wikiprojects. So how do we get out of it? Any suggestions?
>
> I'm wondering if Wikia has some practices that we could borrow. Any
> thoughts along that line?
>
> Pine
>
> On Sun, Jan 10, 2016 at 8:40 PM, David Goodman <dgge...@gmail.com> wrote:
>
>> There will always be difficulties in getting good volunteer patrolling of
>> some subjects, for exactly the same reason that there are difficulties in
>> getting articles on those subjects: the lack of knowledgable volunteers
>> interested  in writing about them on WP.
>>
>> What complicates the situation is that many of these subjects that are
>> relative unattractive to volunteers are very attractive to people with the
>> most blatant  forms of conflict of interest: practitioners of various
>> professions, companies in various lines of business, makers of certain
>> types of products.
>>
>> It is unfortunately impossible for a volunteer-based project to avoid
>> this, in the absence of fixed rules that can discriminate closely between
>> those articles and subjects worth fixing and those not. There is a very few
>> areas of WP where we do have such rules, (eg. WP:PROF) and decisions there
>> go quite smoothly in most cases. But there is no way of making exact
>> decision on keeping articles when relying on something as amorphous as the
>> GNG. At AfC, there is another limitation: the question is not whether an
>> article should be accepted into WP, but whether there's a decent
>> probability that the article will in fact be accepted.
>>
>> As an analogous problem, the qualification for giving accurate and
>> effective online advice about writing an article is not very common. Many
>> more WPedians can write a decent article than they can teach others to do
>> so.  Thus, even the most dedicated people can reach very few of the people
>> who ought to be reached.
>>
>> i do not mean to suggest that we should not try to do better--we should
>> try to do very much better at every step. But there is a limit to what can
>> be expected in an organization like ours.
>>
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Jan 8, 2016 at 3:21 AM, Jane Darnell <jane...@gmail.com> wrote:
>>
>>> Hi Pine,
>>> I definitely think that there is enough data to start a project or
>>> workspace dedicated to creating tools that will deliver the data in ways
>>> that can support decision-making. Given 10 newbie good-faith editors, what
>>> are their types of interests and reasons for staying or leaving? Similar
>>> questions can be asked of current editors. If we break this down, I guess
>>> the main questions we have can be split into two groups; nam

Re: [Wiki-research-l] Quality issues

2015-11-20 Thread WereSpielChequers
My experience is that pretty much all Wikimedians care about quality,
though some have different, even diametrically opposed views as to what
quality means and which things are cosmetic or crucial.

My experience of the sadly dormant death anomaly project was that people
react positively to being told "here is a list of anomalies on your
language wikipedia" especially if those anomalies are relatively serious.
My experience of edits on many different languages is that wikipedians
appreciate someone who improves articles, even if you don't speak their
language. Dismissing any of our thousand wikis as a "black box" is I think
less helpful.

One of the great opportunities of Wikidata is to do the sort of data driven
anomaly finding that we pioneered with the death anomalies report. But we
always need to remember that there are cultural difference between wikis,
and not just in such things as the age at which we assume people are dead.
Diplomacy is a useful skill in cross wiki work.



On 20 November 2015 at 07:18, Gerard Meijssen 
wrote:

> Hoi,
> At Wikidata we often find issues with data imported from a Wikipedia.
> Lists have been produced with these issues on the Wikipedia involved and
> arguably they do present issues with the quality of Wikipedia or Wikidata
> for that matter. So far hardly anything resulted from such outreach.
>
> When Wikipedia is a black box, not communicating about with the outside
> world, at some stage the situation becomes toxic. At this moment there are
> already those at Wikidata that argue not to bother about Wikipedia quality
> because in their view, Wikipedians do not care about its own quality.
>
> Arguably known issues with quality are the easiest to solve.
>
> There are many ways to approach this subject. It is indeed a quality issue
> both for Wikidata and Wikipedia. It can be seen as a research issue; how to
> deal with quality and how do such mechanisms function if at all.
>
> I blogged about it..
> Thanks,
>  GerardM
>
>
> http://ultimategerardm.blogspot.nl/2015/11/what-kind-of-box-is-wikipedia.html
>
>
>


Re: [Wiki-research-l] Reinforcing or incentivizing desired user behavior

2015-10-19 Thread WereSpielChequers
As Gerard points out a highly subjective and inconsistent rewards system is
an inevitable consequence of a volunteer based community. I'd add that a
one side of the inconsistency, people being overlooked, is something we can
work on by finding better tools. For example before the loss of toolserver
and the labs problems we used had a list of overlooked Autopatroller
prospects
<https://en.wikipedia.org/w/index.php?title=Wikipedia:Database_reports/Editors_eligible_for_Autopatrol_privilege=613695217>
that HJ Mitchell and I used to work through. Other aspects of inconsistency
are a great opportunity for researchers to investigate - and if research
identifies a group of overlooked editors then the community will likely
respond. Another side of the inconsistency, people cheapening the system by
self awarding barnstars or handing them out too freely, is something the
community has various mechanisms to handle; and those who would try and
change this area need to be aware of that.

Of course you can have elements of the reward system that are less
subjective and inconsistent, for example service level awards, FA stars and
so forth. I rather suspect that the barnstar system is the subjective and
inconsistent residue left after many aspects of the reward system were
codified and separated from the original barnstar system, but I'll let
someone else earn a PhD by proving or disproving that one!

On 19 October 2015 at 10:07, Gerard Meijssen <gerard.meijs...@gmail.com>
wrote:

> Hoi,
> Ask yourself, what is it that you get with a more "scientific" approach.
> Is it commitment and involvement and who gets involved when science decides
> who to select as a special case?
>
> My point is very much that arguments like this forget what it is we want
> to achieve. A barnstar is from me (my involvement) to someone else (my
> appreciation). I do not care for scientific when it follows that my
> involvement is not welcome.
> Thanks,
>  GerardM
>
> On 19 October 2015 at 07:24, Pine W <wiki.p...@gmail.com> wrote:
>
>> As much as I like the barnstar system, it's highly subjective and
>> inconsistent. I'd like to see a more systematic approach. Perhaps this
>> could be combined with some of Aaron's work about edit quality.
>>
>> On Sat, Oct 17, 2015 at 2:52 AM, WereSpielChequers <
>> werespielchequ...@gmail.com> wrote:
>>
>>> We have a complex set of "badges": some, as Kerry pointed out, are
>>> available to everyone who qualifies for them; some are based on the
>>> statistics of your account - tenure, edit count, articles created. Others
>>> are based on things you've been awarded by others, the bronze stars for
>>> featured articles, but also userboxes for everything from userrights to
>>> number of DYKs. Barnstars are a key subset that can only be awarded by
>>> others. There are Barnstars available for a huge range of things, even
>>> civility and diplomacy. It would be interesting and probably salutary to do
>>> a study on which Barnstars are awarded; my suspicion is that the
>>> anti-vandalism ones may well be the most frequent. I would also encourage
>>> everyone to lead by example and actually use the Barnstar system for people
>>> who have made extraordinary contributions. But be careful not to devalue
>>> the system by for example giving one to everyone who reports a bug in
>>> visual editor - in the past when we had lots of adolescents and teenagers
>>> in the community there was a craze for creating secret pages with a
>>> Barnstar award for finding them; so if you give out Barnstars too freely
>>> you risk being thought of as the sort of immature adolescent that usually
>>> makes that sort of mistake.
>>>
>>> Regards
>>>
>>> Jonathan
>>>
>>>
>>> On 14 Oct 2015, at 02:42, Luis Villa <lvi...@wikimedia.org> wrote:
>>>
>>> I think there's a lot to be done there (probably will blog soon about my
>>> weekend experimenting with Genius, which had pretty extensive systems for
>>> this).
>>>
>>> It is an interesting prioritization question: doing it
>>> thoroughly/systematically would require a lot of software investment,
>>> especially since we don't have structured conversation pages (which are the
>>> basis for a lot of similar contributor recognition systems).
>>>
>>> Luis
>>>
>>> On Sun, Oct 11, 2015 at 5:25 AM, Pine W <wiki.p...@gmail.com> wrote:
>>>
>>>> Kerry,
>>>>
>>>> Thanks so much for the comments. I will bring up the subjects of badges
>>>> and cobtributor KPIs with Lui

Re: [Wiki-research-l] Reinforcing or incentivizing desired user behavior

2015-10-06 Thread WereSpielChequers
I thought that if we had a "primary" badge or KPI system it was the
content-focussed ones, especially those related to Featured Articles. Editcountitis
is seen by many as a bit of a joke. But there are many others, including articles
created and length of service. I do like the idea of celebrating our most
thanked editors, but I don't think the necessary information is currently public.

Regards

Jonathan 


> On 6 Oct 2015, at 07:33, Kerry Raymond  wrote:
> 
> Certainly there are a lot of sites with badges that do seem to encourage 
> certain behaviour. On Wikipedia, we have edit count and that seems to 
> generate editcountitis which (when gamed) tends to favour lots of little 
> housekeeping edits over content edits. But one of the things with badges on 
> most sites is that the site assigns the badge. Here on Wikipedia, I can put 
> any badge I want on my User Page (the pre-existing ones are mostly edit-count 
> based but I can roll my own as some users do). Indeed as I discovered, other 
> people can put badges on my user page and presumably take them away. As edit 
> count is our primary KPI, it doesn't address "cultural" attributes. Should we 
> be making more of an effort to promote other KPIs that emphasise positive 
> behaviour like thanks (given and received)? Unfortunately our main 
> interaction mechanism is writing on talk pages and it's hard to tell whether 
> any contribution on a talk page is a "positive" behaviour or a negative one 
> (short of some kind of sentiment analysis). This is an unfortunate 
> consequence of using a wiki for a conversation rather than some more 
> purpose-built tool. 
> 
> In principle one takes a KPI and then creates a badge to reward a behaviour 
> that improves that KPI. But that's all easier said than done.
> 
> For content improvements, there are probably some things we can do. For 
> example, I presume looking at the edit deltas, we could tell if an edit to an 
> article added a citation (a pair of ref tags in the new version that weren't 
> there in the old version). Adding citations is a desirable behaviour that we 
> could report on and give badges for (although obviously whether or not that 
> citation in any way supports the claim cannot be determined, so the "gaming" 
> of this is to add random citations to offline sources to lots of articles, 
> which cannot be easily verified). In which case maybe we need to give a 
> better score to an online citation, on the grounds that it is more likely to be 
> verifiable.
> 
> But positive "culture" or positive social behaviour is harder to detect and 
> reward. For example, we'd like to close the gendergap but firstly we don't 
> have a KPI that measures it on an ongoing basis, because we don't actually know 
> which contributors are male/female. And even if we had that KPI, what users 
> or their behaviours would we reward for having positive impact on that KPI? 
> In real-life, we might reward a customer who introduces a new customer. Or we 
> might have a "finders fee" for someone who introduces a "new hire". How could 
> we reward introducing new women to Wikipedia or encouraging them (perhaps 
> through mentoring) to contribute more? Or would we reward contributors who 
> contribute to articles about "women's topics" (which is addressing the 
> content gendergap rather than the contributor gendergap, which aren't the 
> same thing although many believe them to be closely linked). [I won't 
> digress into the challenge of deciding how "female" an article topic is.]
> 
> On some sites, you need certain badges to "unlock" certain extra 
> functionalities. Are we happy for RfA to be a question of collecting up 
> enough badges? AFAIK, the only auto-implemented badge we have on Wikipedia is 
> the "auto-confirm" (4 days and 10 edits from memory).
> 
> I think badges are a good idea but I think the way Wikipedia is implemented 
> makes it challenging to machine-identify desirable behaviours to reward 
> (particularly for social/culture metrics). I think badges have (in the most 
> part) to be machine-calculated and awarded or else it just becomes a 
> popularity contest (who's mates with whom). I know Aaron (or someone) was 
> toying with the idea of putting a value on each edit (presumably based on 
> some training set of edit data that humans rated). I think it's not 
> impossible to come up with some set of dimensions on which an edit might be 
> valued and, using some human evaluations on a test set, come up with some 
> kind of values for each dimension. It might be rough in the first instance 
> but I guess if it incorporated some ongoing feedback mechanism, it could 
> improve over time.
> 
> A cheap thing that we could do (and I don't think we do) is have edit count 
> badges for "last week", "last month", "last year". ATM we only have 
> "lifetime" counts, which makes it hard for the new user to get any quick 
> positive acknowledgements for their efforts. 
> 
> Kerry
> 
> 
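As an aside, Kerry's idea above of spotting citation additions from the edit deltas (ref tags present in the new revision but not the old) can be sketched roughly as follows. This is a toy illustration on raw wikitext, not a robust diff; real revision text would have to be fetched from the wiki.

```python
import re

# Opening <ref> tags, with or without attributes, case-insensitive.
REF_TAG = re.compile(r"<ref[\s>]", re.IGNORECASE)

def added_citations(old_wikitext: str, new_wikitext: str) -> int:
    """Rough count of citations an edit added: the increase in the
    number of opening <ref> tags between old and new revision text."""
    old_refs = len(REF_TAG.findall(old_wikitext))
    new_refs = len(REF_TAG.findall(new_wikitext))
    return max(0, new_refs - old_refs)

old = "The sky is blue."
new = 'The sky is blue.<ref>Smith 2010</ref> Grass is green.<ref name="g"/>'
print(added_citations(old, new))  # 2
```

As Kerry notes, this says nothing about whether the citation supports the claim; it only flags the behaviour.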

Re: [Wiki-research-l] Verifying claims about ENWP project size

2015-09-16 Thread WereSpielChequers
I'm pretty sure that English Wikipedia is the largest English language 
encyclopaedia, but there are some humongous ones in China.

Baidu Baike, with almost 12.5 million articles, is way bigger than any one 
language version of Wikipedia, and Baike.com (formerly Hudong) is about a million 
articles bigger still.

OK, they are more inclusionist than us, recipes included, and they have somewhat 
dropped the distinction between a dictionary and an encyclopaedia.

So you can claim that Wikipedia with near 35 million articles in 288 languages 
is the largest encyclopaedia ever. Adding wiktionary would make that even 
bigger.

Source: Wikipedia; I'm afraid I don't speak Chinese, so I can't check the figures myself.

Of course article count is a flawed metric: combining almost all the individual 
Pokemon articles into a handful of lists reduced the number of Wikipedia 
articles by hundreds, but still left us with more information on Pokemon than I 
would want to see in a printed encyclopaedia. But then, can anyone suggest a 
meaningful metric for comparing such projects? Participants? Contributed edits? 
Shelf space if printed in traditional encyclopaedia-sized books? Gigabytes of 
text? Trays of microfiche?

Regards

Jonathan 


> On 16 Sep 2015, at 01:24, Jonathan Morgan  wrote:
> 
> Hi Pine,
> 
> TL;DR: best to just say it's the largest encyclopedia ever. That should be 
> safe.
> 
> Claims like this are hard to make because terms that seem concrete from afar 
> tend to break down up close. For example: What do you mean by largest? 
> 
> Largest in bytes? Words? Content "units" (articles vs. manuscripts in this 
> case, I guess)? Contributors?
> 
> What do you mean by "open text project"? Is archive.org an open text project? 
> It has 8.2 million books. How would you compare the two? Does 1 book = 1 
> article?
> 
> Having said all that, I'm curious how others have/would craft a claim like 
> this. My guess is that most of us who've written for an academic audience 
> have settled for some variant of "largest encyclopedia" (you've got to put 
> something in your Introduction paragraph, after all). What sayst?
> 
> J
> 
>> On Tue, Sep 15, 2015 at 4:45 PM, Pine W  wrote:
>> Hi researchers,
>> 
>> I could use a little help with understanding these dumps:
>> 
>> https://dumps.wikimedia.org/enwikisource/latest/
>> 
>> https://dumps.wikimedia.org/enwiki/20150901/
>> 
>> I'm trying to verify the claim that ENWP is the world's largest open text 
>> project, and to do that I need to verify that ENWP is larger than English 
>> Wikisource. Which files should I be comparing?
>> 
>> Are there any other projects that could make a claim to be a larger open 
>> text project than ENWP? Perhaps there's a library somewhere that has such a 
>> huge volume of out-of-copyright materials that the combined bytes of 
>> published text are larger than ENWP?
>> 
>> Thanks!
>> 
>> Pine
>> 
>> 
>> ___
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> 
> 
> 
> -- 
> Jonathan T. Morgan
> Senior Design Researcher
> Wikimedia Foundation
> User:Jmorgan (WMF)
> 


Re: [Wiki-research-l] Improved reference handling in VisualEditor, highlighted in this week's Tech News

2015-09-01 Thread WereSpielChequers
Hi Pine,

This would be a good news story for the GLAM community and I suspect education 
as well. Is there a signpost article I can post on a couple of GLAM Facebook 
groups?

Regards

Jonathan


> On 1 Sep 2015, at 01:35, Pine W  wrote:
> 
> This is cool for the reference enthusiasts: 
> 
> "VisualEditor will now automatically create a link when you type in or paste 
> an ISBN, PMID, or RFC. [48][49]"
> 
> Could this be included in a WMF social media post? I anticipate that 
> relatively few people will test the new functionality, but the message feeds 
> into the theme of helping users to understand that Wikipedia is the 
> encyclopedia that (almost) anyone can edit. Readers of social media who would 
> be especially valuable contributors because they understand what ISBNs and 
> PMIDs are, might take an interest if this news is sent over the social media 
> channels.
> 
> Pine
> 


Re: [Wiki-research-l] Spambots and HTTPS

2015-09-01 Thread WereSpielChequers
As you say I doubt many spammers would get into the 100 edit a month league
before being blocked.

Of course a lot of rollbackers will be in the stats, and if there had been
a drop in spam then some of the spam fighters might drop below 100 edits a
month. But since the big announcement today about a 350 account sockfarm
that had been spamming Wikipedia, I think it would be odd if spam overall
was down on the year (I seem to remember some research a few years back
that showed spam steadily rising year on year).

My experience of spammers who create articles is that once their article is
deleted they are often left with fewer than five live edits, so unlike
vandals many of the spammers will not make the 5 edit a month figures.

On 1 September 2015 at 20:58, Pine W  wrote:

> I noticed after HTTPS was enabled by default that there were many fewer
> spambots on one of the wikis that I monitor for recent changes. Did anyone
> else notice a decline in spambots after HTTPS was enabled?
>
> This may be relevant to discussions about the highly active editor stats.
> While I doubt that spambots and vandals succeed in getting to 100 edits on
> the larger Wikipedias very often, rollbackers might. Additionally, a
> reduction in spambots and spambot-related rollbacks might affect the number
> of new accounts registered and the number of edits per month stats.
>
> Pine
>


Re: [Wiki-research-l] Has the recent increase in English wikipedia's core community gone beyond a statistical blip?

2015-08-24 Thread WereSpielChequers
100 edits a month does indeed have the disadvantage that all edits are not
equal, there may be some people for whom that represents 100 hours
contributed, others a single hour. So an individual month could be inflated
by something as trivial as a vandal-fighting bot going down for a couple of
days and a bunch of old-timers responding to a call on IRC by coming back
and running Huggle for an hour.

But 7 months in a row where the total is higher than the same month the
previous year looks to me like a pattern.

Across the 3,000 or so editors on English Wikipedia who contribute over a
hundred edits per month there could be a hidden pattern of an increase in
Huggle, STiki and AWB users more than offsetting a decline in manual
editing, but unless someone analyses that and reruns those stats on some
metric such as unique calendar hours in which someone saves an edit, I
think it best to treat this as an imperfect indicator of community health.
I'm not suggesting that we are out of the woods; there are other
indicators that are still looking bad, and I would love to see a better
proxy for active editors. But this is good news.
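For what it's worth, the "unique calendar hours in which someone saves an edit" proxy suggested above is straightforward to compute from revision timestamps. A rough sketch, assuming the timestamps have already been parsed:

```python
from datetime import datetime

def active_hours(timestamps):
    """Count distinct calendar hours in which an editor saved at least
    one edit; far less sensitive to rapid-fire tool edits than raw
    edit counts."""
    hours = {ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps}
    return len(hours)

# Ten Huggle-style edits in one minute count as a single active hour.
burst = [datetime(2015, 6, 1, 14, 0, s) for s in range(10)]
print(active_hours(burst))  # 1
```

A content writer saving once every half hour and a vandal fighter saving once a minute would score much closer on this metric than on raw edit counts.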



On 23 August 2015 at 19:31, Mark J. Nelson m...@anadrome.org wrote:

 WereSpielChequers werespielchequ...@gmail.com writes:

  Could you be more specific re "In general I'm not sure the 100+ count is
  among the most reliable"? What in particular do you think is unreliable
  about that metric?

 The main thing I have questions about with that metric is whether it's a
 good proxy for editing activity in general, or is dominated by
 fluctuations in bookkeeping contributions, i.e. people doing
 mass-moves of categories and that kind of thing (which makes it quite
 easy to get to 100 edits). This has long been a complaint about edit
 counts as a metric, which have never really been solidly validated.

 Looking through my own personal editing history, it looks like there's
 an anti-correlation between hitting the 100-edit threshold and making
 more substantial edits. In months when I work on article-writing I
 typically have only 20-30 edits, because each edit takes a lot of
 library research, so I can't make more than one or two a day. In months
 where I do more bookkeeping-type edits I can easily have 500 or 1000
 edits.

 But that's just for me; it's certainly possible that Wikipedia-wide,
 there's a good correlation between raw edit count and other kinds of
 desirable activity measures. But is there evidence of that?


 --
 Mark J. Nelson
 Anadrome Research
 http://www.kmjn.org



Re: [Wiki-research-l] Has the recent increase in English wikipedia's core community gone beyond a statistical blip?

2015-08-23 Thread WereSpielChequers
Hi Nemo,

Month-over-month growth isn't what I was talking about, not least because
the seasonal stuff and different month lengths override that.

What I noticed was that in Jan 2015 the 100-edits count was ahead of Jan
2014, as was every month up to June 2015, which was ahead of June 2014:
https://stats.wikimedia.org/EN/TablesWikipediaEN.htm
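Spelled out, the check being applied here is year-over-year: for each month, compare the 100+ editor count with the same calendar month a year earlier, and see how long the winning streak runs. A sketch with illustrative numbers, not the real stats:

```python
def yoy_streak(counts):
    """Length of the streak of most recent months whose editor count
    exceeds the same calendar month one year earlier.
    `counts` maps (year, month) -> number of editors with 100+ edits."""
    streak = 0
    for year, month in reversed(sorted(counts)):
        prev = counts.get((year - 1, month))
        if prev is None or counts[(year, month)] <= prev:
            break
        streak += 1
    return streak

# Hypothetical figures: Jan-Jun 2015 each ahead of Jan-Jun 2014.
counts = {(2014, m): 3000 for m in range(1, 7)}
counts.update({(2015, m): 3200 for m in range(1, 7)})
print(yoy_streak(counts))  # 6
```

Comparing like-for-like months sidesteps the seasonal effects and differing month lengths mentioned above.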

Could you be more specific re "In general I'm not sure the 100+ count is
among the most reliable"? What in particular do you think is unreliable
about that metric?

Jonathan



On 23 August 2015 at 14:40, Federico Leva (Nemo) nemow...@gmail.com wrote:

 WereSpielChequers, 15/08/2015 15:12:

 With 8% more editors contributing over 100 edits in June 2015 than in
 June 2014 https://stats.wikimedia.org/EN/TablesWikipediaEN.htm, we
 have now had six consecutive months where this particular metric of the
 core community is looking positive.


 I'm not sure I see this pattern, there aren't even 2 consecutive months of
 month-over-month growth. In general I'm not sure the 100+ count is among
 the most reliable.

 The one (global) pattern I do see is 9 consecutive months of YoY growth at
 https://stats.wikimedia.org/EN/TablesWikimediaAllProjects_AllMonths.htm
 but I still suspect issues with deduplication or bots after the SUL
 finalisation. https://phabricator.wikimedia.org/T87738#1366152

 Would anyone on this list be aware of something that would have
 otherwise thrown that statistic?


 Suspects could perhaps be narrowed down by looking at factors shared by
 en.wiki and it.wiki, as they seem to be the only ones with a small 2015
 recovery in the trend graphs at
 https://meta.wikimedia.org/wiki/Research_talk:Active_editor_spike_2015

 Nemo




Re: [Wiki-research-l] Has the recent increase in English wikipedia's core community gone beyond a statistical blip?

2015-08-20 Thread WereSpielChequers
7 minutes is an average, yes?

I would agree that an editor whose hundred edits represents about 700 minutes 
per month would not achieve much more in the same amount of time. But the 
editors who do over a hundred edits a month are significantly skewed towards 
the gnomes and vandal fighters whose editing rate is more like one a minute, 
and at that point saving a couple of seconds per edit becomes more significant. 
So it is not surprising that this appears to be a power-user phenomenon and not 
something that your 5-edits-per-month editor would notice.

The other point is that not all time is equal. Time spent typing or searching is 
one thing, but time waiting for an edit to save is time the system is holding 
you back. So it makes total sense to me that speeding up the save time would 
improve the user experience for wiki gnomes and encourage them to do more. 
Content writers who might only save every half hour would barely notice the 
change unless they are working on larger articles where the speed up in save 
time is greater as it is proportionate to article size. Featured Articles do 
tend to be relatively large.

Regards

Jonathan Cardy


 On 19 Aug 2015, at 23:15, Aaron Halfaker aaron.halfa...@gmail.com wrote:
 
 I feel like I should expand on my skepticism of HHVM as a mechanism for the 
 observed rise in active editors.  
 
 The average edit takes 7 minutes[1,2].  HHVM reduces the time to *save* the 
 edit by a couple seconds.  7 minutes - a couple seconds = ~7 minutes.  So, 
 HHVM doesn't really help you edit substantially faster.
 
 1. Geiger, R. S.,  Halfaker, A. (2013, February). Using edit sessions to 
 measure participation in Wikipedia. In Proceedings of the 2013 conference on 
 Computer supported cooperative work (pp. 861-870). ACM.
 2. Halfaker, A., Keyes, O., Kluver, D., Thebault-Spieker, J., Nguyen, T., 
 Shores, K., ...  Warncke-Wang, M. (2015, May). User Session Identification 
 Based on Strong Regularities in Inter-activity Time. In Proceedings of the 
 24th International Conference on World Wide Web (pp. 410-418). International 
 World Wide Web Conferences Steering Committee.
 
 On Wed, Aug 19, 2015 at 5:08 PM, Aaron Halfaker ahalfa...@wikimedia.org 
 wrote:
 So, I've been digging into this a bit.  Regretfully, I don't have my results 
 written up in a nice, consumable format.  So, you'll need to deal with my 
 worklogs.  See 
 https://meta.wikimedia.org/wiki/Research_talk:Active_editor_spike_2015/Work_log/2015-07-09
 
 TL;DR: It looks like there was a sudden burst in new registrations.  Work by 
 Neil Quinn of the Editing Team suggests that these new registrations were 
 largely the result of changes to the mobile app.  I didn't specifically look 
 at 100+ monthly editors.  That seems like a fine extension of the study.  
 I'd be happy to support someone else to do that work.  I have some datasets 
 that should make it relatively easy. 
 
  If the data is correct, then [HHVM] is likely to be one of the main 
  reasons for the change.
 
 Correlation is not causation.  There's no cause to arrive at this 
 conclusion.  In my limited study of the effects of HHVM on newcomer 
 engagement, I found no meaningful effect.  I think that, before we consider 
 HHVM as a cause of this, we should at least propose a mechanism and look for 
 evidence of that mechanism.  
 
 See 
 https://meta.wikimedia.org/wiki/Research:HHVM_newcomer_engagement_experiment
 
 
 
 On Wed, Aug 19, 2015 at 10:49 AM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:
 Most of those editors will have done 33 edits or less using V/E, and some, 
 including me in 4th place, will have been having a look at V/E after the 
 attention it has had recently at Wikimania, on the signpost and on mailing 
 lists. I'm not sure that something that barely involves 10% of a group of 
 editors could have had such a big effect.
 
 More likely and just at the right time, late 2014, Erik Zachte has reminded 
 me that we had a major speed-up with php parser change. 
 
 http://hhvm.com/blog/7205/wikipedia-on-hhvm
 
 If the data is correct, then that is likely to be one of the main reasons 
 for the change.
 
 Regards
 
 Jonathan Cardy
 
 
 On 17 Aug 2015, at 19:11, Jonathan Morgan jmor...@wikimedia.org wrote:
 
 It looks like about 10% of highly active Enwiki editors have used VE in 
 the past month (across all namespaces): 
 http://quarry.wmflabs.org/query/4795
 
 On Mon, Aug 17, 2015 at 8:35 AM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:
 On a very non-scientific measure of how few editors currently use V/E, I 
 took some snapshots of the most recent 500 mainspace edits yesterday and 
 was getting circa 1% tagged as visual editor, I've just run two sample 
 this afternoon and the first had not a single edit tagged Visual editor 
 and the other only four, so unless some of those experienced users using 
 V/e have opted out of having their edits tagged V/E, I'm assuming gobs 
 and gobs are either on other language wikis, heavily

Re: [Wiki-research-l] Has the recent increase in English wikipedia's core community gone beyond a statistical blip?

2015-08-19 Thread WereSpielChequers
Most of those editors will have done 33 edits or less using V/E, and some, 
including me in 4th place, will have been having a look at V/E after the 
attention it has had recently at Wikimania, on the signpost and on mailing 
lists. I'm not sure that something that barely involves 10% of a group of 
editors could have had such a big effect.

More likely, and just at the right time in late 2014, Erik Zachte has reminded me 
that we had a major speed-up with the PHP parser change:
 
 http://hhvm.com/blog/7205/wikipedia-on-hhvm

If the data is correct, then that is likely to be one of the main reasons for 
the change.

Regards

Jonathan Cardy


 On 17 Aug 2015, at 19:11, Jonathan Morgan jmor...@wikimedia.org wrote:
 
 It looks like about 10% of highly active Enwiki editors have used VE in the 
 past month (across all namespaces): http://quarry.wmflabs.org/query/4795
 
 On Mon, Aug 17, 2015 at 8:35 AM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:
 On a very non-scientific measure of how few editors currently use V/E, I 
 took some snapshots of the most recent 500 mainspace edits yesterday and was 
 getting circa 1% tagged as visual editor, I've just run two sample this 
 afternoon and the first had not a single edit tagged Visual editor and the 
 other only four, so unless some of those experienced users using V/e have 
 opted out of having their edits tagged V/E, I'm assuming gobs and gobs are 
 either on other language wikis, heavily skewed to a time of day I haven't 
 sampled or big in number but still too small a proportion to account for the 
 increase in the number of editors doing 100 edits per month.
 
 On 17 August 2015 at 15:54, Jonathan Morgan jmor...@wikimedia.org wrote:
 There are gobs and gobs* of people using VE. Many of them are experienced 
 editors. 
 
 I'm also interested in looking at VE adoption over time (especially by 
 veteran editors). I'll sniff around and let y'all know if I find anything.
 
 No idea what might be causing the boost in active editor numbers. But it's 
 exciting to see :)
 
 Anyone else have data that bears on these questions? 
 
 - J
 
 *non-scientific estimate drawn from anecdata
 
 On Sat, Aug 15, 2015 at 9:53 AM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:
 That's an interesting theory, but are there many people actually using V/E 
 now?
 
 I've just gone back through recent changes looking for people using it, 
 and apart from half a dozen newbies I've welcomed I'm really not seeing 
 many V/E edits.
 
 Looking at the history of Wikipedia:VisualEditor/Feedback the last 500 
 edits go back three months. So apart from the Interior, you and I Kerry 
 I'm not sure there is a huge number of people testing it, and I wasn't 
 testing it in the first 6 months of this year. I did see some research 
 where they were claiming that retention rates for V/E editors were now as 
 good as for people using the classic editor, but I would be surprised if 
 there were enough people using V/E to make a difference to these figures, 
 especially as this is about the editors doing over 100 edits a month.
 
 I agree it would be interesting to track the take-up of the VE (fully or 
 partially) by editor by year of original signup. But I think the long 
  awaited boost from V/E editing is yet to come; if the regulars have 
 started to increase that is likely to be due to something else.
 
 Jonathan
 
 On 15 August 2015 at 15:11, Kerry Raymond kerry.raym...@gmail.com wrote:
 Is there any way of telling what proportion of these 8% appear to be 
 using the Visual Editor either exclusively or partially? It might be 
 interesting to track the take-up of the VE (fully or partially) by editor 
 by year of original signup.
 
  
 
 Kerry
 
  
 
 From: wiki-research-l-boun...@lists.wikimedia.org 
 [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of 
 WereSpielChequers
 Sent: Saturday, 15 August 2015 11:12 PM
 To: Research into Wikimedia content and communities 
 wiki-research-l@lists.wikimedia.org; The Wikimedia Foundation Research 
 Committee mailing list rco...@lists.wikimedia.org
 Subject: [Wiki-research-l] Has the recent increase in English wikipedia's 
 core community gone beyond a statistical blip?
 
  
 
 Hi,
 
 With 8% more editors contributing over 100 edits in June 2015 than in  
 June 2014, we have now had six consecutive months where this particular 
 metric of the core community is looking positive. One or two months could 
  easily be a statistical blip, especially when you compare calendar months 
 that may have 5 weekends in one year and four the next. But 6 months in a 
 row does begin to look like a change in pattern.
 
 As far as caveats go I'm aware of several of the reasons why raw edit 
  count is a suspect measure, but I'm not aware of anything that has come 
  in this year that would have artificially inflated edit counts and 
  brought more of the under-100 editors into the 100+ group.
 
 I know there was a recent speedup, which should increase

Re: [Wiki-research-l] Has the recent increase in English wikipedia's core community gone beyond a statistical blip?

2015-08-18 Thread WereSpielChequers
That is a lot more than I was expecting from my random samples, I was
expecting total V/E edits to be somewhere near 1% of mainspace edits. More
than 10% of the most active editors using it surprises me. But if you go down to
number 100 in that list you find people doing 33 V/E edits in those thirty days;
these are all people who did over 100 edits in those thirty days, and the
vast majority of them will have done even fewer V/E edits. So it would be
interesting to see what percentage of these people's edits use V/E; if it
is typically 33 then it will be around 3%, probably not enough to be a

This sample is after all the promotion of V/E at Wikimania and subsequently
on mailing lists and the Signpost. I would be surprised if as many of these
editors were using V/E in the first 6 months of this year (I'm 4th on that
list, and I don't think I had more than a handful of V/E edits in the 25
months before this summer's Wikimania).



On 18 August 2015 at 01:04, Kerry Raymond kerry.raym...@gmail.com wrote:

 I asked her and yes the VE has made a big difference




 https://en.wikipedia.org/wiki/User_talk:Megalibrarygirl#Using_the_Visual_Editor
 (for what I said)

 https://en.wikipedia.org/wiki/User_talk:Kerry_Raymond#Visual_Editor (for
 her reply)



 So, one success story!



 Kerry



 *From:* Kerry Raymond [mailto:kerry.raym...@gmail.com]
 *Sent:* Tuesday, 18 August 2015 9:37 AM
 *To:* 'Research into Wikimedia content and communities' 
 wiki-research-l@lists.wikimedia.org
 *Subject:* RE: [Wiki-research-l] Has the recent increase in English
 wikipedia's core community gone beyond a statistical blip?



 Woo hoo! I’m #9 in the table! But seriously that’s probably less than 10%
 of my edits. For that same group, what percentage of their edits does the
 VE represent? I notice that #1 on the list User:Megalibrarygirl appears
 to be using VE almost exclusively at the present, but started out on the
 source editor. Interestingly I notice that among her recent non-VE edits
 mention adding infoboxes in the edit summary (which is something which is a
 total pain in the VE). This user has also massively increased her number of
 edits recently, might be interesting to know if the VE is a factor in this.
 I will ask her.



 Kerry



 *From:* wiki-research-l-boun...@lists.wikimedia.org [
 mailto:wiki-research-l-boun...@lists.wikimedia.org
 wiki-research-l-boun...@lists.wikimedia.org] *On Behalf Of *Jonathan
 Morgan
 *Sent:* Tuesday, 18 August 2015 4:11 AM
 *To:* Research into Wikimedia content and communities 
 wiki-research-l@lists.wikimedia.org
 *Subject:* Re: [Wiki-research-l] Has the recent increase in English
 wikipedia's core community gone beyond a statistical blip?



 It looks like about 10% of highly active Enwiki editors have used VE in
 the past month (across all namespaces):
 http://quarry.wmflabs.org/query/4795



 On Mon, Aug 17, 2015 at 8:35 AM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:

 On a very non-scientific measure of how few editors currently use V/E, I
 took some snapshots of the most recent 500 mainspace edits
 https://en.wikipedia.org/w/index.php?title=Special:RecentChanges&limit=500&days=30
 yesterday and was getting circa 1% tagged as Visual Editor. I've just run two
 samples this afternoon: the first had not a single edit tagged Visual Editor and
 the other only four. So unless some of those experienced users using V/E
 have opted out of having their edits tagged V/E, I'm assuming the gobs and
 gobs are either on other language wikis, heavily skewed to a time of day I
 haven't sampled, or big in number but still too small a proportion to
 account for the increase in the number of editors doing 100 edits per
 month.
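For what it's worth, this kind of snapshot counting is easy to script. A minimal sketch, assuming edits come back in the shape the MediaWiki API uses (a "tags" list per edit, with VE edits tagged "visualeditor"); the sample data below is made up to mirror the circa-1% observation:

```python
def ve_proportion(edits):
    """Fraction of edits whose tag list marks them as Visual Editor edits.

    Each edit is a dict with a "tags" list, mirroring the shape returned by
    the MediaWiki API (action=query&list=recentchanges&rcprop=tags).
    """
    if not edits:
        return 0.0
    ve = sum(1 for e in edits if "visualeditor" in e.get("tags", []))
    return ve / len(edits)

# Illustrative sample standing in for a real 500-edit snapshot.
sample = (
    [{"tags": ["visualeditor"]}] * 5 +   # 5 VE-tagged edits
    [{"tags": []}] * 495                 # 495 untagged (wikitext) edits
)
print(ve_proportion(sample))  # 0.01, i.e. circa 1%
```

Repeated snapshots at different times of day would address the time-skew caveat mentioned above.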



 On 17 August 2015 at 15:54, Jonathan Morgan jmor...@wikimedia.org wrote:

 There are gobs and gobs* of people using VE. Many of them are experienced
 editors.



 I'm also interested in looking at VE adoption over time (especially by
 veteran editors). I'll sniff around and let y'all know if I find anything.



 No idea what might be causing the boost in active editor numbers. But it's
 exciting to see :)



 Anyone else have data that bears on these questions?



 - J



 *non-scientific estimate drawn from anecdata



 On Sat, Aug 15, 2015 at 9:53 AM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:

 That's an interesting theory, but are there many people actually using V/E
 now?

 I've just gone back through recent changes looking for people using it,
 and apart from half a dozen newbies I've welcomed I'm really not seeing
 many V/E edits.

 Looking at the history of Wikipedia:VisualEditor/Feedback
 https://en.wikipedia.org/w/index.php?title=Wikipedia:VisualEditor/Feedback&offset=&limit=500&action=history
 the last 500 edits go back three months. So apart from the Interior, you
 and I, Kerry, I'm not sure there is a huge number of people testing it, and I
 wasn't testing it in the first 6 months of this year. I did see some

Re: [Wiki-research-l] Visual Editor experiment might have a problem ...

2015-08-16 Thread WereSpielChequers
Hi Kerry,

there is an experiment going on that randomly opts half of new users into V/E 
and leaves half using the classic editor. That should account for why one of 
your newbies had been opted in but not the other.

Captcha when adding citations is a longstanding problem. We need Captcha on 
account creation to keep the spam bots at bay, but somehow it also applies to 
newbies adding external links as cites, so we have a software feature that 
doesn't affect the vandals but instead targets the best of our newbies. My 
suspicion is that if we could work out when it was introduced and then 
compare that to subsequent recruitment and retention, we would find that this 
was one of the most damaging mistakes we've made.



Regards

Jonathan 


 On 17 Aug 2015, at 04:56, Kerry Raymond kerry.raym...@gmail.com wrote:
 
 I ran my first training session using the Visual Editor this morning and hit 
 what appeared to be a show-stopping bug. It appeared that the two new users 
 (thankfully I had only 2) could not create a citation. They found themselves 
 in an infinite loop of Save Page with Captcha when they tried to create a 
 citation.
  
 By the end of the session, I managed to refine the bug to a combination of 
 “new user”, “new article” (although created by me, not the new users), and 
 citations involving a live URL, duly reported at 
 https://en.wikipedia.org/wiki/Wikipedia:VisualEditor/Feedback#New_users_unable_to_create_citation_with_a_live_external_link_in_it
  
 Ironically it first happened on their newly created User Pages where we were 
 practising our new Wikipedia skills before tackling “real articles”. Then on 
 the “real articles” I had created earlier for them to use (a training 
 approach that has the benefit of not unleashing a horde of angry watchlisters 
 when they make some silly mistake, which occurs if you let new people make 
 their early edits on “popular articles”). (Spot the pattern: both were new 
 articles!)
  
 Now if this had happened to a new user sitting at home, they would have been 
 stymied. Because I was there to hold their hands in a training setting, I 
 found a way around the problem by logging them in as me and we continued the 
 training session on that basis (not an option for the user sitting at home, 
 typing in Captcha responses until they got frustrated and walked away).
  
 So, Aaron, it may be that your research on the impact of the VE was affected 
 by this bug. I imagine that users affected would have eventually aborted the 
 edit as they were unable to save, unless by chance they realised 
 that the problem was caused by their citation and removed it, saving 
 just the text changes. It’s hard to say what the likelihood of a 
 new user being affected is, as the problem seemed to relate to the age of the 
 article (I am autopatrolled, so I don’t think the new articles would have any 
 “might be dodgy” status flags on them, but I am not familiar with how that 
 side of things works).
  
 Also, is this experiment (or one similar) currently running? It’s just that 
 when we went into the Preferences of the two new user accounts to enable the 
 VE, one of them already had it enabled (yet I had seen both new user accounts 
 created in front of me a couple of minutes earlier), so there was no 
 possibility that this was anything other than a default setting for one of 
 the two users. I thought enabling the VE was normally strictly opt-in?
  
 Kerry
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


[Wiki-research-l] Has the recent increase in English wikipedia's core community gone beyond a statistical blip?

2015-08-15 Thread WereSpielChequers
Hi,

With 8% more editors contributing over 100 edits in June 2015 than in June
2014 https://stats.wikimedia.org/EN/TablesWikipediaEN.htm, we have now
had six consecutive months where this particular metric of the core
community is looking positive. One or two months could easily be a
statistical blip, especially when you compare calendar months that may have
5 weekends in one year and four the next. But 6 months in a row does begin
to look like a change in pattern.
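The weekend effect is easy to quantify with the standard library; as it happens, June 2014 had one more weekend day than June 2015:

```python
import calendar

def weekend_days(year, month):
    """Count Saturdays and Sundays falling inside the given month."""
    cal = calendar.Calendar()
    # itermonthdays2 yields (day, weekday) pairs; day == 0 pads days that
    # belong to adjacent months, and weekdays 5/6 are Saturday/Sunday.
    return sum(1 for day, wd in cal.itermonthdays2(year, month)
               if day != 0 and wd >= 5)

print(weekend_days(2014, 6))  # 9 weekend days (June 2014 started on a Sunday)
print(weekend_days(2015, 6))  # 8 weekend days
```

So the calendar effect, if anything, worked against the June 2015 figure, which makes the 8% rise slightly more credible.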

As far as caveats go, I'm aware of several of the reasons why raw edit count
is a suspect measure, but I'm not aware of anything that has come in
this year that would have artificially inflated edit counts and brought
more of the under-100 editors into the over-100 group.

I know there was a recent speedup, which should increase subsequent edit
rates, and one of the edit filters got disabled in June, but neither of
those should be relevant to the Jan-May period.

Would anyone on this list be aware of something that would have otherwise
thrown that statistic?

Otherwise I'm considering submitting something to the Signpost.

Regards

Jonathan
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Has the recent increase in English wikipedia's core community gone beyond a statistical blip?

2015-08-15 Thread WereSpielChequers
That's an interesting theory, but are there many people actually using V/E
now?

I've just gone back through recent changes looking for people using it, and
apart from half a dozen newbies I've welcomed I'm really not seeing many
V/E edits.

Looking at the history of Wikipedia:VisualEditor/Feedback
https://en.wikipedia.org/w/index.php?title=Wikipedia:VisualEditor/Feedback&offset=&limit=500&action=history
the last 500 edits go back three months. So apart from the Interior, you
and I, Kerry, I'm not sure there is a huge number of people testing it, and I
wasn't testing it in the first 6 months of this year. I did see some
research where they were claiming that retention rates for V/E editors were
now as good as for people using the classic editor, but I would be
surprised if there were enough people using V/E to make a difference to
these figures, especially as this is about the editors doing over 100 edits
a month.

I agree it would be interesting to track the take-up of the VE (fully or
partially) by editor by year of original signup. But I think the long
awaited boost from V/E editing is yet to come; if the regulars have started
to increase, that is likely to be due to something else.

Jonathan

On 15 August 2015 at 15:11, Kerry Raymond kerry.raym...@gmail.com wrote:

 Is there any way of telling what proportion of these 8% appear to be using
 the Visual Editor either exclusively or partially? It might be interesting
 to track the take-up of the VE (fully or partially) by editor by year of
 original signup.



 Kerry



 *From:* wiki-research-l-boun...@lists.wikimedia.org [mailto:
 wiki-research-l-boun...@lists.wikimedia.org] *On Behalf Of *
 WereSpielChequers
 *Sent:* Saturday, 15 August 2015 11:12 PM
 *To:* Research into Wikimedia content and communities 
 wiki-research-l@lists.wikimedia.org; The Wikimedia Foundation Research
 Committee mailing list rco...@lists.wikimedia.org
 *Subject:* [Wiki-research-l] Has the recent increase in English
 wikipedia's core community gone beyond a statistical blip?



 Hi,

 With 8% more editors contributing over 100 edits in June 2015 than in
 June 2014 https://stats.wikimedia.org/EN/TablesWikipediaEN.htm, we have
 now had six consecutive months where this particular metric of the core
 community is looking positive. One or two months could easily be a
 statistical blip, especially when you compare calendar months that may have
 5 weekends in one year and four the next. But 6 months in a row does begin
 to look like a change in pattern.

 As far as caveats go, I'm aware of several of the reasons why raw edit
 count is a suspect measure, but I'm not aware of anything that has come in
 this year that would have artificially inflated edit counts and brought
 more of the under-100 editors into the over-100 group.

 I know there was a recent speedup, which should increase subsequent edit
 rates, and one of the edit filters got disabled in June, but neither of
 those should be relevant to the Jan-May period.

 Would anyone on this list be aware of something that would have otherwise
 thrown that statistic?

 Otherwise I'm considering submitting something to the Signpost.

 Regards

 Jonathan



 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Editor Activity Analysis Graphs

2015-07-22 Thread WereSpielChequers
Research into editor retention issues is only a subset of editor retention 
initiatives, so a list in research space on meta is useful, but not a logical 
place to document initiatives that haven't involved research. That may sound 
surprising in this forum, but on Wikipedia there have been lots of initiatives 
that started off because some editors thought they would help editor retention, 
and there may never have been any research into whether they work. The 
interesting side of that is that there are lots of things that have been tried, 
some of which would make interesting research projects.

Regards

Jonathan / WereSpielChequers


 On 22 Jul 2015, at 18:23, Pine W wiki.p...@gmail.com wrote:
 
 Thanks Aaron, others can build on this. Would it be possible to include 
 adding links to this page in the standard procedure for WMF-funded projects 
 (grants, research, tech tools) as they are proposed, approved, updated, or 
 evaluated? I'm not sure who to ask about this since it would require 
 coordination among a variety of departments. Perhaps Luis?
 
 
 Pine
 
 
 On Wed, Jul 22, 2015 at 8:02 AM, Aaron Halfaker aaron.halfa...@gmail.com 
 wrote:
 Cool work Jeph.  Sorry to not stop by the booth Netha.  I'm only now 
 catching up on mailinglist stuff post-wikimania and didn't see the 
 invitation in time. 
 
 I have a start here: 
 https://meta.wikimedia.org/wiki/Research:Editor_retention  Regretfully, I 
 haven't expanded that in a year, but please do feel free to be bold.  I 
 highlighted my own work only because it was easiest for me to summarize at 
 the time and I was in a rush to get a few key stubs together.  Please feel 
 free to expand to other relevant literature.  I'll help as I can manage to 
 schedule the time.  
 
 -Aaron 
 
 On Mon, Jul 20, 2015 at 7:57 PM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:
 You could start a page on meta with subpages on specific wikis. I think you 
will find that a devolved system will work better than trying for a 
 centralised system. People on the English Wikipedia running current schemes 
 or aware of past ones might be willing to log them there, perhaps with a 
 category, but I can't see them doing so on meta, and I doubt other 
 languages will be different. 
 
 On Sunday, 19 July 2015, Pine W wiki.p...@gmail.com wrote:
 OK, perhaps we should have one so that we know what's being tried and what 
 has been tried. I'm not sure who to ask in WMF if they could set up a hub 
 for this kind of work. Aaron, do you know?
 
 Thanks,
 
 
 Pine
 
 
 On Sun, Jul 19, 2015 at 2:13 PM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:
 No.
 
 There is a wiki project that looks at this, and many chapters do too, as 
 well as, I suspect, many ad hoc things that individual editors do. I know of 
 enough such initiatives to know that there is no single complete list of 
 editor retention initiatives.
 
 Regards
 
 Jonathan
 
 
 On 19 Jul 2015, at 15:03, Pine W wiki.p...@gmail.com wrote:
 
 Interesting. Is there a comprehensive list somewhere of ongoing and 
 planned editor retention initiatives?
 
 Pine
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
 
 
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
 
 
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
 
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Editor Activity Analysis Graphs

2015-07-20 Thread WereSpielChequers
You could start a page on meta with subpages on specific wikis. I think you
will find that a devolved system will work better than trying for a
centralised system. People on the English Wikipedia running current schemes
or aware of past ones might be willing to log them there, perhaps with a
category, but I can't see them doing so on meta, and I doubt other
languages will be different.

On Sunday, 19 July 2015, Pine W wiki.p...@gmail.com wrote:

 OK, perhaps we should have one so that we know what's being tried and what
 has been tried. I'm not sure who to ask in WMF if they could set up a hub
 for this kind of work. Aaron, do you know?

 Thanks,


 Pine


 On Sun, Jul 19, 2015 at 2:13 PM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:

 No.

 There is a wiki project that looks at this, and many chapters do too, as
 well as, I suspect, many ad hoc things that individual editors do. I know of
 enough such initiatives to know that there is no single complete list of
 editor retention initiatives.

 Regards

 Jonathan


 On 19 Jul 2015, at 15:03, Pine W wiki.p...@gmail.com wrote:

 Interesting. Is there a comprehensive list somewhere of ongoing and
 planned editor retention initiatives?

 Pine

 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Editor Activity Analysis Graphs

2015-07-19 Thread WereSpielChequers
No.

There is a wiki project that looks at this, and many chapters do too, as well 
as, I suspect, many ad hoc things that individual editors do. I know of enough 
such initiatives to know that there is no single complete list of editor 
retention initiatives.

Regards

Jonathan


 On 19 Jul 2015, at 15:03, Pine W wiki.p...@gmail.com wrote:
 
 Interesting. Is there a comprehensive list somewhere of ongoing and planned 
 editor retention initiatives?
 
 Pine
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Editor Activity Analysis Graphs

2015-07-19 Thread WereSpielChequers
 in a month and also the older editors. In fact the older editors
 contributed more to the fall.
    - I have not looked specifically at (No of edits in first session
      after registration)
    - It was [1] that got me working on the graphs :-)

 @WereSpielChequers

    - Please send me a screenshot and I'll try to fix it for you.
    - If you know the dates when they were introduced we could find
      out what effect it had. Could you please add them to
      https://meta.wikimedia.org/wiki/Research:Editor_Behaviour_Analysis_%26_Graphs
      or on the talk page.

 There are five different graphs at
 https://cosmiclattes.github.io/wikigraphs/data/en/index.html. The
 explanation for each of them can be found at the bottom of each graph.
 I've generated the graphs for other wikis too ('es', 'de', 'ru', etc.);
 I'll put them up as soon as I can.

 1. https://meta.wikimedia.org/wiki/Research:The_Rise_and_Decline

 On Wed, Jul 15, 2015 at 4:27 AM, Aaron Halfaker 
  aaron.halfa...@gmail.com wrote:

  There are a lot of undefined metrics in your methods.  For example,
  what do you mean by the canonical definition of edit sessions?  Is it [0]?
  Also, is there something that we learn from this longevity analysis that we
  didn't learn from previous research, e.g. [1] and [2]?  One point that I
  think you should look into is the engagement measure used in [1] (# of
  edits in first session after registration).  In my work on [1], it looked
  like this stat remained consistent since 2004 and therefore didn't seem to
  explain the drop in newcomer retention.

 0. https://meta.wikimedia.org/wiki/Research:Activity_session
 1. https://meta.wikimedia.org/wiki/Research:The_Rise_and_Decline
 2. https://meta.wikimedia.org/wiki/Research:Surviving_new_editor

 -Aaron

  On Tue, Jul 14, 2015 at 2:01 PM, jeph jephp...@gmail.com wrote:

 Hi All,

  I've been working on graphs to visualize the entire edit activity of a
  wiki for some time now. I'm documenting all of it at
  https://meta.wikimedia.org/wiki/Research:Editor_Behaviour_Analysis_%26_Graphs.

  The graphs can be viewed at
  https://cosmiclattes.github.io/wikigraphs/data/wikis.html.
  Currently only graphs for 'en' have been put up; I'll add the graphs for
  the other wikis soon.

 Methodology

    - The editors are split into groups based on the month in which
      they made their first edit.
    - The active edit sessions (value, percentage, etc.) for the
      groups are then plotted as stacked bars or as a matrix. I've used the
      canonical definition of an active edit session. The values are within
      ±0.1% of the values on https://stats.wikimedia.org/
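As a rough illustration of the cohort step described above (a sketch with made-up data and a simplified edit-log shape, not the actual pipeline):

```python
from collections import defaultdict

def month_of(ts):
    """Reduce an ISO timestamp like '2015-07-14T12:00:00Z' to 'YYYY-MM'."""
    return ts[:7]

def cohorts(edits):
    """Group editors by the month of their first edit.

    `edits` is a list of (user, timestamp) pairs; returns a dict mapping
    cohort month to the set of users whose first edit fell in that month.
    """
    first_edit = {}
    # Walk edits in chronological order so the first one seen per user wins.
    for user, ts in sorted(edits, key=lambda e: e[1]):
        first_edit.setdefault(user, month_of(ts))
    groups = defaultdict(set)
    for user, month in first_edit.items():
        groups[month].add(user)
    return dict(groups)

edits = [
    ("alice", "2015-06-01T10:00:00Z"),
    ("alice", "2015-07-02T10:00:00Z"),  # later edit, same cohort
    ("bob",   "2015-07-14T12:00:00Z"),
]
print(cohorts(edits))  # {'2015-06': {'alice'}, '2015-07': {'bob'}}
```

Each cohort's activity per subsequent month can then be counted and stacked, which is essentially what the graphs plot.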

 Selector

    - There is a selector on each graph that lets you filter the
      data in the graph. On moving the cursor to the left end of the
      selector you will get a resize cursor. The selection can then be
      moved or redrawn.
    - In graphs 1 and 2 the selector filters by percentage.
    - In graphs 3, 4 and 5 the selector filters by the age of the cohort.

 Preliminary Finding

    - Longevity of editors fell drastically starting Jan 06 and has
      since stabilized at levels from Jan 07.
      https://meta.wikimedia.org/wiki/Research:Editor_Behaviour_Analysis_%26_Graphs#Preliminary_Results

  Would love to hear what you guys think of the graphs and any ideas you
  would have for me.

 Jeph



 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




 --
 Netha Hussain
 Student of Medicine and Surgery
 Govt. Medical College, Kozhikode
 Blogs:
 http://nethahussain.blogspot.com
 http://swethaambari.wordpress.com


 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



___
Wiki-research-l mailing list
Wiki-research-l

Re: [Wiki-research-l] Aidez à améliorer l'exhaustivité de Wikipédia en français

2015-06-26 Thread WereSpielChequers
If I may make one suggestion, have a look at people's language preferences in 
the wikis concerned. My assumption is that if you know two languages well 
enough to translate between them you are unlikely to have opted for a different 
language for system messages. I have edits in lots of different languages, but 
I only understand English and in most of the wikis where I have any edits I 
have set my language preference to English.

I don't object to receiving the email, but it was completely wasted on me. 

Regards

Jonathan 


 On 26 Jun 2015, at 19:40, Leila Zia le...@wikimedia.org wrote:
 
 Hi everyone,
 
 Thank you for your feedback. It's really appreciated. My responses below, all 
 in one-batch to avoid many emails to the list. Sorry if it's too long in 
 advance.
 
 2015-06-25 16:50 GMT-07:00 Samuel Klein meta...@gmail.com:
 This is such a delightful experience.  Whoever is working on translation 
 interfaces and translation research this way: very nicely done indeed.  
 
 Thank you! It's great to hear that you liked it. There are many things we 
 would like to improve about the algorithm and hearing that you like it makes 
 us more motivated. If you have more specific comments, feel free to leave us 
 a comment on the talk page. 
 
 The translation tool is owned by Language Engineering team.  You can read 
 more about it here, though I'm guessing you've already seen that. Sorry if 
 it's repetitive.
 
 On Fri, Jun 26, 2015 at 12:29 AM, Emmanuel Engelhart kel...@kiwix.org 
 wrote:
 [...]
 I have received this kind of email too. No, *this is not delightful at 
 all*. This kind of email bores me, like many other Wikipedians (see 
 https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Le_Bistro_du_jour#Wikimedia_Foundation_se_lance_dans_le_spam).
 
 AFAIK I have not asked to receive that kind of email, and what you are doing 
 is by definition spamming (and please don't answer this by talking 
 about the opt-out option; opt-in is the respectful way of doing this). Can 
 you please stop this immediately?
 
 I'm sorry that you received an email when you don't like to receive one. This 
 is not nice and I apologize for that. The opt-out option is available through 
 the email you have received. We will make sure you do not receive any future 
 research related emails if you unsubscribe. The test on French Wikipedia is 
 over now.
 
 The opt-out/opt-in discussion deserves a dedicated effort considering the 
 needs of everyone involved. I'm committed for improving the communications 
 with users regarding research projects and will do what I can on that front.
 
 FYI, the Wikipedia in French has an article evaluation program (like on 
 Wikipedia in English) based on wikiprojects, so honestly I think they 
 already know pretty well where the weaknesses are without the help of a robot: 
 https://fr.wikipedia.org/wiki/Projet:%C3%89valuation/Index
 
 Thank you for this pointer. 
  
 On Fri, Jun 26, 2015 at 12:42 AM, Jane Darnell jane...@gmail.com wrote:
 Interesting viewpoint, Emmanuel! I am always fascinated to know what others 
 think I might be interested in, even if the other is just a bot. Like Sam 
 I was delighted, and I might even be prompted to do a translation (though 
 not one of the ones they suggested, but an article which I made myself and 
 is in the same general area). I disagree by the way, that each Wikipedia has 
 to decide on their own what is encyclopedia worthy in that language. I 
 think the projects need to start trusting each other more and be open to 
 *aggressive* translation efforts as a way to educate new (multi-lingual) 
 editors, and also to promote a neutral point of view. Let's wikibomb 
 everybody aggressively with translation suggestions!
 
 Jane, thank you for your comment. We're happy that you welcomed receiving 
 such recommendations. For the purposes of this research, we are taking the 
 following approach: we take a more global approach to identify missing 
 content, rank them by their importance, and recommend them to editors. The 
 editor should make the final call whether the recommendation they receive 
 should go to the destination language. Ideally, we want to loop back editors' 
 expertise and feedback to the algorithm, i.e., if you as an editor think a 
 recommendation is not useful in a language, we should be able to collect that 
 information from you, feed it to the algorithm, and let the algorithm learn. 
 This needs to happen down the road (hopefully not too far down) for the 
 algorithm to be able to serve the needs of each language and community. 
 
 On Fri, Jun 26, 2015 at 1:29 AM, Magnus Manske magnusman...@googlemail.com 
 wrote:
 I still wonder what made a bot think I speak French? Surely, a few minor 
 edits on fr.wp can't be the trigger?
 (well, I had two years at school, but I barely remember enough to identify 
 the language...)
  
 I'm copying from here:
 
 We determine which editors are suitable for receiving recommendations for 
 translating from the source 

Re: [Wiki-research-l] Aaron Swartz Hypothesis on WikipediaAuthorship

2015-06-24 Thread WereSpielChequers
Dear Kerry,

Though the vast majority of my edits are precisely the sort of minor 
housekeeping edits that you describe, I agree with almost all that you say, 
but would make three little observations.

1 The solution to the edit conflict problem is to fix the software so we have 
fewer edit conflicts. It wouldn't be a big change to have the software treat 
categories and project tags as their own sections and not reject newbies' edits 
as conflicts with the taggers and the categorisers. When you are training 
newbies you can minimise these problems by getting them to start articles in 
sandboxes and to create sections. But the solution is to get a high priority 
for various low-priority and won't-fix bugs on Phabricator that would reduce 
edit conflicts. For the research community the big opportunity is to do 
research on edit conflicts; if the research showed that they are, as I believe, 
the biggest biter of good faith newbies, then there is a good chance that some 
programming resource could be allocated to them. If the research showed that 
they are not significant and that projects like AFT, the Visual Editor, Liquid 
Threads, Flow and Media Viewer really were a better investment for the 
WMF than reducing edit conflicts, then I will be astonished, and the WMF 
somewhat vindicated.

2 Don't take "the editors have been in decline since 2006/7" too seriously. 
These are raw figures on edits; they don't take account of the edit filters, 
which during that era lost us most of our vandalism and with it the vandal 
reversion, vandal warnings, AIV reports and block messages that were generated 
in response. Nor do they allow for the migration to Wikidata of things like 
interwiki links. The truth is I'm pretty sure no-one has meaningful figures for 
community size in that era.

3 Project tagging, even for currently dormant projects, shouldn't cause edit 
conflicts on articles as the tags go on talk pages. Whether project tagging has 
use or not depends on your attitude about the health of the community. If we 
are experiencing uniform and irreversible decline, with a dwindling band of 
editors who aren't changing their editing interests and no new recruits, then I 
could see the argument that once a wiki project has become moribund it won't 
revive. If however we are broadly stable but with a steady inflow of new 
editors, then I would see dormant wiki projects as an opportunity for newish 
editors to take on a role within the community. Again, somebody could earn a 
doctorate studying this.

Regards

Jonathan


 On 23 Jun 2015, at 22:44, Kerry Raymond kerry.raym...@gmail.com wrote:
 
 Given that we know active editors have been declining since about 2006, I 
 have to wonder if a 2015 study would produce very different results from the 
 earlier period.
  
 From an entirely anecdotal perspective, I do observe that there are a lot of 
 “housekeeping” edits that go on. I create a lot of new articles and would 
 characterise my own editing as writing a lot of new content in new and 
 existing articles; this is my primary interest. However, I am both amused and 
 annoyed at the way that, within moments of my edit, there can be a rash of 
 people wanting to add project tags, add esoteric categories that I cannot 
 imagine being used for navigation by real readers, replace a dash of one 
 length with a dash of another length, remove the word “comprised” (one of the 
 most annoying!), and so on. Many of these folks have massive edit counts and 
 appear (from a quick look at the last screen of recent contributions) to 
 devote themselves entirely to this kind of editing. Indeed, I go so far as to 
 say many suffer from editcountitis, a condition that often can be diagnosed 
 by the User page being largely devoted to reporting on their number of edits :-)
  
 IMHO, I would have to say that the value-add of these housekeeping edits is 
 mixed. Some are genuinely useful (people pick up mistakes I’ve made) or add 
 categories I am unaware of that are relevant to the topic. Some are useful if 
 you happen to believe the reader experience is genuinely improved by rigid 
 adherence to the Manual of Style (I would be interested in a study on how 
 important the consistency of the use of various-length dashes and other MoS 
 detail is to the reader experience). Some like project tagging appear to be 
 utterly pointless as most of the projects involved are moribund. Other than 
 meeting some deep need to “mark your territory” like a dog (or get your edit 
 count up), what earthly point is there to project tagging unless the project 
 has some active processes to improve articles? Some are just annoying (like 
 the user who dislikes the word “comprised”) and many of these people create 
 edit conflicts for me as I add further content, which is extremely annoying. 
 Edit conflicts are a particular problem when trying to do your second/third edit to 
 a new article, as new articles attract housekeeping edits like vultures to a 
 carcass. The 

Re: [Wiki-research-l] Community health (retitled thread)

2015-06-05 Thread WereSpielChequers
Yes, but may I also point out that one of our biggest problems on EN wiki is 
that even good faith newbies will often have their edits reverted. If you add 
uncited facts to a page you are now much more likely to have your edit reverted 
than to have someone add a "citation needed" tag, so I would suggest a metric that 
includes persistence v reversion of edits that are not vandalism.
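To make the proposed metric concrete, here is a minimal sketch of a "persistence v reversion" measure for good-faith newcomer edits. The record fields (`is_newcomer`, `was_vandalism`, `was_reverted`) and the toy data are invented for illustration; they are not a real Wikimedia schema.

```python
# Hypothetical sketch: share of good-faith newcomer edits that survived
# (i.e. were not reverted). Field names are illustrative only.

def newcomer_persistence_rate(edits):
    """Return the fraction of non-vandalism newcomer edits not reverted."""
    good_faith = [e for e in edits
                  if e["is_newcomer"] and not e["was_vandalism"]]
    if not good_faith:
        return None  # no qualifying edits, metric undefined
    surviving = sum(1 for e in good_faith if not e["was_reverted"])
    return surviving / len(good_faith)

# Toy data: two good-faith newcomer edits, one reverted and one kept.
edits = [
    {"is_newcomer": True,  "was_vandalism": False, "was_reverted": True},
    {"is_newcomer": True,  "was_vandalism": False, "was_reverted": False},
    {"is_newcomer": True,  "was_vandalism": True,  "was_reverted": True},
    {"is_newcomer": False, "was_vandalism": False, "was_reverted": False},
]
print(newcomer_persistence_rate(edits))  # 0.5
```

Tracked over time, a falling value of this ratio would be one way to quantify the "newbies get reverted instead of tagged" problem described above.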

Another issue worth measuring is the number of edit conflicts and the frequency 
that having an edit conflict triggers a newbie's departure. This would require 
WMF help as I don't think that edit conflicts are publicly logged. But some 
research on this might resolve the divide between those who consider this a 
minor issue deserving only the lowest priority at bugzilla, and those such as 
myself who suspect this is one of the most toxic features of the pedia and 
reducing edit conflicts the easiest major improvement that could be made.

By contrast, Commons is a relatively lonely place. From my experience you can do 
hundreds of thousands of edits there without ever needing to archive your 
talkpage. It would be interesting to see some community health metrics that 
looked at how many interactions people have with other editors, whether thanks 
or talkpage messages. My suspicion is that editor retention will vary by 
interaction level, and there will be a sweet spot which is best for retention: 
above this interaction level some people find things distracting, and below 
this level people leave because they feel ignored.

Another metric, and probably one best derived from polling organisations who 
survey the general public would be to identify how many of our readers would 
fix an error if they spotted it. One of the arguments that our perceived 
decline in editor recruitment is a cost of quality is the theory that readers 
who are willing to fix obvious errors are finding fewer errors per hour of 
reading Wikipedia. I know that casual readers are less likely to spot typos and 
vandalism than they were a few years ago, but I'm not sure of the best way to 
measure this phenomenon.

Regards

Jonathan Cardy


 On 5 Jun 2015, at 02:27, Stuart A. Yeates syea...@gmail.com wrote:
 
 
  Here's a list of possible metrics that we could use for measuring community 
  health. 
 
 That's a great list, with some great metrics. I'd be inclined to add some 
 silo-breaking metrics which measure activity across projects or across silos 
 within projects:
 
 * Number of editors with actions/edits on more than N wikis (N=2, N=3, etc)
 * Number of editors with actions/edits on more than N namespaces on the same 
 wiki (N=2, N=3, etc)
 ...
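The silo-breaking metric above can be sketched in a few lines. The `(editor, wiki)` pair format and the toy data are assumptions for illustration; real activity data would come from the Wikimedia databases.

```python
from collections import defaultdict

# Illustrative sketch of the cross-wiki metric: count editors who have
# been active on more than N distinct wikis, from (editor, wiki) pairs.

def editors_on_more_than_n_wikis(activity, n):
    wikis_per_editor = defaultdict(set)
    for editor, wiki in activity:
        wikis_per_editor[editor].add(wiki)  # sets dedupe repeat edits
    return sum(1 for wikis in wikis_per_editor.values() if len(wikis) > n)

# Toy data: alice spans three wikis, carol two, bob only one.
activity = [
    ("alice", "enwiki"), ("alice", "commons"), ("alice", "wikidata"),
    ("bob", "enwiki"), ("bob", "enwiki"),
    ("carol", "dewiki"), ("carol", "commons"),
]
print(editors_on_more_than_n_wikis(activity, 1))  # 2
print(editors_on_more_than_n_wikis(activity, 2))  # 1
```

The same function covers the per-namespace variant if the second element of each pair is a namespace rather than a wiki.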
 
 cheers
 stuart
 
 
 --
 ...let us be heard from red core to black sky
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



Re: [Wiki-research-l] Fwd: Traffic to the portal from Zero providers

2015-05-06 Thread WereSpielChequers
When a reader comes to Wikipedia from the web we can detect their IP address 
and that usually geolocates them to a country. More often than not that then 
tells you the dominant language of that country.

If we were to default to official or dominant languages then I predict endless 
arguments as to which language(s) should be the default in which countries. The 
large expat community in some parts of the Arab world might prefer English over 
Arabic. India would want to do things by state, and a whole new front would 
emerge in the Israel-Palestine debate. 

Regards

Jonathan Cardy


 On 7 May 2015, at 05:06, Sam Katz smk...@gmail.com wrote:
 
 hey guys, you can't guess geolocation, because occasionally you'd be
 wrong. this happens to me all the time. I want to read a site in
 spanish... and then it thinks I'm in Latin America, when I'm not.
 
 --Sam
 
 On Wed, May 6, 2015 at 10:07 PM, Oliver Keyes oke...@wikimedia.org wrote:
 Possibly. But that sounds potentially woolly and sometimes inaccurate.
 
 When a browser makes a web request, it sends a header called the
 accept_language header
 (https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Accept-Language)
 which indicates what languages the browser finds ideal - i.e., what
 languages the user and system are using.
 
 If we're going to make modifications here (I hope we will. But again;
 early days) I don't see a good argument for using geolocation, which
 is, as you've noted, flawed without substantial time and energy being
 applied to map those countries to probable languages. The data the
 browser already sends to the server contains the /certain/ languages.
 We can just use that.
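A minimal sketch of what "just using that" could look like: parsing the Accept-Language header into a preference-ordered list of language tags. The parsing here is deliberately simplified; a production implementation should follow the full HTTP content-negotiation rules (wildcards, malformed q-values, etc.).

```python
# Simplified Accept-Language parser: returns language tags sorted by
# q-value, highest preference first. Illustrative only; does not handle
# wildcards or malformed headers.

def parse_accept_language(header):
    langs = []
    for part in header.split(","):
        piece = part.strip().split(";q=")
        tag = piece[0].strip().lower()
        q = float(piece[1]) if len(piece) > 1 else 1.0  # default q is 1.0
        if tag:
            langs.append((tag, q))
    return [tag for tag, q in sorted(langs, key=lambda x: -x[1])]

header = "da, en-gb;q=0.8, en;q=0.7"
print(parse_accept_language(header))  # ['da', 'en-gb', 'en']
```

The portal could then intersect this ordered list with the set of existing Wikipedia language editions to choose which to feature first.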
 
 On 6 May 2015 at 22:50, Stuart A. Yeates syea...@gmail.com wrote:
 This seems like a great place to use analytics data, for each division
 in the geo-location classification, rank each of the languages by
 usage and present the top N as likely candidates (+ browser settings)
 when we need the user to pick a language.
 
 cheers
 stuart
 --
 ...let us be heard from red core to black sky
 
 
 On Thu, May 7, 2015 at 2:24 PM, Mark J. Nelson m...@anadrome.org wrote:
 
 Stuart A. Yeates syea...@gmail.com writes:
 
 Reading that excellent presentation, the thought that struck me was:
 
 If I wanted to subvert the assumption that Wikipedia == en.wiki,
 linking to http://www.wikipedia.org/ is what I'd do.
 
 A smarter http://www.wikipedia.org/ might guess geo-location and thus
 local languages.
 
 I'd also like to see something smarter done at the main page, but the 
 "and thus" bit here is notoriously tricky.
 
 For example most geolocation-based things, like Wikidata by default,
 tend to produce funny results in Denmark. A Copenhagener is offered
 something like this choice, in order:
 
 * Danish, Greenlandic, Faroese, Swedish, German, ...
 
 The reasoning here is that Danish, Greenlandic, and Faroese are official
 languages of the Danish Realm, which includes both Denmark proper, and
 two autonomous territories, Greenland and the Faroe Islands. And then
 Sweden and Germany are the two neighboring countries.
 
 But for the average Copenhagener, the following order is far more
 likely:
 
 * Danish, English, Norwegian Bokmål, ...
 
 The reason here is that Norwegian Bokmål is very close to Danish in
 written form (more than Swedish is, and especially more than Faroese is)
 while English is a widely used semi-official language in business,
 government, and education (for example about half of university theses
 are now written in English, and several major companies use it as their
 official workplace language).
 
 I think it's possible to come up with something that better aligns with
 readers' actual preferences, but it's not easy!
 
 -Mark
 
 --
 Mark J. Nelson
 Anadrome Research
 http://www.kmjn.org
 
 
 
 
 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation
 



Re: [Wiki-research-l] Grant Proposal: Request for Feedback (Response to Aaron Shaw)

2015-04-12 Thread WereSpielChequers
Dear Christina,

1 are you defining your super editors by total or recent edits? Whilst we have 
pretty good editor retention amongst high edit count editors, even amongst 
those with over 100,000 edits there are inactive and semi-active editors.

2 how are you going to ensure that talkpage invites are only responded to by 
the targeted editors? 

3 have you considered emailing your survey? Yes that loses you at least the 30% 
who haven't set an email, but you are much more likely to get your responses 
from the intended target group, also it is quite an effective way to contact 
the inactive and former editors who might not see a talkpage note.

4 What are you going to do to avoid trying to survey deceased Wikipedians? 
Especially with talkpage notes.

5 how does one make requests to add other questions to your survey?

6 you mention using census categories to ask the ethnicity question, may one 
ask whose census, Australia, Canada, India, the UK or the USA? Also are you 
intending to replicate the census questions or base your questions literally on 
the census categories generated from those questions?

Regards

Jonathan Cardy


 On 12 Apr 2015, at 20:49, Christina Shane-Simpson 
 christinam.sh...@gmail.com wrote:
 
 Hello Aaron and Other Wiki Researchers,
 
 Thank you for responding so quickly and thoroughly to my recent proposal!  
 Many of your concerns align with issues I’ve been discussing with my research 
 team, so I’m glad to hear that we’re overlapping in that sense.  Apologies in 
 advance for the length of the following:
 
 - Sampling:  I completely agree with your concerns in response to 
 the (relatively) recent revisit to the original Gender Gap results.  As an 
 exploratory study, I don’t think we could accurately represent the entire 
 Wikipedia community or make causal inferences about the community as a whole 
 due to the voluntary nature of the survey and the potential for inaccuracies 
 in self-reporting.  However, I’m hoping that this preliminary project could 
 reveal a few new patterns that might be explored in greater depth at a later 
 date.
 
 Based on the Wikipedia editor rankings, I’d planned to pull the top 20% of 
 editors and post on their Talk Pages, giving us the “super-editor” sample.  
 Since the two remaining samples are more difficult to recruit, I’m currently 
 exploring the most effective way to obtain a randomized sample of the active 
 (moderate) and inactive editors (infrequent edits) – this will likely be 
 developed with the assistance of someone more skilled in programming than 
 myself.  I’ve also been speaking with a statistician about alternative 
 methods, beyond propensity-matching, where we might account for response 
 biases that are likely to occur.  However, I’d be very open to suggestions 
 from this community about effectively sampling from Wikipedia and methods 
 you’ve used to account for biases common in these surveys.  
 
 - Self-Report Measures of Edit History:  This would only serve to 
 verify the editor ranking and provide a more thorough context by which the 
 editor feels he/she makes contributions to the Wikipedia community.  Since 
 we’ll have usernames – via Talk Pages – as you suggested, I’d like to explore 
 actual editing behaviors given that we’d have the resources to do so.
 
 - Collaboration:  Participant fatigue is a huge concern with all of 
 these online surveys targeting active editors.  I believe you’re correct that 
 the WMF is planning another editor survey, but I had hoped to provide some 
 foundation for other themes that might be explored in these larger surveys.  
 The prior WMF surveys didn’t provide as much depth as we might need to reveal 
 any patterns in editing behaviors.  I’ve also reached out to a couple of 
 other proposals, with similar interests, to determine whether we can 
 complement each other's efforts.  I think these types of collaborations are 
 very do-able and may help us to limit the frequency of Wikipedia editor 
 surveys.
 
 - Missing Measures and People:  I was able to access your article, so 
 thank you for linking it!  I’ve been reviewing the literature to clarify 
 variables (such as the web use you identify) to determine which should be 
 included in the survey.  In order to keep the survey at a reasonable length, 
 I’d hoped to capture some of these editing barriers via themes captured in 
 the open-ended responses.  This might be particularly relevant in the context 
 of editors’ perceived barriers, which might vary based on the aforementioned 
 traits.  However, I agree that the study would likely benefit from some 
 further questioning about editing experiences and I’ll be adding this into 
 the proposal.
 
 - Missing People and Sampling:  Your main concern also parallels the 
 concerns of my research team.  I’ve been speaking with my team about 
 potentially recruiting a passive Wikipedia user sample that would serve as a 
 comparison.  It 

Re: [Wiki-research-l] a cautious note on gender stats Re: Fwd: [Gendergap] Wikipedia readers

2015-02-19 Thread WereSpielChequers
Dear Claudia,

As I understand it the evidence for the Gendergap being real includes:

Usernames chosen by people creating accounts
Survey responses
Gender choices in user preferences
Attendees at events
Subject preferences among editors
In languages where you can't make talk page comments without disclosing your 
gender, the gender people disclose
Discussions amongst editors by email and other online methods
Applications for reference resources.

Some of these are more independent of each other than others, the last two are 
personal experience rather than anything statistically valid. But it is 
interesting when personal experience is in accord with research.

The only exceptions that I am aware of are where we deliberately target women 
such as through gender gap events, and I've heard that campus ambassadors are 
more gender balanced. 

I don't dispute that there is a gender gap in the community, nor that the gender 
gap is greater amongst established editors than among newbies. As for other 
genders and whether we have put too much weight on the male/female ratio, it is 
a big glaring difference and when the debate about gender gap started several 
years ago now, other ratios such as straight v gay didn't seem out of kilter. 
Since then there has been at least one mistake by ARBCOM and I suspect that the 
community isn't as Gay tolerant as I thought it was a few years back, so if 
someone is looking for a research topic it would be useful to know if the 
community's ratio of gay to straight members is changing over time.



Regards

Jonathan Cardy


 On 18 Feb 2015, at 11:23, koltzenb...@w4w.net wrote:
 
 Hi Jonathan Cardy and all, (see below for some software issues)
 
 I agree with your argument, WereSpielChequers/ Jonathan Cardy, and I would 
 like to hear more details about
 many pieces of evidence
 since these, I am told, usually form a good basis for hypotheses that might 
 be used in qualitative studies. It seems to me that my attempt at starting a 
 thought experiment (I quote a few lines from here: 
 https://lists.wikimedia.org/pipermail/wiki-research-l/2015-February/004188.html)
 might have produced similar data; or might be restarted in a different 
 setting, maybe
 
 btw, my apologies, and thank you for your clarification. Actually, I did not 
 intend to quote the statement in any personal attribution kind of way, but 
 for 
 a reversal experiment of the wording. 
 I was assembling a few bits and pieces from different parts of different 
 threads, and this was my way of making sure people would find the context 
 again if they chose to; next time, I will try to look for a different method of 
 presenting material for any language games.
 
 re the Wikipedia community, I'd say that since it constitutes itself in ad hoc 
 teams, every user is a member, even if only for one edit or just by adding a few 
 pages to a watchlist after registration -- irrespective of the number of 
 accounts the person behind a login name might be using to join the game 
 board Wikipedia. From my point of view, there simply is a large variety in 
 how people use any of the functions (or a combination of them) that the 
 software of the platform offers -- and any and all use cases contribute to what 
 makes the Wikipedia community. I do not have any romantic inclinations 
 here. If it is an open system, it is an open system for all use cases and their 
 inventors, be they acting in an ad hoc way or in a kind of more systematic 
 gaming -- one that might have to be regarded as systemic after all.
 
 so if mediawiki enables users to behave like bullies, my question would be: 
 does anyone have any insights as to the chances of changing the software to 
 make Wikipedia a less welcoming place to users behaving like bullies? 
 or would most experts currently say that mediawiki software does not have 
 anything to do with it ;-) ?
 
 best,
 Claudia
 
 -- Original Message ---
 From:WereSpielChequers werespielchequ...@gmail.com
 To:Research into Wikimedia content and communities wiki-research-
 l...@lists.wikimedia.org
 Sent:Tue, 17 Feb 2015 20:52:10 +
 Subject:Re: [Wiki-research-l] a cautious note on gender stats Re: Fwd: 
 [Gendergap] Wikipedia readers
 
 My comment "It could even test the theory that the community is more abrasive 
 towards women. We know that we are less successful at recruiting female 
 editors than male ones, I'm not sure if we have tested whether we are more 
 successful at retaining established male editors than female ones, and if so 
 whether we are losing women because they are lured away or driven away." seems 
 to have been shortened to me saying that the community is more abrasive 
 towards women. Before people continue using that quotation and attributing it 
 to me, may I point out that I regard it as an interesting theory worth 
 researching, not as a proven statement. I don't doubt that we have a massively 
 male skew in the community, I have seen too many pieces

Re: [Wiki-research-l] types of research Re: a cautious note on genderstats Re: Fwd: [Gendergap] Wikipedia readers

2015-02-17 Thread WereSpielChequers
This might appear to some to be getting a little off topic for this list, but 
if you are beginning to think that of this thread I would plead for a little 
indulgence, and for people to approach this thread from the angle of how can we 
form research projects around this. Like many people I regard the dark side of 
the community as a legitimate topic for research and I would point out that the 
foundation is offering grant funds for projects targeted at the gender gap.

My reversal of Kerry's statement would be more like:

I think if we can make Wikipedia less attractive to bullies, I rather suspect 
we make it a more attractive place for everyone else.

Since we don't know how to do this (yes there are some easy part solutions out 
there, but no magic bullets, certainly none that wouldn't have troubling side 
effects) there is an opportunity for researchers to make some innovative 
proposals.

Regards

Jonathan Cardy


 On 17 Feb 2015, at 08:20, koltzenb...@w4w.net wrote:
 
 (disclaimer: research-wise, in this thread, I am speaking from a margin 
 position in a role maybe similar to the one Shakespeare portrays his fools in, 
 because it is not my field and I only have a rather vague idea of how people 
 actually undertake such studies)
 
 re 
 I think if we can make Wikipedia more attractive 
 to women, I rather suspect we make it a more 
 attractive place for everyone.
 
 what about yet another reversal game and see what happens:
 
 this would be Kerry's statement from another perspective:
 I think if we can make Wikipedia less attractive 
 to men, I rather suspect we make it a more 
 attractive place for everyone.
 
 what kind of research design would be needed for this?
 
 best,
 Claudia
 
 -- Original Message ---
 From:Kerry Raymond kerry.raym...@gmail.com
 To:'Research into Wikimedia content and communities' wiki-research-
 l...@lists.wikimedia.org
 Sent:Tue, 17 Feb 2015 17:59:35 +1000
 Subject:Re: [Wiki-research-l] types of research Re: a cautious note on 
 genderstats Re: Fwd: [Gendergap] Wikipedia readers
 
 I agree the issues are not necessarily about male-
 female interactions. It may be about bully-victim 
 interactions. I often suspect we are seeing an 
 online form of
 
 https://en.wikipedia.org/wiki/Stanford_prison_experiment
 
 playing out, where anyone can choose to be the 
 prison guard enforcing the rules (of which we have 
 plenty) taking advantage of the lack of real-world 
 accountability (thanks to pseudonymity).
 
 However, in terms of any kind of metric to measure 
 progress, I think measuring Male/Female/DontKnow 
 is a lot more viable than trying to count the 
 number of bullies and victims (or powerful vs less 
 powerful).
 
 I think if we can make Wikipedia more attractive 
 to women, I rather suspect we make it a more 
 attractive place for everyone.
 
 Kerry
 
 --- End of Original Message ---
 


Re: [Wiki-research-l] a cautious note on gender stats Re: Fwd: [Gendergap] Wikipedia readers

2015-02-17 Thread WereSpielChequers
 WereSpielChequers, Kerry, Aaron and all,
 
 WereSpielChequers wrote:
 the community is more abrasive towards women
 
 this may be stats expert discourse, but let me show you how the question
 itself has a gendered slant.
 imagine what would happen - also in your research design - if it read: the
 community is less abrasive towards men - how does this compare to the
 first question re who are the community?
 
 and again, re phrasing "ten years" in 2011 and "four years on", which language
 version(s) are hypotheses based on?
 
 Kerry wrote:
 But I would agree that if an organisation sets a target (25% women in this
 particular case) and then does not put in place a means of measuring the
 progress against that target, one has to question the point of establishing a
 target.
 
 I think one has to question the point of not putting in place a means of
 measuring the progress...
 and also ask why, if the issue is a high priority (allegedly, one might add, in 
 speeches at meetings, in interviews with the press...) this organisation does
 not fund any top level research... - or does it?
 
 Aaron wrote:
 higher quality survey data
 well, and how does one recognize low quality and how come it is so low?
 and quality by whose epistemological aims and standards?
 
 causes and mechanisms that drive the gender gap (and related
 participation gaps)
 which related participation gaps do you have in mind here?
 where would these gaps be situated in terms of areas of participation?
 and, again, in which language version(s)?
 
 best,
 Claudia
 
 -- Original Message ---
 From:aaron shaw aarons...@northwestern.edu
 To:Research into Wikimedia content and communities wiki-research-
 l...@lists.wikimedia.org
 Sent:Mon, 16 Feb 2015 20:50:17 -0800
 Subject:Re: [Wiki-research-l] a cautious note on gender stats Re: Fwd:
 [Gendergap] Wikipedia readers
 
  Hi all!
 
  Thanks, Jeremy  Dariusz for following up.
 
  On Mon, Feb 16, 2015 at 5:58 AM, Dariusz
  Jemielniak dar...@alk.edu.pl wrote:
 
   As far as I recall, they did a follow-up on this topic, and maybe a
   publication coming up?
 
  Sadly, no follow ups at the moment.
 
  If we want to have a more precise sense of the
  demographics of participants the biggest need in
  this space is simply higher quality survey data.
  My paper with Mako has a lot of detail about why
  the 2008 editor survey (and all subsequent editor
  surveys, to my knowledge) has some profound limitations.
 
  The identification and estimation of the effects
  of particular causes and mechanisms that drive the
  gender gap (and related participation gaps)
   presents an even tougher challenge for
  researchers and is an area of active inquiry.
 
  all the best,
  Aaron
 --- End of Original Message ---
 


Re: [Wiki-research-l] a cautious note on gender stats Re: Fwd: [Gendergap] Wikipedia readers

2015-02-15 Thread WereSpielChequers
In 2011 the project was only ten years old, four more years is time for big 
changes to have occurred. Changes we know something about include the 
repercussions of the transition from manual vandal fighting to predominately 
automated vandalism rejection. This may have had more subtle implications than 
the obvious one of the reduction in raw edit count. In 2011 we had an admin 
cadre still dominated by admins appointed in the era when "good vandal fighter" 
was sufficient qualification to pass RFA. Four years on the admin corps has 
changed by not changing. Roughly a fifth of our remaining admins have been 
appointed in the last four years, but through a process with very different 
de facto criteria than before, and of course the vast majority of our admins 
are now four years older than in 2011. If the theory is true that vandal 
fighting was very attractive to teenage boys, then in 2011 our youngest admins 
might still not have been legally adult. Nowadays I doubt if we have many 
admins who are undergraduates.

Sometimes the dialogue within the movement can look like a bunch of 
overconfident thirty-somethings talking at a bunch of greybeards who they think 
are adolescents, and who think they are being hectored by young pups straight 
out of college. An editor survey would test theories such as the greying of the 
pedia, and as with any occasion when one has one's first look in the mirror 
after a long gap, it would tell us much about ourselves.

Another reason for doing another editor survey, and indeed a former editors' 
survey, is that some of us have been trying to fix the Gendergap for years; it 
would be nice to see if our efforts have had any impact. It could even test 
the theory that the community is more abrasive towards women. We know that we 
are less successful at recruiting female editors than male ones, but I'm not sure 
if we have tested whether we are more successful at retaining established male 
editors than female ones, and if so whether we are losing women because they 
are lured away or driven away.

Regards

Jonathan Cardy


 On 15 Feb 2015, at 08:34, koltzenb...@w4w.net wrote:
 
 ah, thanks, GerardM,
 
 so -- if I read your reaction correctly -- the underlying hypothesis on which it 
 is based says that much has changed (or may have) since those old days? 
 What information do you base this hypothesis on?
 
 my main point, anyway, is to cast a doubt as to the methods used in such 
 statistical work and interpretation of the outcome, any comments on that?
 
 see also "Clearly, we need to measure some things, but we also need to be 
 highly skeptical of what we choose to measure, how we do so, and what we 
 do with the resulting data." Joseph M. Reagle Jr. (17 December 2014), 
 Measure, manage, manipulate, 
 http://reagle.org/joseph/pelican/social/measure-manage-manipulate.html
 
 best,
 Claudia
 koltzenb...@w4w.net
 My GPG-Key-ID: DDD21523
 -- Original Message ---
 From:Gerard Meijssen gerard.meijs...@gmail.com
 To:Research into Wikimedia content and communities wiki-research-
 l...@lists.wikimedia.org
 Sent:Sun, 15 Feb 2015 08:05:24 +0100
 Subject:Re: [Wiki-research-l] a cautious note on gender stats Re: Fwd: 
 [Gendergap] Wikipedia readers
 
 Hoi,
 Obviously I know. My point is that when we talk 
 about diversity, it is because it was recognised 
 as a problem ... When papers of 2011 are quoted in 
 2015 when diversity is mentioned, it does not give 
 us a clue if the problem is as bad, worse or very 
 much improved. Consequently it is very much beside 
 the point. Thanks,   GerardM
 
 On 15 February 2015 at 07:48,
 koltzenb...@w4w.net wrote:
 
 Hi GerardM,
 
 why not have a guess ;-)
 
 Claudia
 -- Original Message ---
 From:Gerard Meijssen gerard.meijs...@gmail.com
 To:Research into Wikimedia content and communities wiki-research-
 l...@lists.wikimedia.org
 Sent:Sat, 14 Feb 2015 18:42:08 +0100
 Subject:Re: [Wiki-research-l] a cautious note on gender stats Re: Fwd:
 [Gendergap] Wikipedia readers
 
 Hoi,
 What year are we living ?
 Thanks,
 GerardM
 
 On 14 February 2015 at 17:24,
 koltzenb...@w4w.net wrote:
 
 my2cents re figures on percentages (... in a gender binary
 paradigm),
 well...
 
 I'd suggest to take into account User:Pundit's thoughtful
 considerations,
 
 author of: Jemielniak, Dariusz (2014), Common knowledge? An
 ethnography
 of Wikipedia, Stanford University Press, pp. 14-15
 
 Dariusz Jemielniak writes:
 According to Wikipedia Editors Study, published in 2011, 91
 percent of
 all Wikipedia editors are male ([reference to a study of 2011] This
 figure
 may not be accurate, since it is based on a voluntary online survey
 advertised to 31,699 registered users and resulting on 5,073
 complete
 and
 valid responses [...] it is possible that male editors are more likely
 to
 respond than female editors. Similarly, a study of self-declarations
 of
 gender showing only 16 percent are female editors (Lam et al. 2011)
 may 

Re: [Wiki-research-l] preelminary results from the Wikipedia Gender Inequality Index project - comments welcome

2015-01-12 Thread WereSpielChequers
I have spent quite a bit of time at new page patrol over the years. My 
suspicion is that many if not most of the people who create articles on newly 
signed pop stars and actors are from their management agency rather than fans, 
especially if they seem too early in their career to have fans. Sportspeople I 
suggest are more likely to be written about by fans, especially if they have 
been signed by a major team, or more importantly for Wikipedia a team with an 
actively editing fan.

On this theory the quality of articles, the number of edits, and (when we had 
the Article Feedback Tool) the number of "is hot" type comments would be a good 
indication of interest from the volunteer editing community. But article 
creation is in part a matter of the policy of the relevant talent agencies.

Sorry if that sounds overly cynical, perhaps if it were possible one would 
filter out the articles that get scarcely any views and then look at the gender 
balance of articles that are of interest to our audience as well as our 
editors.  

Regards

Jonathan Cardy


 On 11 Jan 2015, at 22:23, h hant...@gmail.com wrote:
 
 Hello Piotr and Gerard, 
 
 I think a competing hypothesis would be the male gaze. That is to say, 
 greater female representation is not about a culture (defined as national, 
 ethnic, linguistic or regional, not macho/feminine), but rather a 
 gender-interest bias. Thus more female representation could mean a more 
 male-dominant culture, which is against the theoretical assumption of 
 Piotr's research. 
 
 Note that the East Asian Wikipedians I know, especially those who edit 
 Chinese Wikipedia, are predominantly very young. Some of them can be highly 
 interested in the opposite sex.
 
 Check the following category pages as examples:
 (1a) Actresses from every country in the world
 http://zh.wikipedia.org/wiki/Category:%E5%90%84%E5%9C%8B%E5%A5%B3%E6%BC%94%E5%93%A1
 (1b) Actors from every country in the world
 http://zh.wikipedia.org/wiki/Category:%E5%90%84%E5%9B%BD%E7%94%B7%E6%BC%94%E5%91%98
 
 (2a) Japanese AV (i.e. porn) actresses
 http://zh.wikipedia.org/w/index.php?title=Category:%E6%97%A5%E6%9C%ACAV%E5%A5%B3%E5%84%AA
 (2b) Japanese AV (i.e. porn) actors
 http://zh.wikipedia.org/w/index.php?title=Category:%E6%97%A5%E6%9C%ACAV%E7%94%B7%E5%84%AA
 
 It is quite clear that the male gaze hypothesis seems to apply here: there 
 is more female representation simply because those articles are there to be 
 consumed by men or boys.
 
 So one of my suggestions for research is to select a few professional 
 categories that are of interest (say, politicians, poets, entertainers, etc.) 
 to do some cross-tab analysis. 
 
 Thus, I would be extremely cautious about using the current 
 metrics/methods as a viable gender inequality index. 
 
 As a proponent of the data normalization and geographic normalization 
 method myself, I would distinguish two sets of comparisons: one is a 
 cross-country or cross-language-version comparison of absolute values; the 
 other is a cross-country or cross-language-version comparison of normalized 
 values. By geographic normalization, I mean that researchers must gather 
 another set of cross-country or cross-language datasets that captures some 
 aspects of realities external to Wikipedia. In this case, I would compare 
 the gender ratio of politicians represented on Wikipedia against the 
 offline gender ratio of politicians. In other words, data normalization 
 allows researchers to compare which language versions are more or less 
 equal (and by how much) than the corresponding offline societies.
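
The geographic normalization idea above can be sketched in a few lines. The category, the counts, and the offline baselines below are invented placeholders, not real data; the point is only the shape of the calculation:

```python
def representation_ratio(wiki_female, wiki_total, offline_female, offline_total):
    """Compare the on-wiki female share for a category with an offline
    baseline. A ratio above 1 means women are better represented on-wiki
    than offline; below 1 means they are under-represented."""
    wiki_share = wiki_female / wiki_total
    offline_share = offline_female / offline_total
    return wiki_share / offline_share

# Hypothetical "politicians" counts for two language editions.
editions = {
    "xx.wikipedia": {"wiki": (120, 1000), "offline": (90, 600)},    # 12% vs 15%
    "yy.wikipedia": {"wiki": (300, 1500), "offline": (150, 1000)},  # 20% vs 15%
}

for name, counts in editions.items():
    ratio = representation_ratio(*counts["wiki"], *counts["offline"])
    print(f"{name}: normalized ratio = {ratio:.2f}")
```

Run against real per-language biography counts and census-style offline figures, this kind of ratio separates "Wikipedia reflects an unequal world" from "Wikipedia is more unequal than the world it describes".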
 
 BTW, the methods you develop to extract gender from biography articles 
 for large-scale analysis may also be re-purposed to study other dimensions. 
 One dimension that would interest me is nationality. It would be 
 interesting to see the coverage, focus or bias of a language version on 
 people based on nationality. Age might be another.
 
 Best,
 han-teng liao
 
 
 
 2015-01-11 19:01 GMT+02:00 Gerard Meijssen gerard.meijs...@gmail.com:
 Hoi,
 Having read it, I find it is still very much Wikipedia-oriented. It makes 
 use of the toolset by Markus. That is fine. The notion of diversity and 
 notability is also very much culturally defined. It would be nice to know 
 how the different Wikipedias accept the notability of people from other 
 cultures and whether it impacts the diversity of their own articles. 
 
 I have found that many people do not have an article in the languages of 
 their own cultures. Often it has to do with an interest in a domain that is 
 more of relevance to the other culture. 
 
 Diversity is very much part of a domain; in Roman Catholicism, male dominance 
 is obvious. I am curious whether diversity in gender is affected by such 
 considerations, and whether items with a single article are more in line with 
 the norm for a culture or a domain.
 Thanks,
  GerardM
 
 On 10 January 2015 at 11:51, Piotr Konieczny pio...@post.pl wrote:
 Here 
 

Re: [Wiki-research-l] commentary on Wikipedia's community behaviour (Aaron gets a quote)

2014-12-15 Thread WereSpielChequers
We have problems, I don't dispute that. But ugly and bitter as 4chan? That 
has to be an exaggeration.

Regards

Jonathan Cardy


 On 13 Dec 2014, at 01:03, Andrew Lih andrew@gmail.com wrote:
 
 I certainly hope you're right Sydney. What a horrible mess.
 
 
 On Fri, Dec 12, 2014 at 5:53 PM, Sydney Poore sydney.po...@gmail.com wrote:
 I think feminists, especially those who take an interest in STEM, will pass 
 this article around.
 
 Sydney
 
 On Dec 12, 2014 5:35 PM, Andrew Lih andrew@gmail.com wrote:
 It's a good piece, but honestly I think only the dedicated tech reader will 
 make it through the entire story. There's a lot of jargon and insider 
 intrigue, such that I could imagine most people never making it past the 
 typewriter barf of BLP, AGF, NOR :)
 
 
 On Fri, Dec 12, 2014 at 5:26 PM, Dariusz Jemielniak dar...@alk.edu.pl 
 wrote:
 While I agree that the article is overly negative (likely because of the 
 individual experience), I think it still points to an important problem. I 
 don't perceive this article as really problematic in terms of image. Maybe 
 naively, I imagine that people will not stop donating because the 
 community is not ideal.
 
 pundit
 
 On Fri, Dec 12, 2014 at 11:16 PM, Kerry Raymond kerry.raym...@gmail.com 
 wrote:
 There’s a saying that everyone likes to eat sausages but nobody likes to 
 know how they are made.  It is not good to have negative publicity like 
 that during the annual donation campaign (irrespective of the motivations 
 of the journalist and/or the rights/wrongs of the issue being reported, 
 neither of which I intend to debate here). As a donation-funded 
 organisation, public perception matters a lot.
 
  
 
 Kerry
 
  
 
 From: Jonathan Morgan [mailto:jmor...@wikimedia.org] 
 Sent: Saturday, 13 December 2014 6:43 AM
 To: Research into Wikimedia content and communities
 Cc: Kerry Raymond
 Subject: Re: [Wiki-research-l] commentary on Wikipedia's community 
 behaviour (Aaron gets a quote)
 
  
 
 I mostly agree. On one hand, it's always nice to see a detailed 
 description of how wiki-sausage gets made in a major venue. On the other, 
 this journalist clearly has a personal axe to grind, and used his bully 
 pulpit to grind it in public.
 
  
 
 - J
 
  
 
 On Fri, Dec 12, 2014 at 1:39 AM, Federico Leva (Nemo) 
 nemow...@gmail.com wrote:
 
 1000th addition to the inconsequential rant genre.
 
 Nemo
 
 
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
 
 
 
  
 
 --
 
 Jonathan T. Morgan
 
 Community Research Lead
 
 Wikimedia Foundation
 
 User:Jmorgan (WMF)
 
 jmor...@wikimedia.org
 
  
 
 
 
 
 -- 
 
 __
 prof. dr hab. Dariusz Jemielniak
 head of the Department of International Management
 and of the CROW research centre
 Akademia Leona Koźmińskiego (Kozminski University)
 http://www.crow.alk.edu.pl
 
 member of the Academy of Young Scholars of the Polish Academy of Sciences
 member of the Science Policy Committee of the Ministry of Science and 
 Higher Education (MNiSW)
 
 The world's first ethnography of Wikipedia, Common Knowledge? An 
 Ethnography of Wikipedia (2014, Stanford University Press), authored by me, 
 is out: http://www.sup.org/book.cgi?id=24010
 
 Reviews:
 Forbes: http://www.forbes.com/fdc/welcome_mjx.shtml
 Pacific Standard: 
 http://www.psmag.com/navigation/books-and-culture/killed-wikipedia-93777/
 Motherboard: http://motherboard.vice.com/read/an-ethnography-of-wikipedia
 The Wikipedian: 
 http://thewikipedian.net/2014/10/10/dariusz-jemielniak-common-knowledge
 
 
 
 


Re: [Wiki-research-l] Tool to find poorly written articles

2014-10-24 Thread WereSpielChequers
And just to add to the complexity of James' comments: there are some people
who think that a general-interest encyclopaedia should be written for a
general audience, so articles with long sentences should be improved by
rewriting them into more, but shorter, sentences.
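
The long-sentences concern is easy to quantify crudely. The splitter below is a naive regex (it mishandles abbreviations like "e.g."), so treat it as an illustration rather than a serious readability metric:

```python
import re

def mean_sentence_length(text):
    """Average number of words per sentence, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)

sample = ("Wikipedia is an encyclopaedia. It is written by volunteers. "
          "Some articles contain very long and winding sentences that could "
          "usefully be split into several shorter ones.")
print(round(mean_sentence_length(sample), 1))  # 8.7
```

A real study would use a proper sentence tokenizer and compare against human quality ratings, as James describes below.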

On 24 October 2014 19:44, James Salsman jsals...@gmail.com wrote:

 Ditty,

 Article quality is inherently subjective in the hard-AI sense. A panel of
 judges will consider accurate articles full of spelling, grammar, and
 formatting errors superior in quality to hoax, biased, spam, or out-of-date
 articles with perfect grammar, impeccable spelling, and immaculate
 formatting.

 In my studies of the short popular vital articles (WP:SPVA) the closest
 correlation with subjective mean opinion score quality I've found so far is
 sentence length. But it has diminishing returns and the raw correlation is
 +0.2 at best.

 The entirely subjective nature of article quality is additional support
 for automating accuracy review.

 Best regards,
 James






Re: [Wiki-research-l] FW: What works for increasing editorengagement?

2014-09-28 Thread WereSpielChequers
 method at their 
 disposal, they give up and walk away.  Hmm, reminds me of
 
  
 
 https://en.wikipedia.org/wiki/No_taxation_without_representation
 
  
 
 and look where that ended (given the international readership of this list, 
 I'll reserve judgement on whether or not it was a good outcome :-) )
 
  
 
 Personally I think we should look for the simple interventions, experiment 
 with them, and see if they can turn around editor attrition before we look 
 to the complex interventions (like the fully collaborative editing 
 environment). It might be far simpler for watchlists to show a couple of 
 things (I'll leave the specifics to the UX people): 1) that one of the 
 editors since your last visit is a newbie (maybe this could show in the 
 relevant entry in the edit history too), and 2) that the last edit was very 
 recent, suggesting the possibility that someone may be currently editing it 
 (and hence more likely to create edit conflicts if you go in). I don't know 
 whether it is a simple matter to show that the page is currently open for 
 editing (I suspect not, but I don't know the internals of the code), but if 
 it were easy, that would be an even better thing to signal. We don't need 
 to change how things work; it might be sufficient to just give clearer 
 signals about what's going on.
 
  
 
 I note that this process of signalling is the key to highly scalable insect 
 behaviour (e.g. ants, termites, bees etc), aka stigmergy. Maybe we should 
 try a little stigmergy in Wikipedia. Don’t change how things work, just 
 provide humans (and bots) with better information about the situation and 
 hope they respond more appropriately.
 
  
 
 Kerry
 
  
 
  
 
 From: wiki-research-l-boun...@lists.wikimedia.org 
 [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Gerard 
 Meijssen
 Sent: Saturday, 27 September 2014 5:48 AM
 To: Research into Wikimedia content and communities
 Subject: Re: [Wiki-research-l] FW: What works for increasing 
 editorengagement?
 
  
 
 Hoi,
 
 Did you read this [1]? The notion that bots are good for increasing the 
 number of editors is contentious. However, numbers from the Swedish 
 Wikipedia experience confirm exactly that bots are good. They not only 
 increase the number of readers but also the number of editors. BIG GRIN
 
 Thanks,
 
 GerardM
 
  
 
 [1] 
 http://ultimategerardm.blogspot.nl/2014/09/wikipedia-to-bot-or-not-to-bot-ii.html
 
  
 
 On 26 September 2014 14:31, WereSpielChequers werespielchequ...@gmail.com 
 wrote:
 
 Scott,
 
  
 
 That's why the rest of my email focussed on things that we could do that 
 would improve editor retention and which would be uncontentious. But there 
 is also a third question: are people's assumptions re newbie behaviour 
 true? This is where research would be useful. Where the problem lies in 
 mutually contradictory assumptions about user behaviour, the best way to 
 break the logjam is with research. Now, I'm confident that the research 
 will support my assumptions, but if I am wrong then I'm prepared to back 
 solutions that I have previously opposed.
 
 
 Regards
 
  
 
 Jonathan Cardy
 
  
 
 
 On 26 Sep 2014, at 09:56, Scott Hale computermacgy...@gmail.com wrote:
  
 
 On Fri, Sep 26, 2014 at 1:46 PM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:
 
 Attn Luca and Scott
 
  
 
 There are some things best avoided as going against community expectations. 
 I would be happy to see flagged revisions deployed on the English Wikipedia 
 but I'm well aware that there is a significant lobby against that of people 
 who believe that it is important that your edit goes live immediately. And 
 with the community somewhat burned by bad experiences with recent software 
 changes now would be a bad time to suggest such a controversial change.
 
  
 
  
 
 Yes. Completely agree, and that was the exact point of my first email:
 
 On Fri, Sep 26, 2014 at 9:15 AM, Scott Hale computermacgy...@gmail.com 
 wrote:
 
 And that is the fundamental flaw with this whole email thread. The question 
 needing to be answered isn't what increases new user retention. The real 
 question is what increases new user retention and is acceptable to the 
 most active/helpful existing users. The second question is much harder 
 than the first.
 
 
 
 
 
  
 
 
 

Re: [Wiki-research-l] FW: What works for increasing editor engagement?

2014-09-26 Thread WereSpielChequers
Scott,

That's why the rest of my email focussed on things that we could do that would 
improve editor retention and which would be uncontentious. But there is also a 
third question: are people's assumptions re newbie behaviour true? This is 
where research would be useful. Where the problem lies in mutually 
contradictory assumptions about user behaviour, the best way to break the 
logjam is with research. Now, I'm confident that the research will support my 
assumptions, but if I am wrong then I'm prepared to back solutions that I have 
previously opposed.

Regards

Jonathan Cardy


 On 26 Sep 2014, at 09:56, Scott Hale computermacgy...@gmail.com wrote:
 
 
 On Fri, Sep 26, 2014 at 1:46 PM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:
 Attn Luca and Scott
 
 There are some things best avoided as going against community expectations. 
 I would be happy to see flagged revisions deployed on the English Wikipedia 
 but I'm well aware that there is a significant lobby against that of people 
 who believe that it is important that your edit goes live immediately. And 
 with the community somewhat burned by bad experiences with recent software 
 changes now would be a bad time to suggest such a controversial change.
 
 Yes. Completely agree, and that was the exact point of my first email:
 
 On Fri, Sep 26, 2014 at 9:15 AM, Scott Hale computermacgy...@gmail.com 
 wrote:
 And that is the fundamental flaw with this whole email thread. The question 
 needing to be answered isn't what increases new user retention. The real 
 question is what increases new user retention and is acceptable to the most 
 active/helpful existing users. The second question is much harder than the 
 first.


Re: [Wiki-research-l] FW: What works for increasing editorengagement?

2014-09-25 Thread WereSpielChequers
I don't doubt that Australian newbies editing existing well-developed articles 
are going to find they are editing things on existing Australian editors' 
watchlists. My experience of editathons is mostly about creating new articles 
or improving very neglected ones, usually by expanding stubs or working from 
lists of articles without images. Such articles are unlikely to be on anyone's 
watchlist, or if they are, the watchers aren't likely to object to good-faith 
expansion. But I would still expect that existing editors would be looking at 
edits as fast as they come in, not because of watchlists but because they are 
at recent changes or new page patrol watching new edits as they come in. 
Though if an editathon is focussed on particular articles (we have one coming 
up at the Royal Opera House which will focus on some of the people 
historically associated with it), then inviting the relevant WikiProject is 
one way to alert the existing editors. Thinking about this, I will drop an 
invitation to the ROH editathon on the talk pages of the established articles 
we are going to focus on.

I think this would be an interesting topic for someone to research; whether 
the problem is with watchlisters or patrollers should be easy to spot. When 
existing editors come into conflict with newbies at editathons, look at the 
established editors' editing: are they past contributors to that article, and 
before their intervention, were their edits to random new pages or random 
pages edited in the past few moments, or were they to other articles they had 
also previously edited? If we can identify the types of conflicts going on, we 
can be clearer about the types of changes that we need people to make. My 
editathons rarely hit problems of conflict with existing editors, and when 
they do it is usually that people insist on writing an article on a subject 
where they can't find the two independent reliable sources that I suggested 
they need before creating an article. Partly I avoid this by starting people 
off with easy steps: signing the event page may get them an edit conflict, but 
it also shows them how to resolve a simple one. I have them create new 
articles in sandboxes, and warn them that others will probably edit those 
articles within minutes of their moving to main space. Encouraging people to 
save frequently means they don't have complex edit conflicts over multiple 
paragraphs, and I tell them always to leave an edit summary: edit summaries 
are optional, so vandals rarely bother with them, and even a one-word edit 
summary of "expand", "pic" or "typo" is code for "I am not a vandal".

Regards

Jonathan Cardy


 On 26 Sep 2014, at 00:31, Kerry Raymond kerry.raym...@gmail.com wrote:
 
 Australian outreach events generally edit Australian content. Other 
 Australian editors are likely to have those articles on their watchlists and 
 are likely to be in the same timezone. And plenty of non-Australian editors 
 are sitting in their pyjamas at all hours of the day and night waiting to 
 pounce. Believe me, new
 editors encounter other editors very quickly (although sometimes they don’t 
 realise it). They often think it “rude” that other people are editing the 
 article “while I am in the middle of working on it”. Their mental model of 
 collaborative editing is like a shared lawn mower. You have sole use for a 
 while; then it is passed on to the next person.
  
 Not sure I can help you with London editathons. But I do have a couple of 
 edit training days coming up in Oakey, Queensland in a few weeks:
  
 https://en.wikipedia.org/wiki/Oakey,_Queensland
  
 Kerry
  
  
 From: WereSpielChequers [mailto:werespielchequ...@gmail.com] 
 Sent: Thursday, 25 September 2014 9:38 PM
 To: kerry.raym...@gmail.com; Research into Wikimedia content and communities
 Subject: Re: [Wiki-research-l] FW: What works for increasing editorengagement?
  
 Yes, training newbies is a great way to learn and to see the flaws that we 
 mentally blank out. I also found that I need to keep a vanilla account for 
 demonstrating things to newbies, if I use my WereSpielChequers account the 
 various extra buttons confuse people. 
  
 I wouldn't worry too much about watchlisters making edits and causing edit 
 conflicts; most of the time you aren't going to be in the same time zone, and 
 watchlisters, even the most active ones, are unlikely to check their 
 watchlists more than a few times a day. So as long as you don't start newbies 
 on highly watched articles like Sarah Palin, you should be OK. But edit 
 conflicts are a real problem for newbies, and especially those creating new 
 articles. The new page patrol people need to look at articles as they are 
 created in order to pick up attack pages etc., and when they find OK articles 
 they tend to at least categorise them. Some of this could be fixed by 
 improving the software for handling edit conflicts; for example, it would be 
 nice if adding a category and changing some text were not treated as a 
 conflict. However this is the sort

Re: [Wiki-research-l] FW: What works for increasing editor engagement?

2014-09-25 Thread WereSpielChequers
We have had endless discussions about this in the new page patrol community. 
Basically there is a divide between those who think it important to 
communicate with people as quickly as possible, so they have a chance to fix 
things before they log off, and people such as myself who think that this 
drives people away. So before we try to make people more aware that they are 
dealing with a newbie, it would help if we had some neutral independent 
research that indicated which position is more grounded in reality. Simply 
making it clearer to patrollers that they are dealing with newbies is solving 
a non-problem: we know the difference between newbies and regulars, we just 
disagree as to the best way to handle newbies. Investing in software to tell 
patrollers when they are dealing with newbies is unlikely to help; in fact I 
would be willing to bet that one of the criticisms will be from patrollers 
saying that it isn't doing that job as well as they can, because it doesn't 
spot which editors are obviously experienced even if their latest account is 
not yet autoconfirmed.

There is also the issue that some patrollers may not realise how many edit 
conflicts they cause by templating and categorising articles. After all, it 
isn't going to be the templater or categoriser who loses the edit conflict; 
that is almost guaranteed to be the newbie. Of course this could be resolved 
by changing the software so that adding a category or template is not treated 
as conflicting with changing the text.
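
The software change suggested above is essentially a merge of edits that touch disjoint parts of the page. A minimal line-based sketch of that idea follows; the page content is hypothetical, and real MediaWiki conflict handling is considerably more involved:

```python
import difflib

def edits_against(base, new):
    """Non-equal opcodes as (base_start, base_end, replacement_lines)."""
    matcher = difflib.SequenceMatcher(a=base, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

def disjoint_merge(base, ours, theirs):
    """Merge two revisions of `base` when their edits touch disjoint line
    ranges; return None if the edited ranges overlap (a genuine conflict)."""
    a, b = edits_against(base, ours), edits_against(base, theirs)
    for i1, i2, _ in a:
        for j1, j2, _ in b:
            if i1 < j2 and j1 < i2:  # overlapping base ranges
                return None
    merged, pos = [], 0
    for start, end, replacement in sorted(a + b):
        merged.extend(base[pos:start])
        merged.extend(replacement)
        pos = end
    merged.extend(base[pos:])
    return merged

base   = ["Intro text.", "Body paragraph.", "[[Category:Stubs]]"]
newbie = ["Intro text.", "Expanded body paragraph.", "[[Category:Stubs]]"]
patrol = base + ["[[Category:Opera]]"]  # patroller appends a category
print(disjoint_merge(base, newbie, patrol))
```

Here the newbie's text change and the patroller's category addition merge cleanly; two edits to the same line would still return None and surface as a real conflict.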

Regards

Jonathan Cardy


 On 25 Sep 2014, at 23:23, Luca de Alfaro l...@dealfaro.com wrote:
 
 Re. the edit conflicts happening when a new user is editing: 
 
 Can't one add some AJAX to the editor that notifies that one still has the 
 editing window open? Maybe editors could wait to modify work in progress, if 
 they had that indication, and if the content does not seem vandalism? 
 
 Luca
 
 On Thu, Sep 25, 2014 at 12:17 PM, James Salsman jsals...@gmail.com wrote:
 Aaron, would you please post the script you used to create
 https://commons.wikimedia.org/wiki/File:Desirable_newcomer_survival_over_time.png
 ?
 
 I would be happy to modify it to also collect the number of extant
 non-redirect articles each desirable user created.
 
  Aaron wrote:
  ... You'll find the hand-coded set of users here
   http://datasets.wikimedia.org/public-datasets/enwiki/rise-and-decline
  ...
   Categories:
  
 1. Vandals - Purposefully malicious, out to cause harm
 2. Bad-faith - Trying to be funny, not here to help or harm
 3. Good-faith - Trying to be productive, but failing
 4. Golden - Successfully contributing productively
 
 


Re: [Wiki-research-l] What works for increasing editor engagement?

2014-09-15 Thread WereSpielChequers
When my watchlist went over 13,000 I changed my preferences so I only add 
things to it that I want on it, and like Kerry I started to pare things back. 
At first I was just unwatching a trickle of articles: I would look at edits on 
my watchlist by unfamiliar editors, revert the vandalism, and unwatch if it 
was a good edit and I couldn't remember why I had watchlisted it. Then I did a 
huge purge and now have only a few thousand articles watchlisted. Above a 
certain size watchlists become a chore; plus, with the rise of the edit 
filters, nowadays I don't find much vandalism by looking at my watchlist.

Regards

Jonathan Cardy


 On 15 Sep 2014, at 02:35, Kerry Raymond kerry.raym...@gmail.com wrote:
 
 I have set the preference to put anything I edit on my watchlist (so I can
 be aware of any short-term reactive edits to my own edits), but I balance
 that with always asking myself when I get a watchlist notification whether
 it deserves to stay on the watchlist. I made a decision a while back that I
 can't do everything, so I chose to make Queensland geography, history and
 biography my focus and I generally pare back my watch list to articles in
 that space (plus a few other odds and ends that I am particularly
 fond of). By doing that, I have brought my watchlist slowly down from around
 10K to about 4K which is manageable in terms of daily load, but obviously
 some topic spaces are more active than others (I wish there were more people
 interested in Queensland to share the load with).
 
 Kerry
 
 -Original Message-
 From: Stuart A. Yeates [mailto:syea...@gmail.com] 
 Sent: Monday, 15 September 2014 10:51 AM
 To: Kerry Raymond; Research into Wikimedia content and communities
 Cc: Jane Darnell
 Subject: Re: [Wiki-research-l] What works for increasing editor engagement?
 
 On Mon, Sep 15, 2014 at 12:29 PM, Kerry Raymond kerry.raym...@gmail.com
 wrote:
 I have email notification for my watch list
 
 How many items are on your watchlist? I appear to have accumulated 14,871
 items on mine since I last zeroed it. Right now there are 159 changes in the
 last 24 hours.
 
 I'm not sure I could cope with that volume.
 
 Part of the problem is probably my participation in WP:BLP/N, which
 means that at least once a week I edit an article that's getting lots
 of edits and likely to for some time.
 
 cheers
 stuart
 
 



Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread WereSpielChequers
If your bot is only running automated reports in its own userspace then it
doesn't need a bot flag. But it probably won't be a very active bot, so it may
not be a problem for your stats.

On the English-language Wikipedia you are going to be fairly close if you
exclude all accounts which currently have a bot flag, plus this list of former
bots:
https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits/Unflagged_bots
(I occasionally maintain this in order for the list of editors by edit count
to work; as of a couple of weeks ago, when I last checked, I believe it to be
a comprehensive list of retired bots with 6,000 or more edits), and perhaps
the individual with a very high edit count who has in the past been blocked
for running unauthorised bots on his user account. (I won't name that
account on-list, but since it also contains a large number of manual edits,
the true answer is that you can't get an exact divide between bots and
non-bots by classifying every account as either a bot or a human.)

If you are minded to treat all accounts containing the syllable "bot" as
bots, then you might want to tweak that to count anyone on these two lists:
https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits
https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits/5001%E2%80%931
as human even if their name includes "bot". I check those lists occasionally
and make sure that the only bot-named accounts included are actually human
editors.

On 18 May 2014 20:33, R.Stuart Geiger sgei...@gmail.com wrote:

 Tsk tsk tsk, Brian. When the revolution comes, bot discriminators will get
 no mercy. :-)

 But seriously, my tl;dr: instead of asking if an account is or isn't a
 bot, ask if a set of edits are or are not automated

 Great responses so far: searching usernames for *bot will catch some non-bot
 users who were registered before the username policy change (although *Bot
 is a bit better), and the logging table is a great way to collect bot
 flags. However, Scott is right -- the bot flag (or *Bot username) doesn't
 signify a bot, it signifies a bureaucrat recognizing that a user account
 successfully went through the Bot Approval Group process. If I see an
 account with a bot flag, I can generally assume the edits that account
 makes are initiated by an automated software agent. This is especially the
 case in the main namespace. The inverse assumption is not nearly as easy: I
 can't assume that every edit made from an account *without* a bot flag was
 *not* an automated edit.

 About unauthorized bots: yes, there are a relatively small number of
 Wikipedians who, on occasion, run fully-automated, continuously-operating
 bots without approval. Complicating this, if someone is going to take the
 time to build and run a bot, but isn't going to create a separate account
 for it, then it is likely that they are also using that account to do
 non-automated edits. Sometimes new bot developers will run an unauthorized
 bot under their own account during the initial stages of development, and
 only later in the process will they create a separate bot account and seek
 formal approval and flagging. It can get tricky when you exclude all the
 edits from an account for being automated based on a single suspicious set
 of edits.

 More commonly, there are many more people who use automated batch tools
 like AutoWikiBrowser to support one-off tasks, like mass find-and-replace
 or category cleanup. Accounts powered by AWB are technically not bots,
 only because a human has to sit there and click save for every batch edit
 that is made. Some people will create a separate bot account for AWB work
 and get it approved and flagged, but many more will not bother. Then
 there are people using semi-automated, human-in-the-loop tools like Huggle
 to do vandal fighting. I find that the really hard question is whether
 you include or exclude these different kinds of 'cyborgs', because it
 really makes you think hard about what exactly you're measuring. Is
 someone who does a mass find-and-replace on all articles in a category a
 co-author of each article they edit? Is a vandal fighter patrolling the
 recent changes feed with Huggle a co-author of all the articles they edit
 when they revert vandalism and then move on to the next diff? What about
 somebody using rollback in the web browser? If so, what is it that makes
 these entities authors and ClueBot NG not an author?

 When you think about it, user accounts are actually pretty remarkable in
 that they allow such a diverse set of uses and agents to be attributed to a
 single entity. So when it comes to identifying automation, I personally
 think it is better to shift the unit of analysis from the user account to
 the individual edit. A bot flag lets you assume all edits from an account
 are automated, but you can use a range of approaches to identifying sets of
 automated edits from non-flagged accounts. Then I have a set of regex SQL
 queries in the Query Library 
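To illustrate the edit-level approach, here is a hedged sketch of summary-based classification. The tool signatures below (AWB's "using [[Project:AWB", Huggle's "(HG", rollback's standard summary) are common conventions but are assumptions here, not the actual Query Library queries, and would need validating against the wiki you study:

```python
import re

# Example edit-summary signatures left by common tools; these patterns are
# illustrative assumptions, not an exhaustive or authoritative list.
TOOL_PATTERNS = {
    "AWB": re.compile(r"using \[\[(?:Project|Wikipedia):AWB", re.IGNORECASE),
    "Huggle": re.compile(r"\(HG\b"),
    "Twinkle": re.compile(r"\(TW\b"),
    "rollback": re.compile(r"^Reverted \d+ edits? by"),
}

def classify_edit(summary):
    """Return the name of the first matching tool, or None for an
    apparently manual edit."""
    for tool, pattern in TOOL_PATTERNS.items():
        if pattern.search(summary):
            return tool
    return None
```

Classifying each revision's comment this way lets you flag automated and semi-automated edits from accounts that carry no bot flag, rather than excluding whole accounts.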

Re: [Wiki-research-l] Polling the watcher's of a page. Possible?

2014-01-01 Thread WereSpielChequers
Max,

I wouldn't know if the Foundation was even aware of the incident, they
weren't the source of the data. But it was rather high profile in the
community.

I expect there have been other issues of data being extracted for
researchers, but watchlist data is for some people a sensitive issue,
hence my alternative suggestion. If you want to go forward with this I'd
suggest either finding a better way to look at how groups of editors focus
on the same articles, or doing something with anonymised watchlist data -
you might get that from the WMF or indeed by posting your credentials as a
researcher and inviting contributors to email you their watchlists for some
research that you will anonymise.

I think it would be interesting to see some research on how closely an
editor's watchlist reflects their editing, and how large a watchlist gets
before it becomes so big that an editor no longer stays on top of it. But
you'd also need to ask a few questions, such as under what circumstances
you take a page off your watchlist.






On 1 January 2014 05:44, Klein,Max kle...@oclc.org wrote:


  Jonathan,

  So is that it, then? Is the Foundation feeling too burned to ever give out
 the data again? Has there been any other precedent since then of releasing
 data to academics?



  Kerry,

 Thanks for the link to the paper. I just saw this in the latest
 newsletter.


  Brian,
  The idea of sending a script to follow other editors and then survey them
 would be a good way to train a learning algorithm. I hadn't thought of
 that; mostly I expected to just pore over some old edits. Thanks for the
 idea.


  Maximilian Klein
 Wikipedian in Residence, OCLC
 +17074787023


  --
 *From:* wiki-research-l-boun...@lists.wikimedia.org 
 wiki-research-l-boun...@lists.wikimedia.org on behalf of
 WereSpielChequers werespielchequ...@gmail.com
 *Sent:* Tuesday, December 31, 2013 4:31 AM
 *To:* Research into Wikimedia content and communities
 *Subject:* Re: [Wiki-research-l] Polling the watcher's of a page.
 Possible?

  How many watchlisters a page has is a sensitive issue; we've already had
 one incident where a researcher acquired a list of unwatched pages for a
 vandalism experiment.

  However anyone who watches a page will also have that page's talk page on
 their watchlist, so while you can't directly contact everyone who has that
 page on their watchlist, you could conceivably attract the attention of some
 of them by a message on its talk page. But if you were doing more than one
 or two of them you would need your note to be very relevant to the
 watchlisters of that page.

  Regards

  Jonathan


 On 31 December 2013 10:36, Brian Keegan b.kee...@neu.edu wrote:

  Check out Michael Kummer's paper that looks at a similar topic
 (contagion in pageviews among linked articles) from an econometrics
 perspective: Spillovers in Networks of User Generated Content – Evidence
 from 23 Natural Experiments on Wikipedia

  http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2356199



  On Mon, Dec 30, 2013 at 9:42 PM, Kerry Raymond 
 kerry.raym...@gmail.com wrote:

   No, you can’t for reasons on privacy. See:



 https://en.wikipedia.org/wiki/Help:Watching_pages#Privacy



 But, I concur with your theory that edits are contagious. I often find
 that when I get the notification that a watched page has changed, I go and
 look at the page. While I am there, I often spot a “little thing that needs
 doing”, which sometimes is just a simple single edit and other times
 initiates a marathon of editing activity for the next couple of days. :-)



 If you want to test this theory, I think looking at the set of editors of
 the page might be a pretty good approximation of the watchlist. A lot of
 people have the “add the pages and files I edit to my watchlist” set in
 their preferences (I know I do).



 For the purpose of declaring one edit as being contagious (that is,
 causes another edit), what criteria would you use? I would assume you need
 some time bounds here. I think there needs to be “kick-off” edits
 identified. These would be edits that occurred sufficiently long after the
 previous edit that contagion could not be a factor. Then after the kick-off
 edit, you would be looking for one or more “reaction” edits that occurred
 fairly quickly after one another, suggesting a contagion based on
 watchlists. So it seems there are two time parameters: the kick-off
 threshold and the reaction threshold. I don’t think these are necessarily
 the same value (i.e. is there some grey zone in between where the edits
 can be categorised as neither kick-off nor reaction?).



 In terms of setting these threshold(s), you might need some real-life
 data to train on. So maybe you could start by asking if some editors would
 send you a copy of their watchlist and you could write a script that
 compared it with their edit history over the same time frame (plus a bit to
 cater for bursty-ness). From that you could come up with a set of edits

Re: [Wiki-research-l] Polling the watcher's of a page. Possible?

2013-12-31 Thread WereSpielChequers
How many watchlisters a page has is a sensitive issue; we've already had
one incident where a researcher acquired a list of unwatched pages for a
vandalism experiment.

However anyone who watches a page will also have that page's talk page on
their watchlist, so while you can't directly contact everyone who has that
page on their watchlist, you could conceivably attract the attention of some
of them by a message on its talk page. But if you were doing more than one
or two of them you would need your note to be very relevant to the
watchlisters of that page.

Regards

Jonathan


On 31 December 2013 10:36, Brian Keegan b.kee...@neu.edu wrote:

 Check out Michael Kummer's paper that looks at a similar topic
 (contagion in pageviews among linked articles) from an econometrics
 perspective: Spillovers in Networks of User Generated Content – Evidence
 from 23 Natural Experiments on Wikipedia

 http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2356199



 On Mon, Dec 30, 2013 at 9:42 PM, Kerry Raymond kerry.raym...@gmail.com wrote:

  No, you can’t for reasons on privacy. See:



 https://en.wikipedia.org/wiki/Help:Watching_pages#Privacy



 But, I concur with your theory that edits are contagious. I often find
 that when I get the notification that a watched page has changed, I go and
 look at the page. While I am there, I often spot a “little thing that needs
 doing”, which sometimes is just a simple single edit and other times
 initiates a marathon of editing activity for the next couple of days. :-)



 If you want to test this theory, I think looking at the set of editors of
 the page might be a pretty good approximation of the watchlist. A lot of
 people have the “add the pages and files I edit to my watchlist” set in
 their preferences (I know I do).



 For the purpose of declaring one edit as being contagious (that is,
 causes another edit), what criteria would you use? I would assume you need
 some time bounds here. I think there needs to be “kick-off” edits
 identified. These would be edits that occurred sufficiently long after the
 previous edit that contagion could not be a factor. Then after the kick-off
 edit, you would be looking for one or more “reaction” edits that occurred
 fairly quickly after one another, suggesting a contagion based on
 watchlists. So it seems there are two time parameters: the kick-off
 threshold and the reaction threshold. I don’t think these are necessarily
 the same value (i.e. is there some grey zone in between where the edits
 can be categorised as neither kick-off nor reaction?).



 In terms of setting these threshold(s), you might need some real-life
 data to train on. So maybe you could start by asking if some editors would
 send you a copy of their watchlist and you could write a script that
 compared it with their edit history over the same time frame (plus a bit to
 cater for bursty-ness). From that you could come up with a set of edits
 that look like contagious ones and you could ask the editors to say “yes /
 no / don’t remember” to try to see if 1) contagion appears to be happening
 2) what the time thresholds need to be. Then test it on a bigger set of
 data using edit history as a proxy for watchlists.
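Kerry's two-threshold scheme could be prototyped roughly as follows; the threshold values in the test below are placeholders for illustration, not empirically derived:

```python
def classify_edits(timestamps, kickoff_threshold, reaction_threshold):
    """Label each edit in a time-sorted list of timestamps (e.g. seconds)
    as a 'kick-off' (long gap since the previous edit), a 'reaction'
    (quick follow-up), or 'grey' (between the two thresholds)."""
    labels = []
    for i, t in enumerate(timestamps):
        if i == 0:
            # No previous edit, so nothing it could be reacting to.
            labels.append("kick-off")
            continue
        gap = t - timestamps[i - 1]
        if gap >= kickoff_threshold:
            labels.append("kick-off")
        elif gap <= reaction_threshold:
            labels.append("reaction")
        else:
            labels.append("grey")
    return labels
```

Running this over per-article edit histories would give the candidate kick-off/reaction pairs to show editors for the "yes / no / don't remember" validation step Kerry describes.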



 Kerry








  --

 *From:* wiki-research-l-boun...@lists.wikimedia.org [mailto:
 wiki-research-l-boun...@lists.wikimedia.org] *On Behalf Of *Klein,Max
 *Sent:* Tuesday, 31 December 2013 2:26 PM
 *To:* wiki-research-l@lists.wikimedia.org
 *Subject:* [Wiki-research-l] Polling the watcher's of a page. Possible?



 Hello Research,

 Is it possible to query for the watchers of a page? It does not seem to
 be in the API, nor is there a watchers or wl_user table in the database
 replicas (where I thought MediaWiki stores it). I imagine this is for
 privacy reasons, correct? If so, how would one gain access?

 I have been talking with an econophysicist who thinks that we could
 apply a contagion algorithm to see which edits are contagious.  (I met
 this econophysicist at the Berkeley Data Science Faire at which Wikimedia
 Analytics presented, so it was worth it in the end.)

   Maximilian Klein
 Wikipedian in Residence, OCLC
 +17074787023

 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




 --
 Brian C. Keegan, Ph.D.
 Post-Doctoral Research Fellow, Lazer Lab
 College of Social Sciences and Humanities, Northeastern University
 Fellow, Institute for Quantitative Social Sciences, Harvard University
 Affiliate, Berkman Center for Internet  Society, Harvard Law School

 b.kee...@neu.edu
 www.brianckeegan.com
 M: 617.803.6971
 O: 617.373.7200
 Skype: bckeegan




Re: [Wiki-research-l] Existitng Research on Article Quality Heuristics?

2013-12-15 Thread WereSpielChequers
Re other dimensions or heuristics:

Very few articles are rated as Featured, and not that many as Good. If you
are going to use that rating system
(https://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Assessment)
I'd suggest also including the lower levels, and indeed whether an article
has been assessed and typically how long it takes for a new article to be
assessed. Uganda, for example, has 1 Featured article, 3 Good Articles and
nearly 400 unassessed on the English language Wikipedia
(https://en.wikipedia.org/wiki/Wikipedia:UGANDA#Recognized_content).

For a crowd-sourced project like Wikipedia the size of the crowd is crucial,
and it varies hugely per article. So I'd suggest counting the number of
different editors, other than bots, who have contributed to the article. It
might also be worth getting some measure of local internet speed or usage
level as context: there was a big upgrade to East Africa's internet
connection a few years ago. For Wikipedia the crucial metric is the size of
the internet-comfortable population with some free time and ready access to
PCs. I'm not sure we've yet measured how long it takes from people getting
internet access to their being sufficiently confident to edit Wikipedia
articles; I suspect the answer is age-related, but it would be worth
checking the various editor surveys to see if this has been collected yet.
My understanding is that in much of Africa many people are bypassing the
whole PC thing and going straight to smartphones, and of course for
mobile-phone users Wikipedia is essentially a queryable medium rather than
an interactively editable one.

Whether or not a Wikipedia article has references is a quality dimension
you might want to look at. At least on EN it is widely assumed to be a
measure of quality, though I don't recall ever seeing a study of the
relative accuracy of cited and uncited Wikipedia information.

Thankfully the Article Feedback tool has been almost eradicated from the
English language Wikipedia; I don't know if it is still on French or
Swahili. I don't see it as being connected to the quality of an article,
though it should be an interesting measure of how loved or hated a given
celebrity was during the time the tool was deployed. So I'd suggest
ignoring it in your research on article quality.

Hope that helps

Jonathan


On 15 December 2013 06:15, Klein,Max kle...@oclc.org wrote:

  Wiki Research Junkies,

 I am investigating the comparative quality of articles about Côte
 d'Ivoire and Uganda versus other countries. I wanted to answer the question
 of what makes a high-quality article. Can anyone point me to any existing
 research on heuristics of article quality? That is, determining an
 article's quality by its wikitext properties, without human rating? I would
 also consider using data from the Article Feedback Tool, if there were
 dumps available for each article in the English, French, and Swahili
 Wikipedias. This is all the raw data I can seem to find:
 http://toolserver.org/~dartar/aft5/dumps/

 The heuristic technique that I am currently using is training a naive
 Bayesian filter based on:

 - Per section:
   - Text length in each section
   - Infoboxes in each section
     - Filled parameters in each infobox
   - Images in each section
 - Good Article, Featured Article?
 - Then normalize on page views per population / speakers of the native
   language

 Can you also think of any other dimensions or heuristics to
 programmatically rate?
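For the wikitext-derived features above, a minimal extraction sketch; the regexes are simplified assumptions (real infobox and image syntax is messier), and the Bayesian filter itself is left out:

```python
import re

def extract_features(wikitext):
    """Count a few crude quality signals in raw wikitext.
    These regexes are illustrative; production code should use a real
    wikitext parser (e.g. mwparserfromhell) instead."""
    return {
        "text_length": len(wikitext),
        "sections": len(re.findall(r"^==+[^=].*?==+\s*$", wikitext, re.MULTILINE)),
        "images": len(re.findall(r"\[\[(?:File|Image):", wikitext, re.IGNORECASE)),
        "infoboxes": len(re.findall(r"\{\{\s*Infobox", wikitext, re.IGNORECASE)),
    }
```

A feature dictionary like this, computed per section rather than per article, would be the input vector for the naive Bayes step.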


  Best,
   Maximilian Klein
 Wikipedian in Residence, OCLC
 +17074787023





Re: [Wiki-research-l] Existitng Research on Article Quality Heuristics?

2013-12-15 Thread WereSpielChequers
Re Laura's comment.

I don't dispute that there are plenty of high-quality articles which have
had only one or two contributors. However, my assumption and experience is
that, in general, the more editors the better the quality, and I'd love to
see that assumption tested by research. There may be some maximum above
which quality does not rise, and there are clearly a number of gifted
members of the community whose work is as good as our best crowdsourced
work, especially when the crowdsourcing element is to address the minor
imperfections that come from their own blind spots. It would be well
worthwhile to learn whether women's football is an exception to this, or
indeed whether my own confidence in crowdsourcing is mistaken.

I should also add that while I wouldn't filter out minor edits, you might as
well filter out reverted edits and their reversion. Some of our articles
are notorious vandal targets and their quality is usually unaffected by a
hundred vandalisms and reversions of vandalism per annum; Beaver before it
was semi-protected in autumn 2011
(https://en.wikipedia.org/w/index.php?title=Beaver&offset=20111211084232&action=history)
being a case in point. This also feeds into Kerry's point that many
assessments are outdated. An article that has been a vandalism target might
have been edited a hundred times since it was assessed, and yet it is
likely to have changed less than one with only half a dozen edits, all of
which added content.
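Reverted edits and their reversions can be filtered out with identity-revert detection: if a later revision's text checksum equals an earlier revision's, everything in between was undone. A sketch, assuming you have the revision texts (in practice you would compare the checksums the database already stores for each revision):

```python
import hashlib

def find_reverted(revision_texts):
    """Return the set of indices of revisions that were undone by a later
    identity revert (a revision whose text exactly matches an earlier one).
    Note the reverting revision itself is not flagged here."""
    seen = {}          # checksum -> index of earliest revision with that text
    reverted = set()
    for i, text in enumerate(revision_texts):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen:
            # Every revision between the earlier identical revision and
            # this one was reverted.
            reverted.update(range(seen[digest] + 1, i))
        seen.setdefault(digest, i)
    return reverted
```

Dropping the flagged revisions (and, depending on the research question, the reverts themselves) removes most vandalism/revert churn before counting edits.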

Jonathan


On 15 December 2013 09:44, Laura Hale la...@fanhistory.com wrote:


 On Sun, Dec 15, 2013 at 9:53 AM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:

 Re other dimensions or heuristics:

 Very few articles are rated as Featured, and not that many as Good. If
 you are going to use that rating system
 (https://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Assessment)
 I'd suggest also including the lower levels, and indeed whether an article
 has been assessed and typically how long it takes for a new article to be
 assessed. Uganda for example has 1 Featured article, 3 Good Articles and
 nearly 400 unassessed on the English language Wikipedia
 (https://en.wikipedia.org/wiki/Wikipedia:UGANDA#Recognized_content).

 For a crowd sourced project like Wikipedia the size of the crowd is
 crucial and varies hugely per article. So I'd suggest counting the number
 of different editors other than bots who have contributed to the article.


 Except why would this be an indicator of quality?  I've done an analysis
 recently of football player biographies where I looked at the total volume
 of edits, date created, total number of citations and total number of
 pictures, and none of these factors correlates to article quality.  You can
 have an article with 1,400 editors and still have it be assessed as
 Start-class.  Indeed, some of the lesser-known articles may actually
 attract specialist contributors who almost exclusively write on one topic
 and then take the article to DYK, GA, A or FA.  The end result is that you
 have articles with low page views that are really great and that are
 maintained by one or two writers.



 Whether or not a Wikipedia article has references is a quality dimension
 you might want to look at. At least on EN it is widely assumed to
 be a measure of quality, though I don't recall ever seeing a study of the
 relative accuracy of cited and uncited Wikipedia information.

 Yeah, I'd be skeptical of this overall, though it might be bad.  The
 problem is you could get, say, one contentious section of the article that
 ends up fully cited or over-cited while the rest of the article ends up
 poorly cited.  At the same time, you can get B articles that really should
 be GAs, but people have been burned by that process so they just take it to
 B and leave it there.  I have heard quite a few times from female
 Wikipedians operating in certain places that the process actually puts them
 off.

 --
 twitter: purplepopple
 blog: ozziesport.com





Re: [Wiki-research-l] How to collect all the admin-specific edits for a subset of Wp admins

2013-11-15 Thread WereSpielChequers
Hi Jerome,

Just a random note of caution: there are also admin actions such as closing
RFCs, altering user rights, and protecting and unprotecting pages. So if you
discover that some of your 120 are inactive you might want to check whether
they are active in those areas - most of us are relatively specialised.
Perhaps more importantly, the logs won't show how often admins have declined
an action; I have declined hundreds of deletion tags, and others will have
declined hundreds of unblock requests.

Also I suspect that the logs only go back to Dec 2004 - I know that most
prior data is missing.

Jonathan


On 15 November 2013 11:27, Jérôme Hergueux jerome.hergu...@gmail.com wrote:

 Hi all,

 FYI: the solution proposed below worked just fine. Thanks Dario! :)

 Cheers,

 Jérôme.


 2013/10/10 Dario Taraborelli dtarabore...@wikimedia.org

 Hi Jerôme,

 most of the actions you refer to are not stored as edits by MediaWiki.
 They can be accessed via the logging table [1] (with log_type 'delete' or
 'block'), which is replicated on Tool Labs (you can apply for a Tool Labs
 account if you don't have one [2]).

 HTH

 Dario

 [1] https://www.mediawiki.org/wiki/Manual:Logging_table
 [2] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help
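Building on Dario's pointer, once you have the log events (from the logging table or the API's list=logevents), tallying Jérôme's five action types might look like the sketch below. The (type, action) pairs are my assumption of the usual mapping and should be checked against the Manual:Logging_table page:

```python
from collections import Counter

# (log type, log action) pairs of interest; this mapping is an assumption
# to verify against the MediaWiki logging documentation.
ADMIN_ACTIONS = {
    ("delete", "delete"): "page deletions",
    ("delete", "restore"): "page undeletions",
    ("protect", "protect"): "page protections",
    ("block", "block"): "user blocks",
    ("block", "unblock"): "user unblocks",
}

def count_admin_actions(log_events):
    """Tally admin actions from a list of log-event dicts carrying
    'type' and 'action' keys (the shape returned by list=logevents)."""
    counts = Counter()
    for event in log_events:
        label = ADMIN_ACTIONS.get((event["type"], event["action"]))
        if label:
            counts[label] += 1
    return counts
```

Run once per admin (filtering log events by performer), this yields the five per-admin counts Jérôme asked for.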

 On Oct 10, 2013, at 10:02 AM, Klein,Max kle...@oclc.org wrote:

 Hello Jerome,

 I'm not sure this is the best way, but pywikipediabot [1] has a library
 called pagegenerators.py, and there is a function *def
 UserContributionsGenerator(username)* (around line 706). That would
 allow you to iterate through these usernames, and I bet there will be a
 special marking for deletions/undeletions. If not, worst comes to worst you
 can use a regular expression for those words.

 [1] https://meta.wikimedia.org/wiki/pywikipediabot

 When you have a pywikibot-hammer, everything looks like a
 pywikibot-nail!

 Maximilian Klein
 Wikipedian in Residence, OCLC
 +17074787023

 --
 *From:* wiki-research-l-boun...@lists.wikimedia.org 
 wiki-research-l-boun...@lists.wikimedia.org on behalf of Jérôme
 Hergueux jerome.hergu...@gmail.com
 *Sent:* Thursday, October 10, 2013 3:11 AM
 *To:* wiki-research-l@lists.wikimedia.org
 *Subject:* [Wiki-research-l] How to collect all the admin-specific edits
 for a subset of Wp admins

 Dear all,

 I am starting this thread in the hope that some of the great Wiki
 researchers on this list could advise me on a data collection problem.

 Here is the question: for each of 120 Wikipedia admins (for whom I have
 the usernames and unique numeric ids), I would like to reliably count the
 number of times they (i) deleted a page (ii) undeleted (i.e. restored) a
 page (iii) protected a page (iv) blocked a user and (v) unblocked a user.

 Those types of edits each correspond to a specific action in the
 Wikipedia API documentation page (http://en.wikipedia.org/w/api.php):
 action=delete, action=undelete, action=protect, action=block and
 action=unblock.

 I don't know, however, what would be the best strategy to go about
 collecting those edits. Does anyone have an idea about which data
 collection strategy I should adopt in this case? Is there a way to query
 the Wikipedia API directly, or should I look for some specific markers in
 the edit summaries?

 I would be very grateful for any advice of feedback!
 Thanks much for your attention and time. :)

 Best,

 Jérôme.










Re: [Wiki-research-l] Readable characters vs. size in bytes of articles

2013-08-10 Thread WereSpielChequers
Hi Fabian,

I can honestly say I had never seen an article like Timeline of
architectural styles 1000–present.
But even with that one, and removing everything I could interpret as hidden
or code-generated, I wound up with a lot more than 95 bytes:

6000BC–1000AD • 1000–1750 • 1750–1900 1900–Present
Architectural style Architecture timeline
Julian calendar Gregorian calendar Neoclassical Georgian
Sicilian Baroque
English Baroque
Rococo
Palladianism
Jacobean
Baroque
Elizabethan
Mannerism
Spanish Colonial
Manueline
Tudor
High Renaissance
Renaissance
Perpendicular Period
Brick Gothic
Decorated Period
Early English Period
Gothic
Norman
Romanesque
Byzantine
Roman
Ancient Greek
Ancient Egyptian
Sumerian
Neolithic

So my suspicion is that part of the reason that you and Aaron are getting
different results is because your methods of extracting display bytes are
different. To get just 95 bytes from this article, I think the program
you used would have had to strip out at least some of the linked words.

Regards

Jonathan

On 6 August 2013 14:55, Floeck, Fabian (AIFB) fabian.flo...@kit.edu wrote:

 @Jonathan: Good point, but I'm actually not stripping the content of
 tables, just the mark-up of the tables. (Also I leave the whitespaces in
 and count them, just removing line breaks, as the cleaning leaves a lot of
 empty lines.) I checked the results manually in over 50 cases, and what my
 script outputs is almost exactly what you get when you take the article
 text of a page and copy and paste it into a text editor or Word from the
 browser by hand, including tables and infoboxes.
 So it finds exactly what I wanted: the readable, displayed text portion of
 an article. Remember that I said I also remove disambiguation articles
 (indicated by disambiguation in the article name or category name; the
 reason being that I wanted articles with running text). They probably have
 a higher correlation as they don't use templates very much, I think. As
 for the remaining difference in the corr coefficients, it could also be
 caused by the manner of cleaning.
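Fabian's cleaning step can be approximated with only the standard library (he used BeautifulSoup; this sketch will not reproduce his numbers exactly, and exactly which nodes get skipped is an assumption):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the displayed text, skipping script/style contents."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def readable_length(html):
    """Character count of displayed text, with line breaks removed
    (mirroring the cleaning described above)."""
    parser = TextExtractor()
    parser.feed(html)
    return len("".join(parser.parts).replace("\n", ""))
```

Comparing `readable_length(rendered_html)` against the wikitext byte size per article is the bivariate pair whose correlation is being debated in this thread.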
 The shortest I got was Timeline of architectural styles 1000–present with
 95 chars, but this example reveals that sometimes you would have to
 include characters inside pictures (can you tell me by chance how frequent
 these types of code-generated pictures are?).
 But there are also examples like the Veer Teja Vidhya Mandir
 School I mentioned (chars = 404) where the template is simply highly
 underused and bloats the syntax.

 @Federico: You are completely right, size in bytes is a good indicator for
 many things; you could for example argue it accurately measures the work
 put into an article by the editors, as constructing the Wikisyntax can be a
 big part of a good article.

 @Aaron: You've severely limited the range of your regressor and therefore
 invalidated a set of assumptions for the correlation.
 You seem to be very confused about some statistical concepts:
 1. You mix up the concept of inference statistics with a descriptive
 statistical analysis when you tell me that reporting the result of an
 experiment on a sample (and as nothing more was this declared) is a
 mistake. All I said was that in this sample, with my (ad-hoc!) method,
 this is the result. No inference about the rest of the articles beyond that
 sample. Turns out that I was correct, no mistake whatsoever. For me it was
 interesting enough to post to the list that there is no correlation between
 the two variables in *this* sample. Which is still a very interesting
 result as obviously, at least in this byte-size range (maybe others?),
 there is no or just a tiny correlation to the display char size.  I'm happy
 that you took the time to investigate articles outside this sample, that's
 the kind of input for which I turned to the research list.
 2. I didn't  invalidate anything, I ran a completely appropriate Pearson
 correlation over a sample I chose, however unrepresentative that sample may
 be (again: inference vs descriptive statistics). FYI: A correlation doesn't
 have a *regressor*, as you don't have to decide what is the independent
 and the dependent variable. That's regression; which adds no substantial
 information here imho (you can draw a fitted R^2 line on a scatterplot just
 fine without doing a regression).
 Moreover, you repeatedly ignored the fact in your replication that I also
 filtered out disambiguation articles. Of course, then you won't get the
 exact same results as me.


 As soon as I find the time, I will run my stuff also over a sample outside
 the limited 5800-6000 byte range to see what comes out.


 Best,

 Fabian
 On 06.08.2013, at 10:53, Federico Leva (Nemo) nemow...@gmail.com wrote:

 Ziko van Dijk, 06/08/2013 02:12:

 Hello,
 When in 2008 I made some observations on language versions, it struck me
 that in some cases the wikisyntax and the meta article information was
 more KB than the whole encyclopedic content of an article. For example,
 the wikicode of 

Re: [Wiki-research-l] Readable characters vs. size in bytes of articles

2013-08-05 Thread WereSpielChequers
 API call. I don't think your content_length is the length
 of the readable front-end text as I used it.
 (On a side note: I'm unsure why you paste the complete results of a linear
 regression, as a Pearson correlation will perfectly suffice in such a
 simple bivariate case. They - due to the nature of these statistical
 methods - of course yield the same results in this case. Or was there any
 important extra information that I missed in these regression results?).

 Best,

 Fabian


 [1] http://www.crummy.com/software/BeautifulSoup/bs4/doc/
 [2] http://en.wikipedia.org/wiki/William_Goldenberg#cite_note-1


 On 05.08.2013, at 01:15, Aaron Halfaker aaron.halfa...@gmail.com wrote:

  (note that I posted this yesterday, but the message bounced due to the
 attached scatter plot.  I just uploaded the plot to commons and re-sent)
 
  I just replicated this analysis.  I think you might have made some
 mistakes.
 
  I took a random sample of non-redirect articles from English Wikipedia
 and compared the byte_length (from the database) to the content_length (from
 the API, tags and comments stripped).
 
  I get a Pearson correlation coefficient of 0.9514766.
 
  See the scatter plot including a linear regression line.  See also the
 regression output below.
 
  Call:
  lm(formula = byte_len ~ content_length, data = pages)
 
  Residuals:
      Min     1Q Median     3Q    Max
  -38263   -419 82592  37605
 
  Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
  (Intercept)    -97.40412   72.46523  -1.344    0.179
  content_length   1.14991    0.00832 138.210   <2e-16 ***
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
  Residual standard error: 2722 on 1998 degrees of freedom
  Multiple R-squared: 0.9053,   Adjusted R-squared: 0.9053
  F-statistic: 1.91e+04 on 1 and 1998 DF,  p-value: < 2.2e-16
 
 
  On Mon, Aug 5, 2013 at 12:59 AM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:
  Hi Fabian,
 
  That's interesting. When you say you stripped out the html did you also
 strip out the other parts of the references? Some citation styles will take
 up more bytes than others, and citation style is supposed to be consistent
 at the article level.
 
  It would also make a difference whether you included or excluded alt
 text from readable material, as I suspect it is non-granular - i.e. if
 someone is going to create alt text for one picture in an article they will
 do so for all pictures.
 
  More significantly, there is a big difference in standards of referencing:
 broadly, the higher the assessed quality and/or the more contentious the
 article, the more references there will be.
 
  I would expect that if you factored that in, there would be some
 correlation between readable length and bytes within assessed classes of
 quality, and the outliers would include some of the controversial articles
 like Jerusalem (353 references).
 
  Hope that helps.
 
  Jonathan
 
 
  On 2 August 2013 18:24, Floeck, Fabian (AIFB) fabian.flo...@kit.edu
 wrote:
  Hi,
  to whoever is interested in this (and I hope I didn't just repeat
 someone else's experiments on this):
 
  I wanted to know whether the amount of readable material (excluding
 pictures) presented to the reader in the front-end is correlated with the
 byte size of the wikisyntax that can be obtained from the DB or API, as
 people often define the length of an article by its length in bytes.
 
  TL;DR: Turns out size in bytes is a really, really bad indicator for the
 actual, readable content of a Wikipedia article, even worse than I thought.
 
  We curled the front-end HTML of all articles of the English Wikipedia
 (ns=0, no disambiguation, no redirects) between 5800 and 6000 bytes (as
 around 5900 bytes is the total en.wiki average for these articles). = 41981
 articles.
  Results for size in characters (w/ whitespaces) after cleaning the HTML
 out:
  Min= 95 Max= 49441 Mean=4794.41 Std. Deviation=1712.748
 
 Especially the gap between Min and Max was interesting. But templates
make it possible.
 (See e.g. Veer Teja Vidhya Mandir School, Martin Callanan --
although for the latter you could argue that expandable template listings
are not really main reading content..)
 
  Effectively, correlation for readable character size with byte size =
 0.04 (i.e. none) in the sample.
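Fabian's near-zero correlation can be checked in principle with a plain Pearson computation; the `pearson` helper and the numbers below are invented placeholders for illustration, not his dataset:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical sample: wikitext bytes vs. readable characters per article.
# In a narrow byte band like 5800-6000 the readable length varies wildly,
# so the correlation comes out close to zero.
byte_sizes = [5810, 5855, 5900, 5940, 5990]
readable = [4100, 950, 6200, 4800, 2300]
print(pearson(byte_sizes, readable))
```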
 
  If someone already did this or a similar analysis, I'd appreciate
 pointers.
 
  Best,
 
  Fabian
 
 
 
 
  --
  Karlsruhe Institute of Technology (KIT)
  Institute of Applied Informatics and Formal Description Methods
 
  Dipl.-Medwiss. Fabian Flöck
  Research Associate
 
  Building 11.40, Room 222
  KIT-Campus South
  D-76128 Karlsruhe
 
  Phone: +49 721 608 4 6584
  Fax: +49 721 608 4 6580
  Skype: f.floeck_work
  E-Mail: fabian.flo...@kit.edu
  WWW: http://www.aifb.kit.edu/web/Fabian_Flöck
 
  KIT – University of the State of Baden-Wuerttemberg and
  National Research Center of the Helmholtz Association

Re: [Wiki-research-l] Readable characters vs. size in bytes of articles

2013-08-05 Thread WereSpielChequers
Hi Aaron,

I'm not sure how Fabian limiting his byte length to 5,800-6,000 would make
a difference. But as you've confirmed that your formula includes both the
whitespace and the contents of tables, I suspect we just need Fabian to
confirm that he ignores both and we have an explanation for the difference
between your approaches. And since Fabian's method reduced one article by
over 98% to just 95 bytes, I would be very surprised if he is including the
text contents of tables. What was your shortest, did you get any with an
80-90% reduction? I'd be surprised if your smallest was under 580 bytes.

Jonathan

On 6 August 2013 00:39, Aaron Halfaker aaron.halfa...@gmail.com wrote:

 I am removing all HTML tags and comments to include only those characters
 that are shown on the screen.  This will include the content of tables
 without including the markup contained within.  In other words, I stripped
 anything out of the HTML that looked like a tag (e.g. <foo> and </bar>)
 or a comment (<!-- [...] -->) but kept the in-between characters,
 whitespace and all.
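Aaron's stripping rule - drop anything tag-shaped or comment-shaped, keep everything in between, whitespace included - might look roughly like this regex sketch (an illustration, not his actual code):

```python
import re

def strip_markup(html: str) -> str:
    """Remove comments first (they may contain '>'), then anything
    tag-shaped, keeping all in-between characters as-is."""
    no_comments = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    return re.sub(r"<[^>]*>", "", no_comments)

sample = "<p>Hello <b>world</b><!-- note --> again</p>"
print(strip_markup(sample))  # -> Hello world again
```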

 It seems much more reasonable to me that the difference is due to the fact
 that Fabian's dataset is limited to a very narrow range of bytes.  To check
 this hypothesis, I drew a new sample of pages with byte length between 5800
 and 6000.

 The Pearson correlation that I found for that sample is 0.06466406. This
 corresponds nicely to the poor correlation that Fabian found.

 I've updated the plot[1] to show the difference visually.

 -Aaron

 1.
 http://commons.wikimedia.org/wiki/File:Bytes.content_length.scatter.correlation.enwiki.png


 On Tue, Aug 6, 2013 at 6:04 AM, WereSpielChequers 
 werespielchequ...@gmail.com wrote:

 Thanks both of you,

 I suspect that you two are using very different rules to define readable
 characters, and for Aaron to get a close correlation and Fabian not to get
 any correlation implies to me that Fabian is stripping out the things that
 are not linked to article size, and that Aaron may be leaving such things
 in.

 For reasons that I'm going to pretend I don't understand, we have some
 articles with a lot of redundant spaces. Others with so few you'd be
 correct in thinking that certain editors have been making semiautomated
 edits to strip out those spaces. I suspect that Fabian's formula ignores
 redundant spaces, and that Aaron's does not.

 I picked on alt text because it is very patchy across the pedia, but
 usually consistent at article level, i.e. if someone has written a whole
 paragraph of alt text for one picture they have probably done so for every
 picture in an article, and conversely many articles will have no alt text
 at all.

 Similarly we have headings, and counterintuitively it is the subheadings
 that add most non-display characters. So an article like Peasants' Revolt
 will have 32 equals signs for its 8 headings, but 60 equals signs for its 10
 subheadings: 92 bytes which I suspect one or both of you will have stripped
 out. The actual display text of course omits all 92 of those bytes, but
 repeats the content of those headings and subheadings in the contents
 section.
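The heading arithmetic above can be sketched mechanically: a level-2 heading spends four '=' characters (== X ==) and a level-3 heading six (=== X ===). The `heading_markup_bytes` helper and sample wikitext below are invented for illustration:

```python
import re

def heading_markup_bytes(wikitext: str) -> int:
    """Count the characters spent on the '=' delimiters of wikitext
    section headings (levels 2-6), both sides included."""
    total = 0
    for line in wikitext.splitlines():
        m = re.match(r"^(={2,6})\s*.*?\s*\1\s*$", line)
        if m:
            total += 2 * len(m.group(1))
    return total

doc = "== History ==\n=== Causes ===\nSome prose.\n== Aftermath ==\n"
print(heading_markup_bytes(doc))  # -> 14 (4 + 6 + 4)
```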

 The size of sections varies enormously from one article to another, and
 if there are three or fewer sections the contents section is not generated
 at all. I suspect that the average length of section headings also has
 quite a bit of variance as it is a stylistic choice. So I would expect that
 a display bytes count that simply stripped out the multiple equal signs
 would still be a pretty good correlation with article size, but a display
 bytes count that factored in the complication that headings and subheadings
 are displayed twice as they are repeated in the contents field, would have
 another factor drifting it away from a good correlation with raw byte count.

 But probably the biggest variance will be over infoboxes, tables,
 picture captions, hidden comments and the like. If you strip all of them
 out, including perhaps even the headings, captions and table contents, then
 you are going to get a very poor fit between article length and readable
 byte size. But I would be surprised if you could get Fabian's minimum
 display size of 95 bytes from 6,000 byte articles without having at least
 one article that consisted almost entirely of tables and which had been
 reduced to a sentence or two of narrative. So my suspicion is that Aaron's
 plot is at least including the displayed contents of tables et al whilst
 Fabian is only measuring the prose sections and completely stripping out
 anything in a table.

 Both approaches of course have their merits, and there are even some
 editors who were recently edit warring to keep articles they cared about free
 from clutter by infoboxes and tables.

 Regards

 Jonathan


 On 5 August 2013 21:16, Floeck, Fabian (AIFB) fabian.flo...@kit.edu wrote:

 Hi,

 thanks for your feedback Jonathan and Aaron.

 @Jonathan: You are rightfully pointing at some things that could have
 been done

Re: [Wiki-research-l] Research:Anatomy of English Wikipedia Did You Know traffic

2013-08-04 Thread WereSpielChequers
Hi Laura and Kerry,

One point to remember when comparing views of DYKs with other processes
such as GAs is that DYKs get a slot on the mainpage. In that sense they are
best compared to in the news items and the Featured Article of the Day.
Though I'm pretty sure they don't individually get as many hits as the
latter.

Longer term the things that one would expect would increase readership
would be incoming links, redirects, categories and article completeness. If
you add a section to an article covering a new aspect such as this
particular hill fort being one of the few homes of a particular orchid or
having had a WWII anti-aircraft emplacement there in the forties then you
can expect to come up in relevant searches and thereby get additional hits.

Some of this is straightforward, if something has some alternative names
then making sure we have redirects for them will enable more people to find
the article.

Some is more complex. I'm not sure how far down an article the search
engines will go, but I assume that the search engines give most weight to
the first paragraph and therefore the lede and the redirects need to
contain the words that people are most likely to be searching for when they
want to find this article.

Jonathan


On 3 August 2013 17:31, Laura Hale la...@fanhistory.com wrote:



 On Saturday, August 3, 2013, Kerry Raymond wrote:

 Hi, Laura!


 Hi Kerry.  Thanks for the comments. :)


 I wonder if a variable worth considering is the number of views of the
 DYK vs the average number of page views of the article(s) (per
 day/week/month or whatever) promoted by the DYK *before* the publication of
 the DYK (obviously this can only measured for expanded articles rather than
 new ones). The hypothesis here is that more popular topics make more
 popular DYKs.


 This is actually one of the areas that is worth looking at further.
  People have attempted to time DYKs to coincide with certain events.
  TonyTheTiger is actually very good at doing this for some of his hooks.  It
 can and sometimes does create tension in the project as people try to get
 things timed for these events and not everyone wants to oblige them.  (One
 situation that particularly comes to mind is the Kony 2012 article at
 http://en.wikipedia.org/wiki/Kony_2012 where the article was stalled at
 DYK because a reviewer did not want to time it to coincide with an already
 large media blitz.)  Doing any in-depth research on this topic would
 require a lot of subject knowledge, plus looking through T:TDYK to see
 where things sit in the special holding areas in order to identify some of
 these.


 Another interesting variable is number of page views of the article in
 the days/weeks/months after the DYK. It would be interesting to know the
 extent to which DYKs drive additional interest in the topic both in the
 short term and whether any increase in interest is sustained longer term. I
 would hypothesize any initial sharp increase during the DYK, with a sharp
 fall-off after the DYK finishes but with a small sustained elevation.


 Yes, my casual observation has been that historically, articles get an
 average page views per month bump after DYK that they do not enjoy with
 other processes like GA or peer review.  (This casual observation, and my
 assumption that further research would bear it out as likely fact, is based
 on the rapid content development other processes do not require, and the
 subsequent SEO strengthening from appearing on the front page.)  Having
 looked at the articles, I think the hypothesis is true, but it would need a
 great deal of additional data. You also get two mini traffic bumps prior to
 appearing at DYK, the first being from the contributors working on the
 article, and the second as a result of the DYK review.
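The before/after comparison being discussed could be prototyped like this; `dyk_bump` and the daily view counts are hypothetical illustrations, not measured data:

```python
from statistics import mean

def dyk_bump(daily_views, dyk_index, window=7):
    """Ratio of mean daily views in the `window` days after a DYK
    appearance to the window before it; the DYK day itself is excluded
    from both windows."""
    before = daily_views[max(0, dyk_index - window):dyk_index]
    after = daily_views[dyk_index + 1:dyk_index + 1 + window]
    return mean(after) / mean(before)

# Hypothetical series: steady baseline, spike on the DYK day (index 7),
# then an elevated tail - the "small sustained elevation" hypothesised above.
views = [40, 42, 38, 41, 39, 40, 43, 900, 120, 95, 80, 70, 65, 60, 58]
print(round(dyk_bump(views, dyk_index=7), 2))  # -> 1.94
```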


 It would also be interesting to see if articles mentioned in DYKs show
 any increased edit activity OR the creation of new inbound links to the
 article in the short or long term, but I am less sure about what is the
 baseline for comparison (given that a DYK article will have recently been
 created or expanded, suggesting an abnormally high level of edit activity
 immediately preceding the DYK). Possible proxies are articles in the same
 categories?


 The possible baseline would be new articles that meet DYK criteria but do
 not appear at DYK, or conversely comparing the article's editing history in
 several periods: Before DYK work, during DYK expansion, during DYK review,
 the day of and the week after DYK review, and the two month period after
 the DYK.  (I had actually considered doing this type of research to look at
 the contributions and DYK, but it would serve a completely different
 purpose.  Hence, it would need to be retooled.  I think this could
 potentially be one of the strengths of DYK that people fail to consider in
 that it does give new articles of a slightly higher caliber more eyes and
 potential contributors from the established editing pool than the 

Re: [Wiki-research-l] Modeling Wikipedia admin elections using multidimensional behavioral social networks

2013-02-19 Thread WereSpielChequers
Both hypotheses don't really apply to the English language Wikipedia.

Hypothesis A assumes that people vote for candidates who they are familiar
with. There is some truth in that, and it is true of small tightly knit
communities such as the Georgian Wikipedia. But in larger and or less
tightly knit communities such as Commons or the English language Wikipedia
it is only a small part of the picture. It would be more accurate to say
that many candidates have fans and foes who will turn up at their RFA.
That's one reason why some contentious RFAs can get very high
participation, and occasionally a high profile candidate can get a very
large amount of support. But to be closed the community would have to be
opposing candidates simply because they are unfamiliar with them. What
actually happens is that most votes for or against are instances where
before the RFA the candidate was unfamiliar to the voter, and the voter
judges the candidate according to what they say in the RFA, what others and
especially the nominator say in the RFA, and of course some look at the
candidate's contributions. My suspicion is that only a small minority of
voters thoroughly check the candidate's contributions, but those who do have
enormous influence in the RFA, especially those who find well-founded
reasons to Oppose. When an RFA that was heading for success suddenly tanks
it is usually because someone has found something problematic in the
candidate's contributions and written a well argued oppose or question that
changes the mood of the RFA.

But it is still normal on EN Wiki for an RFA to take place where most of
the supporters are people who the candidate would not consider
"Wikifriends" or even remember having encountered before. That was the case
with my own RFAs and for most if not all of the candidates who I have
nominated.

Hypothesis B assumes that the electorate are increasingly experienced
admins, actually the majority of the voters are usually not admins, the
most regular opposers include a number of non-admins, whilst some of the
most consistent supporters are admins who worry about the admin shortage.
My experience is that the four main electorates are:

Wannabees - people considering a run themselves. Such voters tend to oppose
people who they consider clearly less qualified than they intend to be when
they run, but are very supportive of candidates as qualified as they expect
to be by the time they run.

Friends and Foes. People who are familiar with the candidate and who will
support or oppose based on their experience of them. Some of these voters
will be admins.

Experienced non-admins with no plans to run again at RFA. There are a
number of RFA regulars who know that they couldn't pass RFA themselves and
who are very wary as to who gets the power to block them or delete their
work. In particular this includes content contributors who oppose
candidates who don't have a strong record of writing encyclopaedia
articles, frank speakers who oppose anyone they suspect of becoming a
"civility policeman", and even editors who oppose candidates who they deem
to be too close to the WMF.

Voters in contentious RFAs. Lots of longterm editors keep an eye on the
noticeboard that lists current RFAs and their support percentages. Marginal
RFAs attract extra scrutiny; RFAs that are near unanimous are less worth
spending time on.

Regards

WSC


On 18 February 2013 17:30, Everton Zanella Alvarenga t...@wikimedia.org wrote:

 Abstract:

 Wikipedia admins are editors entrusted with special privileges and
 duties, responsible for the community management of Wikipedia. They
 are elected using a special procedure defined by the Wikipedia
 community, called Request for Adminship (RfA). Because of the growing
 amount of management work (quality control, coordination, maintenance)
 on the Wikipedia, the importance of admins is growing. At the same
 time, there exists evidence that the admin community is growing more
 slowly than expected. We present an analysis of the RfA procedure in
 the Polish-language Wikipedia, since the procedure’s introduction in
 2005. With the goal of discovering good candidates for new admins that
 could be accepted by the community, we model the admin elections using
 multidimensional behavioral social networks derived from the Wikipedia
 edit history. We find that we can classify the votes in the RfA
 procedures using this model with an accuracy level that should be
 sufficient to recommend candidates. We also propose and verify
 interpretations of the dimensions of the social network. We find that
 one of the dimensions, based on discussion on Wikipedia talk pages,
 can be validly interpreted as acquaintance among editors, and discuss
 the relevance of this dimension to the admin elections.

 Link: http://link.springer.com/article/10.1007/s13278-012-0092-6

 From the conclusion:

 [...] We have noticed the decreasing amount of successful admin
 elections and have formulated two hypotheses that could explain this
 

Re: [Wiki-research-l] Editor retention and meetups?

2012-11-19 Thread WereSpielChequers
I've been attending London Meetups for over three years, and anecdotally
I'd say there was a high correlation between repeat or even regular
attendance at meetups and editor retention. Of course it is possible there
are some editors who spot us, leave the pub and stop editing. I also
think that the "typical wiki career = 18 months" myth that was quoted a few
years ago is long gone.

What I don't know is whether meetups are more attractive to the older
editors who have settled on editing as a hobby and have a very high
retention rate and less attractive to the younger editors with their
shorter retention rate. Though obviously pub based meetups do exclude those
who are clearly below the legal drinking age.

As for advertising meetups in ways unlikely to reach newer editors,
nowadays all UK meetups are advertised on people's watchlists via geo
lookup. So we get a mix, and some of the editors we get are quite new. But
I'd agree that back in the days when it was only advertised on Meta and via
invitations to people with London userboxes the London Meetup was far more
cliquey. In some of my first meetups I was in a minority as being a
non-admin, nowadays most attendees are not admins.


WSC

On 19 November 2012 03:58, Kerry Raymond kerry.raym...@gmail.com wrote:

 I suspect that it's only fairly well-entrenched editors who attend meetups,
 but I agree it would be interesting data. I rather suspect that meetups are
 advertised in ways unlikely to reach newer editors.

 Sent from my iPad

 On 19/11/2012, at 10:16 AM, Laura Hale la...@fanhistory.com wrote:

 Hi,

 I'm wondering if anyone knows of any research on Wikimedia meetups and the
 effects on editor retention?

 Sincerely,
 Laura Hale

 --
 twitter: purplepopple
 blog: ozziesport.com

 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




Re: [Wiki-research-l] Editor retention and meetups?

2012-11-19 Thread WereSpielChequers
I met some of the Georgian editors last time I was in Tbilisi. They seem to
have a very tight community, there aren't many of them but that means they
are few enough that they can all work together on their topic of the month
, which couldn't be more different from the London meetups where some of
the participants almost never interact on wiki.

As well as meetups we've also run editathons and other content-focussed
things in London as part of our GLAM and outreach programs. Articles like
Hoxne Hoard certainly did get a lot of people editing together who had met
in real life. Their retention effects will probably be different, and you
can't measure that against non-participants as a base because there is also
bound to be a halo effect amongst the people we invite. I know from another
organisation that there are lots of people who feel happier about continued
membership of an organisation that sends them interesting looking invites,
even if they are currently too busy to take up those invites. So the total
impact of say a backstage pass at a prestigious museum is much more than
the obvious benefit to articles and retention of participants, as there
will be people who feel very differently about their or indeed their
partner's hobby if it involves such invitations.

As for the idea that people attend meetups to do well in elections, in
2010/11 I was one of the active nominators at RFA, and I can assure you
there are several editors who I've met at meetups but who have decided not
to run for adminship. So not everyone attends to boost their wiki career.
Only two of my seven successful nominations have been London meetup
regulars (though I think there've been times when London generated similar
clusters of nominations to the Wikimania one you observed). So the verdict
has to be that many don't attend to boost their wiki career, and don't
assume that those who do run attended a meetup in order to boost their
chances of winning. It sometimes just happens that I or others take the
opportunity to persuade them to volunteer to be an admin.

WSC


On 19 November 2012 19:44, Laura Hale la...@fanhistory.com wrote:



 On Tue, Nov 20, 2012 at 6:33 AM, Steven Walling swall...@wikimedia.org wrote:



 Making a correlation between IRL meetings and activity is difficult
 unless you do it by hand. And then there's the question of what you might
 use as a control group as a basis for comparison.


 I'd assume local culture plays a role and that any group looked at would
 not necessarily be usable beyond that... but for action type research, very
 usable. :)



 --
 twitter: purplepopple
 blog: ozziesport.com




Re: [Wiki-research-l] Minor stats on Wikipedia

2012-10-31 Thread WereSpielChequers
Which language version of Wikipedia are you interested in?

If it is English then column B in
http://stats.wikimedia.org/EN/TablesWikipediaEN.htm gives you new editors
by month up to September - just tot up the most recent twelve to get a
figure as at the start of October.

However I'd add a word of caution re taking the number of pages in
userspace as a figure for the number of users with a userpage. Some editors
have lots of pages in userspace, it is a good place for drafts, essays, the
quirkier userboxes and so forth, most commonly sandboxes but also
guestbooks, adoption programs, recall criteria, mentoring programs and so
forth. Some editors create new articles straight into mainspace, others
have a sandbox and work on one new article at a time, such editors might
only have two pages in userspace. But editors who work on several articles
in parallel may well have scores of pages in userspace.

WSC

On 31 October 2012 22:08, Piotr Konieczny pio...@post.pl wrote:

  Would anyone have/know where to find any of the following estimates for
 English Wikipedia, either as a number or as % of the total population of
 editors (which is known):
 * of people who edited Wikipedia anonymously
 * of Wikipedians with a userpage
 * of Wikipedians who have been registered for less than a year
 * of Wikipedians who have been registered for less than a month

 The data does not have to be current.

 --
 Piotr Konieczny

 To be defeated and not submit, is victory; to be victorious and rest on 
 one's laurels, is defeat. --Józef Pilsudski




Re: [Wiki-research-l] [pre-print] Value production in a collaborative environment

2012-09-08 Thread WereSpielChequers
There are definitely areas of the world which have slow access to the
internet, some organisations compensate for this by having local caching
services. But the bottlenecks are based on the network structure and its
mismatch to demand levels, so are not purely a function of distance. A year
or two back there was a major upgrade to East Africa, and I've heard
complaints that various countries have slow access. Presumably it's possible
that this could affect some parts of the US. Some of the reasons are
covered in http://en.wikipedia.org/wiki/Network_congestion

It would be an interesting bit of research to try and see if the areas of
the world with low Wikipedia editing levels correlated well with areas of
low speed access to the Internet. If the data is available it might be
possible to deduce it by looking at things such as average times between
last hitting preview and hitting save by editors in particular geographies.
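The preview-to-save latency idea could be prototyped as below, assuming an event log of (region, seconds between last preview and save) pairs were available; the helper name and all the data here are invented for illustration:

```python
from statistics import median
from collections import defaultdict

def median_save_latency(events):
    """events: iterable of (region, seconds) pairs. Returns the median
    preview-to-save latency per region - a rough proxy for connection
    speed in that geography."""
    by_region = defaultdict(list)
    for region, seconds in events:
        by_region[region].append(seconds)
    return {region: median(vals) for region, vals in by_region.items()}

# Hypothetical event log
log = [("EU", 2.1), ("EU", 1.8), ("EU", 2.5),
       ("East Africa", 6.4), ("East Africa", 9.1), ("East Africa", 7.0)]
print(median_save_latency(log))  # -> {'EU': 2.1, 'East Africa': 7.0}
```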

WSC

On 8 September 2012 00:57, Kerry Raymond kerry.raym...@gmail.com wrote:


 Similarly Internet penetration is very high here in Australia (available
 to every home no matter how remote) and most home access is
 broadband (I think we came in 2nd after South Korea in some
 recent survey). There is also free access via public libraries, schools,
 etc (government policy is that everyone should have access). My impression
 is that most Australian WP editors do it from home.


 I am not particularly convinced that being in North America has some great
 advantage wrt the servers in Florida. I might be half a world
 away but I don’t find that makes any difference to editing WP compared with
 using some web service closer to home – we have massive great undersea
 cables to carry the data across the Pacific Ocean. I guess some countries
 might experience slower speeds if they don’t have adequate network
 infrastructure in place but I don’t think it can be automatically assumed
 that geographic distance is a barrier.


 Kerry

  --

 From: wiki-research-l-boun...@lists.wikimedia.org [mailto:
 wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of WereSpielChequers
 Sent: Saturday, 8 September 2012 12:47 AM
 To: Research into Wikimedia content and communities
 Subject: Re: [Wiki-research-l] [pre-print] Value production in
 a collaborative environment


 Hi Taha,

 I think you might want to review your assumptions about Internet access.
 My understanding was that the US ranked behind Canada and
 Northwest Europe, though ahead of Europe as a whole.

 http://en.wikipedia.org/wiki/File:InternetPenetrationWorldMap.svg

 However that is a somewhat simplistic take on things. The US benefits
 from faster connection speeds to the servers in Florida, so
 active editors there can get more done in an hour.

 But the US has more of a pro-business set of employment laws than
 Europe, especially mainland NW Europe. This makes it easier for US
 companies to run surveillance on their employees' internet use. So if there
 are still any editors editing from work they are more likely to be in
 Europe.

 The vast majority of our editing is probably being done in people's own
 time on domestic use IT equipment, so the base you really need to look for
 is domestic broadband penetration. But on top of that a more urban culture
 with more access to libraries and free PCs within them is probably also
 helping the UK.

 There's probably also a big cultural thing here. Even if people don't try
 to edit articles about global warming or especially evolution there has got
 to be some effect on their participation in Wikipedia. Wikipedia is an
 encyclopaedia based, we hope, on reliable sources, so those people who have a
 problem with science and academia are bound to find Wikipedia a less
 congenial environment. There is bound to be some link between that and our
 different editing rates on the two sides of the pond.

 WSC

 On 7 September 2012 14:09, Taha Yasseri taha.yas...@gmail.com wrote:

 Hi,
 Thank you very much for the feedbacks.
 Actually I would basically agree to most of the points mentioned by you
 both. However, let me quote the original paragraph from the extended paper
 http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0030091
 (not the review article):

 Considering the large population of English speakers in North America
 compared to Europe, and the fact that the Internet is most developed in
 North America, the estimation of around only half share for North
 America to English WP is a puzzle, which definitely needs further
 multidisciplinary studies. In the case of Simple English WP, the European
 share is even larger, which is not surprising, together with the fact that
 the share of the Far East increased, since this WP is meant to be of use
 by non-native speakers (though, not necessarily written by them). Note

Re: [Wiki-research-l] [pre-print] Value production in a collaborative environment

2012-09-07 Thread WereSpielChequers
It may well be surprising to people in North America and especially the USA
that North America provides only half the edits to EN wikipedia, especially
as it did start in the US. But editing rates here in the UK are
significantly higher than in the US, and that helps make up for the
population imbalance.  EN Wiki also has significant numbers of editors from
outside the English speaking world.

I'm pretty sure that a secondary motivation for some of our editors is that
editing the English language Wikipedia is a great way to practice and
improve their written English. Conversely it may be a way for migrants to
retain a native tongue and even pass it on to their children. So no
surprise that the US has a much greater proportion of editors in
non-English projects than the UK has. As to why we have these patterns, I
suspect that several factors are in play,

The US is a land of substantial immigration from non-English speaking
countries and this may explain the large amount of editing of non-English
Wikipedias from the US.

English Wikipedia supports many different varieties of English - the
compromise between English, American English and other versions has been to
let the first major author of an article set the language version. By
contrast German, Dutch and many other wikipedia languages have standardised
on one dominant dialect. I would hypothesise that this compromise is
significantly more natural and acceptable to Brits, Australians and others
than it is to speakers of American English. At least one of the significant
attempts to launch a rival did so with a policy of American English; I'm
not aware of a serious attempt to launch a Wikipedia rival in which
American English was deprecated. While Conservapedia won't have drawn off
many Wikipedia editors, I suspect that just as Brits are generally more
used to hearing American English on TV and films than is the reverse, we
may also be more familiar with seeing it in print.

And then of course there is our weather.

Other factors could include differences in leisure time and Internet
access. Especially amongst those with the free time to edit.


http://en.wikipedia.org/wiki/Wikipedia:Edits_by_project_and_country_of_origin
could do with updating, and maybe we should try to get some questions into
a future editor survey as to why people edit in languages other than their
native one.

Regards

WSC

On 6 September 2012 21:40, Kerry Raymond kerry.raym...@gmail.com wrote:


 Firstly, thanks for the paper. I enjoyed reading it (although I am not a
 statistician so some of it went over my head).

 ** **

 In 4.1.3 Edits Origin, there is the sentence “Surprisingly, it turned out
 that English WP is almost equally edited by North Americans and editors
 from the rest of the world [110]”. That sentence comes across as implying
 that North America has some special relationship to the English language
 relative to the rest of the world (a claim that seems somewhat at odds with
 the language originating outside of North America). I presume the
 surprise was in relation to the proportion of English speakers in North
 America and I think the sentence would be better if this was made clear,
 e.g. “Given that X% of English speakers reside in North America,
 surprisingly …”.

 ** **

 However, my ball-park estimate would be that about half the world’s
 English speakers are in North America (which would make it a very
 unsurprising observation that English WP is “equally edited”). According to
 http://en.wikipedia.org/wiki/English_language#Countries_in_order_of_total_speakers
 North America (USA+Canada) constitutes about 62% of English speakers, but
 that’s probably an over-estimate given that it is based on the “major
 English-speaking nations”; still, at least it’s a citable statistic that makes
 the finding a bit more surprising. Of course, maybe it’s simpler just to
 not be surprised and just say “English WP is almost equally edited …”.


 Aside, I really don’t know whether it’s possible to get the numbers to
 truly know how many people speak a language well enough to be likely to be
 willing to edit WP in that language in order to compare it to the location
 where the edits originate. There’s probably an interesting research topic
 in relation to level of skills in a language and comfort zone in terms of
 editing WP in that language. I speculate that many people might be
 confident to do simple edits in a language in which they have a lower level
 of fluency but that larger edits might only be done by the more fluent. And
 I suspect the language(s) in which you read WP probably limit the languages
 in which you edit it (since reading an article is often a trigger to edit
 it).


 Kerry

  --

 *From:* wiki-research-l-boun...@lists.wikimedia.org [mailto:
 wiki-research-l-boun...@lists.wikimedia.org] *On Behalf Of *Taha Yasseri
 *Sent:* Thursday, 6 September 2012 7:06 PM
 *To:* 

Re: [Wiki-research-l] Wiki history of one article on War of 1812: rjensen responds

2012-09-07 Thread WereSpielChequers
Hi Richard,

I'd say there were many overlapping roles in Wikipedia, and those of us who
take on the tasks of keeping the pedia free of vandalism and spam are
probably more likely to do so under a pseudonym. I'd certainly recommend
that those who edit under their own names don't get involved in the
deleting of attack pages and certain other tasks that annoy hotheads.
Ignoring death threats is so much easier when you know they can't find
you.  As a community we also have a strong skew towards introversion, and I
suspect this has some correlation with those who choose anonymity or
pseudonymity. But as with credentials there are problems with half
measures. Most people will recognise WereSpielChequers as an obvious
pseudonym, but we have had people edit under pseudonyms that appear to be
real names, including some of our most disruptive editors. Perhaps that
would make a good topic for a researcher some time.

As for credentials there is the legacy of the Essjay incident. Those who
want to assert their professional qualifications are of course free to out
themselves completely - I've known editors whose userpage mutually links to
a profile at their university. But anyone asserting that their view should
prevail because they have a relevant qualification may have a credibility
issue if they aren't prepared to create such a link.

The Article Feedback tool is covered at
http://en.wikipedia.org/wiki/Wikipedia:Article_Feedback_Tool. Personally I'm
far from being a fan: I fear it will divert people from improving articles
to commenting on them; the designers seem to have ignored the cost in
volunteer time of wading through huge piles of crud to find the useful
comments; and it is an annoyingly large box that disfigures articles. But it
is a major community attempt to get feedback from our readers, and its
critics don't dispute that we want to serve our readers better. We just
don't see the value in endless pages of "OMG dontcha just luv him"
comments.

As for the focus on readers, most of the writers who I have chatted with
about their motivation are very much motivated to communicate topics that
are important to them to their readers. Some consider that what is
important to them is or should be important to everyone. Others can be very
frank in acknowledging that if they were being paid to write then they
wouldn't be paid to write about their topic. The difficulty when it
comes to deciding important topics is that we can't agree what the
important articles are.  Some consider the important ones to be those that
other encyclopaedias cover, others would judge by transient fame and look
at numbers of reads or numbers of searches. All those methods have their
problems; I'm happy to concede that an article on a fairly minor popstar
will get more readers in the next year than an article on an English hill
fort. But the hill fort will still be there in a thousand years, and if you
measure readership over a long enough period then relative importance will
look very different. That isn't to say that we don't have institutional
biases, but we need to work with the grain of the community. Here in London
we seem to be able to get volunteers to do outreach to some very disparate
people, and IMHO that is one of the tricks to improving our coverage of
areas where we are weak. "Can we have volunteers to spend an afternoon
talking to some people from such-and-such an institution?" is a much easier
sell than "your topic isn't important, please write about this other topic
instead."

WSC

On 6 September 2012 09:27, Richard Jensen rjen...@uic.edu wrote:

 THANKS to WSC
 Those are good points -- I have a few days to make edits to the page
 proofs; the article will appear in Oct 2012 J Military History.

 Comments: I have not seen any editor make actual use of the Article
 Feedback tool -- are there examples?  Yes Wikipedians are very proud of
 their vast half-billion-person audience. However, they do not ask what
 features are most useful for a high school student or teacher, a university
 student, etc.

 As for who does the work, I looked closely at the big military articles,
 especially 1812, also WWI, WW2, Am Civil War, Am Revolution, and found
 that the occasional editors & IPs contributed very little useful content.
 That is also my experience with the political articles on presidents &
 prime ministers & main political parties.

 Boasting like Mike Fink? -- well, I read 500+ requests for access to
 Questia, Highbeam etc. and looked for what boasts editors actually make.
 As for higher degrees and scholarly publications, that does not cut much
 mustard on talk pages. Very few editors -- maybe 2% -- mention their
 professional expertise on their user pages. Fewer than 1% give real names
 that would permit validation of their claims. In academe these rates would
 be 99%.

 In a larger sense (but it's not in my article), perhaps there are two wiki
 communities, one for law enforcement & one for content. That is, we have
 vigilantes policing

Re: [Wiki-research-l] Wiki history of one article on War of 1812

2012-09-05 Thread WereSpielChequers
 Hi Richard, Interesting read. I noticed a few things, though it's possible
that some may simply be that you are writing in American English.


"The article itself runs 14,000 words" - suggest "The article itself runs
to 14,000 words".

"That perspective is not of much concern inside Wikipedia, for it is
operated by and for the benefit of the editors. Only readers who write
comments are listened to, and fewer than one in a thousand comments."
That's an interesting point of view. I've heard concerns that we don't
know enough as to what our readers want; however, one of the primary
motives of most editors that I know is to make humanity's knowledge
freely available to the world
(http://commons.wikimedia.org/w/index.php?title=File:Editor_Survey_Report_-_April_2011.pdf&page=8),
and I've met a number of editors who are extremely focussed on the number
of people who've read their work and on ways to acquire more readers, such
as getting their work on Wikipedia's mainpage. Your own later comment,
"Working on Wikipedia was most rewarding because it opened up a very
large, new audience", is a typical Wikipedian sentiment. Neither accords with the
idea that Wikipedia is operated for its editors. If you've found that to be
the view of some of Wikipedia's academic critics it might be worth
balancing that with information on the readership survey
http://meta.wikimedia.org/wiki/Readership_survey and the way that and other
metrics have been used to try and find out what our readers want. I suspect
that such criticisms also pre-date developments such as the Article
Feedback tool.


"That task is handled by the “Wikipedia community,” which in practice means
a self-selected group of a couple thousand editors." As well as adding an
"of", I'd suggest that your numbers are out. Most of the vandal fighting,
categorisation, new page patrol and spam deletion is done by a relatively
small community of a few thousand. But the people who add content are an
overlapping and rather larger group. How you measure the size of the
community is complex, and many people ignore the IP editors who actually
write a large part of the content and focus on the currently active editors
who have done over 100 edits in the last month - at 3,400 or so that group
isn't far from being a couple of thousand.
http://stats.wikimedia.org/EN/TablesWikipediaEN.htm But it is much larger
when you consider the number of people who have contributed content in the
past but may be less active now. Our 2,000 most active editors accounted
for 20% of total edits a little over a year ago
(http://en.wikipedia.org/wiki/File:Top_Wikipedians_compared_to_the_rest_of_the_community.png),
but even that grossly overstates our importance, as the minor edits such as
typo fixes are disproportionately done by us. Suggest: "That task is
handled by the “Wikipedia community,” which in practice means a
self-selected group of a few thousand frequent editors and a much larger
number of occasional participants."


"Wikipedia editors almost never claim authorship of published scholarly
books and articles. That sort of expertise is not welcome in Wikipedia;
editors rarely mention they possess advanced training or degrees." According
to the editor survey, 26% of our editors have either a masters or a PhD.
Academic expertise is highly valued in Wikipedia, but it is best
demonstrated by the quality of one's edits and especially one's sourcing.
After all, most of our editors are here to share their expertise:
http://commons.wikimedia.org/w/index.php?title=File:Editor_Survey_Report_-_April_2011.pdf&page=8


"Wikipedia editors will boast like river boatmen about their output: how
many years they have worked on the encyclopedia, how many tens or hundreds
of thousands of edits they have made." There is some truth in that, but in
terms of status within the community, featured article contributions are a
higher-value currency than either tenure or edit count.


"They do not gain by selling their product, and anyone suspected of writing
articles for pay on behalf of public relations for an entity comes under
deep suspicion. As a result, how many people read an article, or how its
audience has grown or fallen, or how useful it has been to the general
public are not among the criteria used to evaluate quality." That's an
interesting synthesis; there certainly is a distrust of those who edit
for pay, especially if they are from the PR industry. But I would suggest
that the distrust is more a product of people's experience with editors who
have difficulty writing neutrally about topics that they are being paid to
promote. A couple of good contrasts were mentioned in the translation
sessions at Wikimania in Gdansk in 2010: Google and its charity arm,
Google.org, both presented about paid editing they'd commissioned in Indic
languages. The uncontentious operation was done by the charity arm,
translating English Wikipedia medical articles into various south Asian
Wikipedia versions. Rather more 

Re: [Wiki-research-l] New tool to help find topics for editing

2012-07-30 Thread WereSpielChequers
You might want to ask SuggestBot users to try it out.
http://en.wikipedia.org/wiki/Wikipedia:SUGGESTBOT

My suspicion would be that more people will be interested in articles
related to topics they cover than in ones from sources they can read. But
both approaches may have their users, and the experience of SuggestBot would
be worth learning from.

WSC

On 30 July 2012 21:00, Steven Walling swall...@wikimedia.org wrote:

 On Mon, Jul 30, 2012 at 12:12 PM, Dave Musicant dmusi...@carleton.edu wrote:

 Hi folks -

 Our research team at Carleton College has just launched a new tool that
 recommends Wikipedia articles to edit based on news that you're interested
 in. Most news sites have Twitter or RSS feeds that update as new articles
 are published. wikiFeed (our tool) invites editors to put in their
 preferred news sources' Twitter or RSS feeds - from politics to pop
 culture, or whatever - and finds the most relevant Wikipedia articles to
 edit based on that content.
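
As a purely illustrative sketch of this kind of feed-to-article matching (wikiFeed's actual architecture is not described in the announcement; the function names and sample data below are invented), one could rank candidate articles by simple term overlap with a feed item:

```python
import re
from collections import Counter

def tokenize(text):
    # lower-case word tokens; apostrophes kept inside words
    return re.findall(r"[a-z']+", text.lower())

def rank_articles(feed_item, article_texts):
    """Rank candidate Wikipedia articles by raw term overlap with a
    news-feed item.  Toy relevance model, not wikiFeed's actual one."""
    feed_terms = Counter(tokenize(feed_item))
    scores = {}
    for title, text in article_texts.items():
        # count shared term occurrences via multiset intersection
        scores[title] = sum((feed_terms & Counter(tokenize(text))).values())
    return sorted(scores, key=scores.get, reverse=True)

articles = {
    "2012 Summer Olympics": "olympics london games athletes medals",
    "Mars Science Laboratory": "mars rover curiosity nasa landing",
}
item = "NASA's Curiosity rover prepares for Mars landing"
print(rank_articles(item, articles)[0])  # → Mars Science Laboratory
```

A real system would of course use TF-IDF or similar weighting and fetch the feeds over the network; this only shows the matching idea.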

 We're trying to conduct a study on the how well wikiFeed works, and would
 love it if you or students of yours could sign up, try it, and continue
 using it if they find it useful. Can you pass the word along, and/or try it
 yourself if you're interested?

 Here's our website:

 http://wikistudy.mathcs.carleton.edu

 Thanks for your help!


 --
 Dave


 This is awesome. Is the source available, or at least some documentation
 of your architecture?

 --
 Steven Walling
 https://wikimediafoundation.org/




 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




Re: [Wiki-research-l] Wikipedia's response to 2012 Aurora shooting

2012-07-23 Thread WereSpielChequers


 *From:* wiki-research-l-boun...@lists.wikimedia.org [mailto:
 wiki-research-l-boun...@lists.wikimedia.org] *On Behalf Of *Taha Yasseri
 *Sent:* Monday, 23 July 2012 3:20 AM

 *To:* Research into Wikimedia content and communities
 *Subject:* Re: [Wiki-research-l] Wikipedia's response to 2012 Aurora
 shooting


 I am resending my previous message, which has not been delivered yet.
 Sorry for any duplicate copies.

 Now, after two days, there are 30 Wikipedia language editions that have
 covered the event (i.e. have an article on it).
 Here: http://wwm.phy.bme.hu/blog.html, see the dynamics, i.e. the number of
 covering WPs versus time, measured in minutes and counted from the event
 time (t=0).
 For those who are familiar with spreading phenomena, the curve comes as
 no surprise. What is surprising is the fast reaction of the Latvian
 Wikipedia (3rd place) and the rather late reaction of the Japanese
 Wikipedia (the latter is most likely related to time zone effects).

 As I did this in a very unprofessional way, errors and miscalculations
 are expected; please notify me if you find any.

 bests,
 .taha

 

 On Sun, Jul 22, 2012 at 7:06 PM, Dario Taraborelli 
 dtarabore...@wikimedia.org wrote:

 Nice (and timely) work as usual, Brian. I was going to enable AFTv5 on
 this article but decided to hold off for a number of reasons (most
 importantly the fact that we're slowly ramping up AFTv5 to enwiki and we're
 mostly focused on scalability at the moment). It'd be interesting to study
 how enabling reader feedback affects the collaborative dynamics of breaking
 news articles, especially semi-protected ones on which anonymous
 contributors don't have a voice.


 Dario


 On Jul 21, 2012, at 5:06 PM, WereSpielChequers wrote:



 

 It is currently semi-protected; there were IP edits when it was first
 created. But according to the logs it was fully protected for a while due
 to IP vandalism. However, the edit history only shows it going to
 semi-protection, but there were some moves which have complicated things.


 WSC

 On 21 July 2012 22:46, Taha Yasseri taha.yas...@gmail.com wrote:

 Ok! the page is protected. Sorry!


 On Sat, Jul 21, 2012 at 11:43 PM, Taha Yasseri taha.yas...@gmail.com
 wrote:

 Thank you Brian,
 Could you also plot the absolute number of edits, and editors, (instead
 of the ratio)? Though, since the data is ready I could do it on my own too!

 Surprisingly, I see no IP contribution to the article (or maybe only a
 few), which is not in accord with my expectation for such a topic.

 cheers,
 .Taha

 On Sat, Jul 21, 2012 at 9:05 PM, Brian Keegan bkee...@northwestern.edu
 wrote:

 My preliminary analysis of (English) Wikipedia's response to the 2012
 Aurora shootings. Data is available at the bottom:


 http://www.brianckeegan.com/2012/07/2012-aurora-shootings/
 


 --
 Brian C. Keegan
 Ph.D. Student - Media, Technology, & Society
 School of Communication, Northwestern University

 Science of Networks in Communities, Laboratory for Collaborative
 Technology






 --
 Taha.



 

 --
 Taha.










 --
 Taha.





 --
 Brian C. Keegan
 Ph.D. Student - Media, Technology, & Society
 School of Communication, Northwestern University

 Science of Networks in Communities, Laboratory for Collaborative Technology





Re: [Wiki-research-l] RCom and the Subject Recruitment Approvals Group

2012-07-19 Thread WereSpielChequers
The current system is not ideal, but I would suggest we need a few more
active participants rather than a subdivision of a fairly quiet team.

A while ago I did the second review at
https://meta.wikimedia.org/wiki/Research_talk:Women_and_Wikipedia:_Contributions_in_a_Collaborative_Online_Space
and am still waiting for feedback - though my review was after such a long
wait that the researcher may have given up on us.

More reviewers would help ensure we give a faster response. Perhaps we
should move the reviews to the specific project? I'm sure it would be
easier to get Wikipedians to comment if we were discussing things on
Wikipedia.

An invitation for new participants might be timely.

WSC

On 19 July 2012 21:54, ENWP Pine deyntest...@hotmail.com wrote:

 Hi Yaroslav,

 What we are discussing is alternatives to the current procedure, rather
 than specific requests for approval of recruiting. Please see
 http://en.wikipedia.org/wiki/Wikipedia:Recruitment_policy,
 https://meta.wikimedia.org/wiki/Subject_Recruitment_Approvals_Group,
 https://meta.wikimedia.org/wiki/Research_talk:Committee/Areas_of_interest/Subject_recruitment,
 and the comments on project review and subject review at
 https://meta.wikimedia.org/wiki/Research:Committee/Reorganization.

 Pine



 -Original Message- From: Yaroslav M. Blanter
 Sent: Thursday, 19 July, 2012 13:31

 To: Research into Wikimedia content and communities
 Subject: Re: [Wiki-research-l] RCom and the Subject Recruitment
 ApprovalsGroup

 Hi Pine,

 I do not think that is correct. The proposals requiring SR discussions
 are announced on the RCom mailing list, and then it is just a matter
 of who is available. For instance, for the last proposal I was
 traveling, and then I saw that it had already been reviewed; there
 was nothing justifying a post-deadline reaction from me, so I just did not
 react. For some other proposals, I was actively participating in the
 evaluation.

 Cheers
 Yaroslav


 Regarding the procedure for subject recruitment approvals, I get the
 impression that at least two small groups of editors have worked on
 separate proposals. I would suggest creating a working group to
 integrate these and/or to decide to forward both of the proposals to
 the community as alternatives.

 Thanks very much.







Re: [Wiki-research-l] Wisdom of the crowd vs. wisdom of the experts and insiders

2012-07-08 Thread WereSpielChequers
The General Notability Guideline is our friend here. Because we require
articles to be verifiable, that particular scenario doesn't apply - we
frequently have people try to add articles and content in situations as
unverifiable as the one the NY Times details. But we reject such content.

Where I believe our crowdsourcing model breaks down is when we don't have a
crowd, or we work too quickly for crowds to form:

- Speedy deletion, where an admin and maybe one other editor will summarily
delete stuff, in theory only if it meets some strict criteria.

- Our smaller wikis. We now have about a thousand, and the wisdom of crowds
is inherently vulnerable to subdivision of crowds. One wiki per
language, plus one multilingual wiki for all those things where we work
across languages, would be a better model.

WSC

On 8 July 2012 00:18, ENWP Pine deyntest...@hotmail.com wrote:


 I thought this was interesting so I’m passing it along. This sentence
 particularly caught my attention: “The answer, I think, is to take the best
 of what both experts and markets have to offer, realizing that the
 combination of the two offers a better window onto the future than either
 alone.” Substitute the word “crowds” for “markets”, and perhaps there is
 something here that could be applied to Wikipedia in our quest for quality,
 mixing the best of expertise and crowdsourcing. I’d be very interested in
 hearing comments from other Wikipedians.


 https://www.nytimes.com/2012/07/08/sunday-review/when-the-crowd-isnt-wise.html

 Cheers,

 Pine





Re: [Wiki-research-l] More accurate revert detection in Wikipedia, alternative to MD5 identical revision method

2012-06-27 Thread WereSpielChequers
Hi Fabian,

That looks interesting, but I wondered whether you were aware of some of the
possible results when you are editing Wikipedia articles section by section.

If an article has multiple sections then it doesn't matter how many edits
have been made to other sections, if you want to undo the most recent edit
to a particular section then you can just hit undo or rollback and revert
it. The contents of the whole article will be a new and potentially unique
revision as one section will have reverted to what it was before it was
vandalised and the other sections will be as they were before the latest
revert.

You could get some interesting examples by looking at the history of the
article on Sarah Palin on the night she became John McCain's running mate.
The edit rate peaked at 25 edits per minute; that should make it a good
example of an article where edits were only being done one section at a
time as anyone who tried to edit the whole article would have been pretty
much guaranteed an edit conflict. As I remember it there were multiple edit
wars taking place simultaneously in different sections of the article, none
would have taken the whole article back to a previous version, just one
section.
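
The identical-revision heuristic under discussion can be sketched in a few lines: hash each revision's full text and flag any edit that restores a previously seen page state. This is a minimal illustrative sketch of the MD5 method (the function name and sample data are my own, not from the paper):

```python
import hashlib

def detect_identity_reverts(revisions):
    """Flag revisions that exactly restore an earlier page state.

    `revisions` is a chronological list of full revision texts; the
    return value lists the indices of 'identity reverts' (the MD5
    heuristic: same hash as some earlier, non-adjacent revision).
    """
    seen = {}       # hash -> index of first revision with that text
    reverts = []
    for i, text in enumerate(revisions):
        h = hashlib.md5(text.encode("utf-8")).hexdigest()
        if h in seen and seen[h] < i - 1:
            reverts.append(i)   # i takes the page back to state seen[h]
        else:
            seen.setdefault(h, i)
    return reverts

history = ["stable text", "VANDALISM", "stable text", "stable text, expanded"]
print(detect_identity_reverts(history))  # → [2]
```

As the Sarah Palin example illustrates, a section-level undo yields a brand-new full-page text, so no hash repeats and this heuristic stays silent, which is exactly the gap the DIFF-based method aims to close.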

WereSpielChequers

On 27 June 2012 18:05, Floeck, Fabian (AIFB) fabian.flo...@kit.edu wrote:

 For those of you who are interested in reverts:
 I just presented our paper on accurate revert detection at the ACM
 Hypertext and Social Media conference 2012, showing a significant accuracy
 (and coverage) gain compared to the widely used method of finding identical
 revisions (via MD5 hash values) to detect reverts, proving that our method
 detects edit pairs that are significantly more likely to be actual reverts
 according to editors' perception of a revert and the Wikipedia definition.
 35% of the reverts found by the MD5 method in our sample are not assessed
 to be reverts by more than 80% of our survey participants (accuracy 0%).
 The provided new method finds different reverts for these 35% plus 12%
 more, which show a 70% accuracy.

 Find the PDF slides, paper and results here:
 http://people.aifb.kit.edu/ffl/reverts/

 I'll be happy to answer any questions.


 More in detail:
 The MD5 hash method employed by many researchers to identify reverts (as
 some others, like using edit comments) is acknowledged to produce some
 inaccuracies as far as the Wikipedia definition of a revert ("reverses the
 actions of any editors, undoing the actions...") is concerned. The extent
 of these inaccuracies is usually judged to be not too large, as naturally,
 most reverting edits are carried out immediately after the edit to be
 reverted, being an identity revert (Wikipedia definition: "...normally
 results in the page being restored to a version that existed previously").
 Still,
 there has not been a user evaluation assessing how well the detected
 reverts conform with the Wikipedia definition and what users actually
 perceive as a revert. We developed and evaluated an alternative method to
 the MD5 identity revert and show a significant increase in accuracy (and
 coverage).
 34% of the reverts detected by the MD5 hash method in our sample actually
 fail to be acknowledged as full reverts by more than 80% of users in our
 study, while our new method performs much better, finding different reverts
 for these 34% wrongly detected reverts plus 12% more reverts, showing an
 accuracy of 70% for these newly found edit pairs actually being reverts
 according to the users. The increased accuracy performance between the
 reverts detected only by the MD5 and only by our new method is highly
 significant, while reverts detected by both methods also perform
 significantly better than those only detected by the MD5 method.

 Trade-off:
 Although this method is much slower than the MD5 method (as it is using
 DIFFs between revisions) it reflects much better what users (and the
 Wikipedia community as a whole) see as a revert. It thereby is a valid
 alternative if you are interested in the antagonistic relationships between
 users on a more detailed and accurate level. There is quite some potential
 to make it even faster by combining the two methods, decreasing the number
 of DIFFs to be performed; let's see if we can get around to doing that :)

 The scripts and results listed in the paper can be found at
 http://people.aifb.kit.edu/ffl/reverts/

 Best,

 Fabian


 --
 Karlsruhe Institute of Technology (KIT)
 Institute of Applied Informatics and Formal Description Methods

 Dipl.-Medwiss. Fabian Flöck
 Research Associate

 Building 11.40, Room 222
 KIT-Campus South
 D-76128 Karlsruhe

 Phone: +49 721 608 4 6584
 Skype: f.floeck_work
 E-Mail: fabian.flo...@kit.edu
 WWW: http://www.aifb.kit.edu/web/Fabian_Flöck

 KIT – University of the State of Baden-Wuerttemberg and
 National Research Center of the Helmholtz Association



Re: [Wiki-research-l] Dynamics of Conflicts in Wikipedia

2012-06-25 Thread WereSpielChequers
Hi Kerry,

There have been several nationalistic and/or religious disputes that have
involved the same protagonists over numerous articles on the contentious
topic. Pretty much any topic that is controversial in real life will be
controversial on Wikipedia, with the added possibility that the Internet is
a wonderful device for putting people into contact with people from very
different cultures and with viewpoints that might not exist in their real
life society/culture/country.

One good list of things that have been controversial on Wikipedia is the
list of general sanctions decreed by ARBCOM
http://en.wikipedia.org/wiki/Wikipedia:Arbitration/Active_sanctions#General_sanctions

TTFN

WereSpielChequers

On 25 June 2012 08:26, Kerry Raymond kerry.raym...@gmail.com wrote:


 Thank you for sharing your paper. I found it very interesting that there
 are good metrics that enable detection of articles with conflict. I have a
 couple of questions, which might well go beyond your current study but I’d
 welcome your thoughts.


 My first question is whether or not you think this metric or some variant
 can be used to detect current conflict in articles (rather than the
 existence of past conflict). My thinking is that if conflict can be
 detected early, it may be possible for the peacemakers to guide the
 conflict to a consensus rather than attempt to do so once hostilities are
 well-established.


 Another question relates to warring editors. If I read it right, you
 looked for pairs (or groups) of editors that were reverting one another’s
 changes (i.e. an edit war) in an article. However, is conflict limited to
 just one article? Is it possible that warring editors on one article may
 then engage in conflicts over other articles simultaneously or later,
 either because of the same issue that caused the earlier disagreements or
 because they had developed a dislike for one another and were ready to find
 excuses to be unpleasant to each other. That is, are we just looking at
 articles that are controversial (in some way) or are we also looking at
 pairs (or groups) of editors who are actively hostile to one another. It
 might be interesting to know if editors who have been involved in edit wars
 go on to peacefully co-exist with one another on other articles, go to war
 with them over other articles, or simply never happen to encounter each
 other again (WP being a big place). If they do go on to war again, was it
 because they are both active on articles within similar categories (e.g.
 sexuality) or because one/both is stalking the other (which you might
 suspect if they had conflicts across a range of topics, especially where
 one of them had no prior edit history in that category, e.g. start warring
 over "Ben Franklin" and then continue it in "Pumpkin").
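
The pairwise notion Kerry describes (editors who repeatedly revert one another) can be sketched as follows; this is an illustrative sketch under my own invented names and data, not the metric from the paper:

```python
from collections import defaultdict

def mutual_revert_pairs(revert_log):
    """Given (reverter, reverted_editor) events from an article's
    history, return the editor pairs who have each reverted the other
    at least once - a rough proxy for an edit war between two people."""
    reverted = defaultdict(set)          # editor -> editors they reverted
    for a, b in revert_log:
        if a != b:                       # ignore self-reverts
            reverted[a].add(b)
    pairs = set()
    for a, targets in reverted.items():
        for b in targets:
            if a in reverted.get(b, set()):   # b reverted a as well
                pairs.add(frozenset((a, b)))
    return pairs

log = [("Alice", "Bob"), ("Bob", "Alice"), ("Alice", "Carol")]
print(sorted(sorted(p) for p in mutual_revert_pairs(log)))  # → [['Alice', 'Bob']]
```

Tracking whether the same pair recurs across several articles would then speak to Kerry's question about conflicts following editors from topic to topic.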


 Kerry

  --

 *From:* wiki-research-l-boun...@lists.wikimedia.org [mailto:
 wiki-research-l-boun...@lists.wikimedia.org] *On Behalf Of *Taha Yasseri
 *Sent:* Friday, 22 June 2012 8:15 AM
 *To:* Research into Wikimedia content and communities
 *Subject:* [Wiki-research-l] Dynamics of Conflicts in Wikipedia

 ** **

 Dear Wikipedia researchers!
 Our manuscript is now released by PLoS ONE and available at:
 http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0038869

 I would be delighted to receive your comments and remarks.

 bests
 .Taha


 Dr. Taha Yasseri.
 -
 www.phy.bme.hu/~yasseri http://www.phy.bme.hu/%7Eyasseri

 Department of Theoretical Physics
 Institute of Physics
 Budapest University of Technology and Economics

 Budafoki út 8.
 H- Budapest, Hungary

 tel: +36 1 463 4110
 fax: +36 1 463 3567
 -


 --
 Taha.

 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




Re: [Wiki-research-l] open letter to researchers

2012-05-26 Thread WereSpielChequers
Even if we weren't in a recession, money is not an unlimited resource. The
fair comparison is not between those in the class who pass and those who
fail to get the research grant, but between those who applied for the class
and those who applied for the grant.

WSC

On 22 May 2012 20:45, Joe Corneli holtzerman...@gmail.com wrote:

 I thought this might be of interest particularly in light of the
 recent conversations
 here about academics vs wikipedians. - Joe

 Abstract

 Since access to research funding is difficult, particularly for young
 researchers, we consider a change in approach: We are the funding
 opportunity! I'll develop this idea further in the comments that
 follow.  This is an open letter to circulate to research mailing
 lists which I hope will bring in new interest in the Free Technology
 Guild.

 Keywords: research funding, postgraduate training

 A critique of the way research is funded

 Considering the historical technologies for doing science, it makes
 sense that public funding for research is administered via a
 competitive, hierarchical model. Science is too big for everyone to
 get together in one room and discuss.  However, contemporary
 communication technologies and open practices seem to promise
 something different: a sustained public conversation about research.
 The new way of doing things would redeem the intellectual capital
 currently lost in rejected research proposals, and would provide
 postgraduate and postdoctoral researchers with additional learning
 opportunities through a system of peer support.

 JISC recently ran an experiment moving in this direction (the JISC
 Elevator), but the actual incentive structure ended up being similar
 to other grant funding schemes, with 6 of 26 proposals funded
 (http://www.jisc.ac.uk/blog/crowd/). It strikes me that if we saw the
 same numbers in a classroom setting (6 pass, 20 fail), we would find
 that pretty appalling. Of course, people have the opportunity to
 re-apply with changes in response to another call, but the overheads
 in that approach are quite high. What if instead of a winners-take-all
 competitive model, we took a more collaborative and learning-oriented
 approach to funding research, with applicants working together, in
 consultation with funders -- until their ideas were ready? In the end,
 it's not so much about increasing the acceptance rate, but increasing
 the throughput of good ideas! Open peer review couldn't save the
 most flawed proposals; nevertheless, it could help expose and
 understand the flaws -- allowing contributors to learn from their
 mistakes and move on.

 With such an approach, funding for research and postgraduate
 training would be fruitfully combined. This modest proposal hinges on
 one simple point: transparency. Much as the taxpayer should have
 access to research results they pay for (cf. the recent appointment
 of Jimmy Wales as a UK government advisor) and scientists should
 have access to the journals that they publish in (cf. Winston Hide's
 recent resignation as editor of Genomics), so too do we as
 citizen-scientists have a moral imperative to be transparent about how
 research funding is allocated, and how research is done. Not just
 transparent: positively pastoral.


 The Free Technology Guild: a candidate solution

 Suppose someone needs to put together a team of four people: a
 programmer, a statistician, an anthropologist, and a small-scale
 capitalist. This team's project would be to create a new social
 media tool over the course of 3 months; the plan is to make money
 through a subscription model. As an open online community for work on
 technology projects, the Free Technology Guild
 (http://campus.ftacademy.org/wiki/index.php/Free_Technology_Guild)
 could help:

 * by helping the project designer specify the input/output
 requirements for the project;

 * by helping the right people for the job find and join the project;

 * by providing peer support and mentoring to participants throughout
 the duration of the project.

 Because everything is developed in the open (code, models, ethnography),
 everyone wins, including downstream users, who can replicate the same
 approach with any suitable changes on demand. (And, in case things go
 badly, those results can be shared too -- the broader community can help
 everyone involved learn from these experiences in a constructive fashion.)


 What is needed now

 We are currently building the FTG on a volunteer basis, but within the
 year we hope to set up a service marketplace where we and others can
 contribute and charge for services related to free/open technology,
 science, and software. Although we have criticised the current mode of
 research funding as inefficient, we would be enthusiastic about
 contributing to grant proposals that would support our work to build a
 different kind of system.  But without waiting for funding to arrive,
 we are actively recruiting volunteers to form the foundation of the
 Free 

Re: [Wiki-research-l] - solutions re academe Wiki

2012-05-23 Thread WereSpielChequers
Hi Richard, you queried in a previous posting whether relations between
Academia and Wikipedians were better in the UK. But I suspect that no-one
is truly in a position to answer that. In both the US and the UK the
situation will be complex, some Academics are Wikipedians, some Academics
judge us by the quality we'd achieved by 2006 and really need to check
again and reassess the project. Some Academics respect and value us for the
way we try to teach today's kids not to cut and paste. Others despair at us
as the source of much of the plagiarism they receive from students.

Of course this is a very different issue to the debate about Open source
freely available journals, a debate where some people on this list have
strongly held and diametrically opposed views. Wikipedia is a Tertiary
source not a Primary or Secondary one and cannot exist without those
primary and secondary sources. So their continued health matters to us, but
clearly there is a divide as to how that continued health is to be
achieved, and indeed defined. Wikimedia is itself very much a part of the
open source movement, but that doesn't mean that all Wikimedians believe
that everything should be open source.

As for your two suggestions about attending scholarly conferences and
working with libraries, there has been a different emphasis between the US
and the UK in the last couple of years. Here in the UK we have prioritised
outreach to the GLAM sector (Galleries, Libraries, Archives and Museums),
whilst the US prioritised Universities.

That seems to be shifting, with the UK expanding its education links: 
http://uk.wikimedia.org/wiki/EduWiki_Conference_2012
http://uk.wikimedia.org/wiki/Education_strategy


Whilst the US is now expanding its GLAM program.

I have participated in the editathons we've had in the UK at both the British
Museum and the Victoria and Albert Museum. I didn't take part in the
British Library one, but I gather it was a success. I think that would
count as one of your training programs for experienced Wiki editors at a
major research library. The sort of articles coming out of these
collaborations include http://www.en.wikipedia.org/wiki/Hoxne_hoard

WSC



On 23 May 2012 05:30, Richard Jensen rjen...@uic.edu wrote:

 Sadly I think this discussion demonstrates some hostility toward academe.
  (here's a quote from yesterday addressed to me on this list: ...knowledge
 robberbarons standing athwart history imagining they and their institutions
 alone, had the requisite skills and expertise to engage in knowledge
 production. Until they didn't. Enjoy your new neighbors in trash heap of
 history.  I would code his emotional tone as hostile)

 Well it's always nice to see people citing the lessons of history,
 especially since I'm a specialist in that sort of OR.   But the underlying
 hostility is a problem that bothers me a lot and I have been trying to
 think of ways to bridge the gap.  There is in operation a Wikimedia
 Foundation Education program that is small and will not, in my opinion,
 scale up easily to the size needed.  In any case the Foundation plans to
 cut the US-Canada program loose in 12 months to go its own way. See
 http://en.wikipedia.org/wiki/Wikipedia:Education_Working_Group/Wikimedia_Foundation_Role

 My own thinking is currently along two lines:

 a) set up a highly visible Wiki presence at scholarly conventions (in
 multiple disciplines) with 1) Wiki people at booths to explain the secrets
 of Wikipedia to interested academics and 2) hands-on workshops to show
 professors how to integrate student projects into their classes.  (and yes,
 professors given paid time off to attend these conventions, often plus
 travel money.)

 b) run a training program for experienced Wiki editors at a major research
 library. (I'm thinking just of Wiki history editors here.) For those who
 want it provide access to sources like JSTOR. Bring in historians covering
 main historiographical themes. I think this could help hundreds of editors
 find new topics, methods and sources that would lead to hundreds of
 thousands of better edits.

 Richard Jensen






Re: [Wiki-research-l] the gulf between Wikipedia and Academe

2012-05-21 Thread WereSpielChequers
Hi Richard,

Apart from Featured Article work, I suspect that a very large proportion of
our referencing is driven by Google search and latterly Google Books. There
have been a few schemes to give the more active editors accounts with
various reference sources - some Highbeam accounts were recently divvied
out, and a large proportion of us in the UK can get such subscriptions via
our libraries. But if the first phase of Wikipedia was people writing what
they knew, we are still largely in the second phase with most of the
sourcing done via the Internet.

It would be interesting to see if there were many takers for a training
session on using other sources. With the majority of our editors, and
especially the content creators, being graduates, postgraduates or current
undergraduates, it is a fair assumption that a very large proportion of
our editors know how to access journals; it would be interesting to
find out whether they don't do so due to lack of time, lack of access, or
some other reason.


As for the idea that students use the pedia and professors disparage it,
that is of course something of a simplification, a few months ago I met
someone who'd been to a Cambridge meetup and been in the minority of
non-professors present. But Cambridge will of course be ahead of the game
in this sort of thing. I suspect the main issue here is conservatism, and
in a few years' time Academics who are hostile to Wikipedia will be as
common as Academics who despise electronic calculators.

This issue of experts and Wikipedia is more complex. Wikipedians are
rightly suspicious of experts who claim that their innate knowledge
should override that of reliable sources. But experts who clearly know
their subject, can communicate it to a general audience and can furnish
sources to back up their content are usually well respected, especially if
they waive pseudonymity and use their userpage to link to their University
page. The areas where that doesn't quite work tend to be ones where
Academic views are contentious in real life. Climate change being an
extreme example.


Regards


WSC

On 21 May 2012 18:26, Richard Jensen rjen...@uic.edu wrote:

 Han-Teng Liao highlights a very serious issue regarding the large gulf
 between Wikipedia and academe. University students appear to be
 enthusiastic users of Wikipedia while the professors either shy away or are
 quite hostile and warn their students against Wikipedia.

 One factor is academe's culture of original research and personal
 responsibility by name for publications, versus Wikipedia's culture of
 anonymity and its rejection of the notion that an editor can be respected
 as an expert.

 A second factor is the need for editors to have free access to published
 reliable secondary sources. I think Google-scholar and Amazon have solved
 much of the editors' access problem regarding books.

 As for journals--which is where this debate started--I do not think that
 open access will help Wiki editors much because I am struck by how rarely
 Wiki articles (on historical topics) cite any journal articles.  I've
 offered to help editors get JSTOR articles but no one ever asks.  There is
 something in the Wiki culture that's amiss here. Possibly it's that few
 Wiki editors ever took the graduate history courses that explain how to use
 scholarly journals.

 Maybe we need a program to help our editors overcome this gap and give
 them access to a massive base of highly relevant RS.

 Richard Jensen





Re: [Wiki-research-l] Are there any stats on activity of editors compared to the population?

2012-05-17 Thread WereSpielChequers
Piotr,

I've had a reasonable success rate by filing requests at
http://en.wikipedia.org/wiki/Wikipedia:Bot_requests. Several programmers
keep an eye on it and if they think the task interesting and useful you may
get lucky.

WSC

On 16 May 2012 18:09, Piotr Konieczny pio...@post.pl wrote:

  Dario,

 Thanks, but the last time I looked into this, running queries required
 knowing how to code, going way beyond a simple knowledge of wiki syntax or
 excel functions. I think it was at WikiSym a few years back that we raised
 that issue - that much of the data Wikimedia provides is limited to the
 small subset of scholars who can code in languages with pretty names like
 Java or Perl and such. I am pretty sure this is the reason why the social
 sciences have been lagging in Wikipedia research since day one...

 Now, if I am wrong about any of the above, do let me know. But the last
 time I looked at
 https://wiki.toolserver.org/view/Database_access#Command-line_access it
 didn't look too user friendly (for a non-coder).

 Is there any place where a non-coder can ask a Toolserv coder to run some
 of those queries? I'd be happy to trade some of my Wiki skills (as in,
 writing a DYK, or reviewing a GA) for such assistance :)

 --
 Piotr Konieczny

 To be defeated and not submit, is victory; to be victorious and rest on 
 one's laurels, is defeat. --Józef Pilsudski


 On 5/10/2012 2:29 PM, Dario Taraborelli wrote:

 Piotr,

  if you are interested in getting fresh figures about lifetime edit
 counts I recommend you register an account on the toolserver where you can
 run queries against the user table (which holds cumulative edit counts
 across all namespaces for a specific wiki). For namespace-specific counts
 you will need to use the revision table and that's much more time
 consuming.
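 Dario's suggestion can be sketched as a query over the `user` table's
 cumulative edit counts. The snippet below is a minimal stand-in, not the
 real toolserver setup: it uses an in-memory SQLite table with toy data
 (the actual replicas are MySQL and need a toolserver account), while the
 `user_editcount` column name follows the MediaWiki schema.

```python
import sqlite3

# Toy stand-in for the replicated MediaWiki `user` table, where
# user_editcount holds an account's cumulative edit count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_name TEXT, user_editcount INTEGER)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?)",
    [("Alice", 1), ("Bob", 7), ("Carol", 42), ("Dave", 500), ("Eve", 3)],
)

# Bucket accounts by lifetime edit count, matching the bands Piotr asked
# about (exactly 1, 2-9, 10-50), plus a catch-all for heavier editors.
query = """
SELECT CASE
         WHEN user_editcount = 1 THEN '1'
         WHEN user_editcount BETWEEN 2 AND 9 THEN '2-9'
         WHEN user_editcount BETWEEN 10 AND 50 THEN '10-50'
         ELSE '>50'
       END AS bucket,
       COUNT(*) AS editors
FROM user
WHERE user_editcount >= 1
GROUP BY bucket
"""
distribution = dict(conn.execute(query).fetchall())
print(distribution)
```

 On the toy data above this yields one account with exactly 1 edit, two in
 the 2-9 band, one in 10-50, and one above 50.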

  On a related note, this real-time dashboard I just uploaded to the
 toolserver (representing account registrations and the fraction of new
 users clicking on the edit button or passing the 1-edit threshold) could
 be of interest: http://toolserver.org/~dartar/reg2/

  Best
 Dario

   On May 10, 2012, at 10:57 AM, WereSpielChequers wrote:

 Hi Piotr,

You might make the assumption that the difference between 4 million and 16
million is largely editors who never get out of userspace; my experience is
that such users are relatively rare, or at least won't dominate that 12
million.

 I'm fairly sure that there will be a number of different groups in that 12
 million. Steve Walling, Aaron or Maryana may be able to help analyse or at
 least explain them.

 Significant groups in the 12 million will definitely include:

 1 People who registered an account and tried but never successfully saved
 an edit because when they looked they saw a wall of code and they don't do
 html. The WMF is investing a lot of money in WYSIWYG editing software in
 the hope that this will enable goodfaith but not very technical people to
 edit Wikipedia.

2 Vandals since 2007. We have edit filters that try to dissuade vandals
from saving their first edit when it triggers one of our tests for
probably being vandalism. These filters only came in during the last
few years and have been improved over time - so they are deterring a
significant proportion of recent badfaith editors from ever saving an edit.

 3 Visitors from other wikis. One of the features of Single User Login is
 that if you are logged in and you click on a link that takes you to another
 wikimedia wiki, your account becomes active at that wiki even if you never
 go near the edit button. My account is active on 92 wikis and I've edited
 in rather less than half of them. I won't go into all the reasons why one
 might visit other wikis, but if you see that an article you've written has
 equivalents in several other languages I consider it human nature to click
 on the links and look at the article. Even if you don't use Google
 translate, the choice of image and the size of the paragraphs is often
 enough to tell you whether someone has translated your work or started
 afresh.

 4 Editors whose articles have been deleted. About a quarter of new editors
 start by creating a new article rather than by editing existing articles. A
 large majority of such articles get deleted and their authors depart. If
 the 4 million is only measured on surviving edits to article space then
 there will be many hundreds of thousands whose only article space edits
 have been deleted.

5 Zombie accounts. We now have programs that prevent people opening
accounts with names that are overly similar to those of existing editors,
but before these filters came in many editors would protect themselves from
such impersonation by creating such zombie accounts themselves and
marking their userpage with a link to their main account.

 6 Edit conflicts. Breaking news stories attract editors like moths to
 flames, our article on Sarah Palin peaked at 25 edits per minute at one
 point during the day she became John McCain's

Re: [Wiki-research-l] Are there any stats on activity of editors compared to the population?

2012-05-10 Thread WereSpielChequers
I'm not sure that we have exactly what you're asking for.

For example we have the figure of 4,058,477 but that is for registered
accounts on the English Wikipedia that have made at least one edit to an
article. Different language versions of Wikipedia are also available, but
of course registered accounts doesn't exactly tally with Wikipedians not
least because IP editors are excluded. Also I believe that early edits -
pre-2004 - may not be available, and I suspect that deleted edits may not
be counted.

That said, we have further stats of 1,614,938 registered accounts with ≥ 3
article edits and 772,557 with ≥ 10.

So http://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution is
well worth looking at, but they break at 32 and 100, not 50, which may be a
problem for you.
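Those cumulative thresholds can be turned into per-band counts by
differencing adjacent figures. A small sketch using the numbers quoted
above; note that the exact 10-50 band can't be recovered this way, since
the published tables break at 32 and 100 rather than 50.

```python
# Cumulative counts of English Wikipedia accounts at edit-count
# thresholds, taken from the figures quoted above (stats.wikimedia.org).
at_least = {1: 4_058_477, 3: 1_614_938, 10: 772_557}

# Differencing adjacent thresholds gives the count within each band.
bands = {
    "1-2 edits": at_least[1] - at_least[3],
    "3-9 edits": at_least[3] - at_least[10],
    "10+ edits": at_least[10],
}
for band, n in bands.items():
    print(f"{band}: {n:,}")
```

So roughly 2.4 million accounts have made only one or two article edits,
and about 840,000 have made between three and nine.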

Hope that helps

WSC

On 9 May 2012 23:42, Piotr Konieczny p...@pitt.edu wrote:

 I was looking at official stats, but I seem to be unable to find out an
 answer to the following question:
 * how many of Wikipedia editors have X edits (or fall within a range of
 edits)
 To be more precise, I am curious how many Wikipedians have:
 * exactly 1 edit
 * between 2-9 edits
 * between 10-50 edits
 I know that the total number of registered accounts is reported at
 http://en.wikipedia.org/wiki/Wikipedia:Wikipedians

 Can anybody direct me to the right page/counter that would allow me to
 obtain the above information? I hope it is obtainable without having to
 download the dump...

 Incidentally, if anybody has those numbers, in addition to replying here
 feel free to add the information and/or source the one present at
 http://en.wikipedia.org/wiki/Wikipedia:Wikipedians

 Thanks,

 --
 Piotr Konieczny
 PhD Candidate
 Dept of Sociology
 Uni of Pittsburgh

 http://pittsburgh.academia.edu/PiotrKonieczny/
 http://en.wikipedia.org/wiki/User:Piotrus




