Re: [Wiki-research-l] Editors: research on transitions, learning over time, leaving

2017-03-22 Thread Kerry Raymond
> A few years ago the WMF did a survey of former editors, partly to 
> learn why they'd left. One of the most common responses was "I haven't left 
> yet".

With the benefit of hindsight (a wonderful thing), that might be a bad way to 
have asked the question. A better way might have been to ask why they are no 
longer active and what circumstances/change would be likely to make them active 
again. What we really want to know is whether the reasons for inactivity are 
internal/external to Wikipedia and whether the conditions for re-engagement are 
internal/external to Wikipedia. And for the internal ones, we'd like to know 
more specifically what they are.

"I haven't left yet, but as soon as my new baby has started school, I might 
have the time for Wikipedia again" (i.e. the cause of inactivity  and return to 
activity is outside of Wikipedia's control).  There is not a lot Wikipedia can 
do about such a contributors.

"I left because I was sick and tired of the unpleasant way people behave, but I 
enjoyed contributing otherwise and would do so again if the culture was a lot 
nicer" is something that WP has some control over but not something you can fix 
in an afternoon.

"I left because I just found it too hard, I kept forgetting when to use [[ and 
when to use {{ and I never figured out that  thing" is someone that we 
could potentially re-engage on the spot by saying "hey, try the Visual Editor!".

Or maybe "I haven't left yet" is more literally true than we think. It is 
possible that the person is still active on Wikipedia but under a different 
user name or as an IP so they just appear to have become inactive under their 
former user name. If a person has had some unpleasant experiences on Wikipedia 
and that is why they became inactive, there are a lot of good reasons why they 
might not like to return under the same user name. Wikipedia has an infinitely 
long memory for things like bans and blocks, and watchlists last forever. If 
you got yourself into trouble previously but you want to start afresh, you 
probably want to create a new account. If you had bad experiences with some 
other user who was regularly unpleasant to you, you would want a new account, as 
they can watch your User page and Talk page forever to detect whether you ever 
return. *Changing* your user name doesn't solve that problem; creating a new 
account does. And of course you may just have forgotten your username or your 
password and created a new account. 

Personally, I am inclined to think that the "I haven't left yet" editors (who 
aren't active under another user name) are probably effectively lost to us. 
Some other interest has almost certainly chewed up their spare time during 
their absence from Wikipedia. There's a big gap between "I'm not saying No" and 
"I'm saying Yes".

The other issue is that even if the desired circumstances for re-engagement are 
in place, you still need some kind of way to communicate this fact to the "lost 
users". Given that providing an email address isn’t mandatory on creating an 
account, we can only communicate with those who did provide an email address 
and hope it is still an active one. 

For example, perhaps we should be emailing all the "lost users" (where we can) 
periodically and saying "Hey, try that Visual Editor" or "get involved with 
#1Lib1Ref" or mentioning some other positive thing that might convince them to 
give it another go. 

It's been said (and I really don't know if it's true) that people respond 
better to being needed than to being wanted. Maybe we can use that in Project 
Boomerang. Find an article that the lost user has made a lot of contributions 
to but which hasn't grown much since (ignoring all the re-categorisations, MoS 
enforcements, reverted vandalisms, and other edits that don't greatly enhance 
the information content of an article) and tell them that article XYZ needs 
them to come and keep it up-to-date.
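
To make that concrete, finding such articles might look roughly like the 
untested sketch below against the MediaWiki API (the username is invented, and 
the heuristics - lots of edits by the user, little growth since - are 
deliberately crude):

import requests
from collections import Counter

API = "https://en.wikipedia.org/w/api.php"

def boomerang_candidates(username, min_edits=20):
    """Articles the user edited heavily, sorted by how little they grew since."""
    params = {"action": "query", "format": "json", "list": "usercontribs",
              "ucuser": username, "ucnamespace": 0, "uclimit": "500",
              "ucprop": "title|size"}
    edits, size_at_last_edit = Counter(), {}
    while True:
        data = requests.get(API, params=params, timeout=30).json()
        for contrib in data["query"]["usercontribs"]:
            edits[contrib["title"]] += 1
            # contributions come newest-first, so the first size we see is the
            # page size as the user last left it
            size_at_last_edit.setdefault(contrib["title"], contrib.get("size", 0))
        if "continue" not in data:
            break
        params.update(data["continue"])
    results = []
    for title, count in edits.items():
        if count < min_edits:
            continue
        info = requests.get(API, params={"action": "query", "format": "json",
                                         "prop": "info", "titles": title},
                            timeout=30).json()
        page = next(iter(info["query"]["pages"].values()))
        growth = page.get("length", 0) - size_at_last_edit[title]
        results.append((title, count, growth))
    return sorted(results, key=lambda r: r[2])  # least-grown first

for title, count, growth in boomerang_candidates("ExampleLostUser")[:10]:
    print(title, count, growth)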

In sales, they often say it is 10x the effort to get a new customer as to 
retain an existing one. Maybe instead of putting effort into onboarding new 
users (whom we have to put through a massive learning curve very fast or watch 
them die the slow death of many reverts and AfC rejections), we should put more 
effort into re-engaging lost users (there's less of a learning curve to bring 
them back).

Kerry







Re: [Wiki-research-l] Editors: research on transitions, learning over time, leaving

2017-03-22 Thread Stuart A. Yeates
I know that I was recruited to Wikipedia from then-competitor everything2; it
would be interesting to find active users who joined during E2's
precipitous decline, match their accounts, and compare editing styles.

cheers
stuart

On Tuesday, March 21, 2017, WereSpielChequers 
wrote:

> Dear Jan,
>
> It's a fascinating topic and one that interests me as well.
>
> But you have to be careful with your assumptions: our data is almost always
> based on user accounts, but we'd like to think we are looking at people,
> some of whom will have different accounts over time. Some of the
> involvement will switch between projects - apparently half the founding
> Wikidata community were previously active in the movement. Some will spend
> periods of their volunteer time off wiki - many very active volunteers put
> time in as Arbcom members, OTRS volunteers or chapter trustees.
>
>
> Volunteers are very, very different from staff or even subscribers. Barely 16
> years into the project, we simply don't have the data to work out long-term
> patterns of retention and reactivation, but the signs so far are that
> Wikipedia is beginning to look like other volunteer organisations with which
> people have a multi-decade relationship.
>
> A few years ago the WMF did a survey of former editors, partly to learn why
> they'd left. One of the most common responses was "I haven't left yet".
>
> WSC
>
> On 20 March 2017 at 09:34, Jan Dittrich  wrote:
>
> > Hello,
> >
> > I am looking for research on how editors transition through various
> levels
> > of involvement in their time as editors. The questions I ask myself are:
> >
> > - How many people come each month?
> > - How many editors leave?
> >
> > …those are not too difficult to answer but…
> >
> > - How many people become more involved over time? E.g. How many each
> month
> > come to a level where they are interested in handling many pages on the
> > watchlist, learn the less obvious aspects of wiki culture etc.
> >
> > In my work as designer I am often involved in features for intermediate
> > and/or very involved users and I’m wondering if there are any ballpark
> > estimates of how many people learn these features each month.
> >
> > Jan
> >
> > --
> > Jan Dittrich
> > UX Design/ User Research
> >
> > Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> > Phone: +49 (0)30 219 158 26-0
> > http://wikimedia.de
> >
> > Imagine a world, in which every single human being can freely share in
> the
> > sum of all knowledge. That‘s our commitment.
> >
> > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> > Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter
> > der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
> > Körperschaften I Berlin, Steuernummer 27/029/42207.
> > ___
> > Wiki-research-l mailing list
> > Wiki-research-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> >
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>


Re: [Wiki-research-l] [Wikidata] The basis for Wikidata quality

2017-03-22 Thread Gerard Meijssen
Hoi,
You are conflating two things that are not related. ORES is really helpful
and there is plenty of room for it to function extremely well on Wikidata.
Yes, ORES will do good things for Wikidata, but it is separate from the
proposed item quality.

When an item is created because an article exists on the Kyrgyz Wikipedia,
it may not even have a label. This happens a lot, and Amir, the developer of
ORES, has a bot that he regularly runs to add one label: the name of the
article. When an article is added, the same bot does the same work. We do
not have to rate it for a specific "level of quality" in order to improve
our quality; this bot runs every week automatically. When an
automated rating system sees this, it can perform the remedy.
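
For illustration only, a label bot of that kind could look something like this
untested pywikibot-style sketch (this is not Amir's actual code, and the
language code and page selection are placeholders):

import pywikibot

site = pywikibot.Site("ky", "wikipedia")

for page in site.allpages(namespace=0, total=100):
    try:
        item = pywikibot.ItemPage.fromPage(page)   # the Wikidata item behind the article
        item.get()
    except pywikibot.exceptions.NoPageError:       # pywikibot.NoPage in older releases
        continue
    if "ky" not in item.labels:                    # no Kyrgyz label yet
        item.editLabels({"ky": page.title()},
                        summary="Add missing label from the article title")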

When, as I understand from Lydia, the ratings are done automagically, in an
ORES kinda way, it gains relevance. The point, though, is that the quality
of all the individual items on their own is still of hardly any
significance. It becomes relevant when the rating can accept the result of a
query and provide results on that basis. For instance: take all the items
with "catalog" "black lunch table" and give me a rating for all the items
with no articles, or no articles in Spanish...

The problem with rating "item quality" is that, on its own, it has no
application. I just finished adding award winners [1]. Based on the quality
of the English article, I added the award winners using "linked items" and
"petscan". I added a few items because there are no articles. The
Turkish article for the award has plenty of red links. It would be a
quality improvement if these red links were associated with the items. An
article writer then immediately finds the English article and the Wikidata
statements. That is actionable quality, as it provides a way to stabilise
Wikidata, en.wp and tr.wp alike. For the Turkish language the labels of the
red links may be used. When you consider quality, it is like an onion: on
the first level all the information is there; on the second level we may
be missing an education (that may still be in Freebase), an employer,
whatever makes sense in a context.

When there is not even a red link but just text, as in English, a tool like
ORES could recognise the label in the text and accept it as a result that
is probably positive.

When we are really interested in quality, we need to compare the content of
the many projects including Wikidata and find that it is in balance.

My key criticism of the current quality standard stands: this is a first
step, but there are severe doubts about the relevance of several aspects. As a
first iteration it will prove what is good and where it needs improvement.
But without an interface into query it is useless.
Thanks,
  GerardM



[1] https://tools.wmflabs.org/reasonator/?q=Q13582570

On 22 March 2017 at 15:33, Aaron Halfaker  wrote:

> Hey wiki-research-l folks,
>
> Gerard didn't actually link you to the quality criteria he takes issue
> with.  See https://www.wikidata.org/wiki/Wikidata:Item_quality  I think
> Gerard's argument basically boils down to Wikidata != Wikipedia, but it's
> unclear how that is relevant to the goal of measuring the quality of
> items.  This is something I've been talking to Lydia about for a long
> time.  It's been great for the few Wikis where we have models deployed in
> ORES[1] (English, French, and Russian Wikipedia).  So we'd like to have the
> same for Wikidata.   As Lydia said, we do all sorts of fascinating things
> with a model like this.  Honestly, I think the criteria is coming together
> quite nicely and we're just starting a pilot labeling campaign to work
> through a set of issues before starting the primary labeling drive.
>
> 1. https://ores.wikimedia.org
>
> -Aaron
>
>
>
> On Wed, Mar 22, 2017 at 6:39 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com>
> wrote:
>
> > Hoi,
> > What I have read is that it will be individual items that are graded.
> That
> > is not what helps you determine what items are lacking in something. When
> > you want to determine if something is lacking you need a relational
> > approach. When you approach a award like this one [1], it was added to
> make
> > the award for a person [2] more complete. No real importance is given to
> > this award, just a few more people were added because they are part of a
> > group that gets more attention from me [3]. For yet another award [4], I
> > added all the people who received the award because I was told by
> someone's
> > expert opinion that they were all notable (in the Wikipedia sense of the
> > word). I added several of these people in Wikidata. Arguably, the
> Wikidata
> > the quality for the item for the award is great but it has no article
> > associated to it in Wikipedia but that has nothing to do with the quality
> > of the information it provides. It is easy and obvious to recognise in
> one
> > level deeper that quality issues arise; the info for several people is
> > meagre at best.You cannot deny their relevance 

Re: [Wiki-research-l] [Wikidata] The basis for Wikidata quality

2017-03-22 Thread Aaron Halfaker
Hey wiki-research-l folks,

Gerard didn't actually link you to the quality criteria he takes issue
with.  See https://www.wikidata.org/wiki/Wikidata:Item_quality  I think
Gerard's argument basically boils down to Wikidata != Wikipedia, but it's
unclear how that is relevant to the goal of measuring the quality of
items.  This is something I've been talking to Lydia about for a long
time.  It's been great for the few Wikis where we have models deployed in
ORES[1] (English, French, and Russian Wikipedia).  So we'd like to have the
same for Wikidata.   As Lydia said, we do all sorts of fascinating things
with a model like this.  Honestly, I think the criteria is coming together
quite nicely and we're just starting a pilot labeling campaign to work
through a set of issues before starting the primary labeling drive.

1. https://ores.wikimedia.org
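
For anyone who wants to poke at the scores programmatically, something like
the untested sketch below should work; it assumes the v3 scores endpoint and
the enwiki article-quality model (called "wp10" at the time), and the revision
ID is just a placeholder:

import requests

def ores_scores(context, model, rev_ids):
    """Fetch ORES predictions for one or more revisions."""
    url = "https://ores.wikimedia.org/v3/scores/{}/".format(context)
    params = {"models": model,
              "revids": "|".join(str(r) for r in rev_ids)}
    resp = requests.get(url, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()

# e.g. article-quality prediction for a single enwiki revision
print(ores_scores("enwiki", "wp10", [774443021]))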

-Aaron



On Wed, Mar 22, 2017 at 6:39 AM, Gerard Meijssen 
wrote:

> Hoi,
> What I have read is that it will be individual items that are graded. That
> is not what helps you determine what items are lacking in something. When
> you want to determine if something is lacking you need a relational
> approach. When you approach a award like this one [1], it was added to make
> the award for a person [2] more complete. No real importance is given to
> this award, just a few more people were added because they are part of a
> group that gets more attention from me [3]. For yet another award [4], I
> added all the people who received the award because I was told by someone's
> expert opinion that they were all notable (in the Wikipedia sense of the
> word). I added several of these people in Wikidata. Arguably, the Wikidata
> the quality for the item for the award is great but it has no article
> associated to it in Wikipedia but that has nothing to do with the quality
> of the information it provides. It is easy and obvious to recognise in one
> level deeper that quality issues arise; the info for several people is
> meagre at best.You cannot deny their relevance though; removing them
> destroys the quality for the award.
>
> The point is that in relations you can describe quality, in the grading
> that is proposed there is nothing really that is actionable.
>
> When you add links to the mix, these same links have no bearing on the
> quality of the Wikidata item. Why would it? Links only become interesting
> when you compare the statements in Wikidata with the links to other
> articles in the same Wikipedia. This is not what this approach brings.
>
> Really, how will the grades to items make a difference. How will it help us
> understand that "items relating to railroads are lacking"? It does not.
>
> When you want to have indicators for quality; here is one.. an author (and
> its subclasses) should have a VIAF identifier. An artist with objects in
> the Getty Museum should have an ULAN number. The lack of such information
> is actionable. The number of interwiki links is not, the number of
> statements are not and even references are not that convincing.
> Thanks,
>   GerardM
>
> [1] https://tools.wmflabs.org/reasonator/?&q=29000734
> [2] https://tools.wmflabs.org/reasonator/?&q=7315382
> [3] https://tools.wmflabs.org/reasonator/?&q=3308284
> [4] https://tools.wmflabs.org/reasonator/?&q=28934266
>
> On 22 March 2017 at 11:56, Lydia Pintscher 
> wrote:
>
> > On Wed, Mar 22, 2017 at 10:03 AM, Gerard Meijssen
> >  wrote:
> > > In your reply I find little argument why this approach is useful. I do
> > not
> > > find a result that is actionable. There is little point to this
> approach
> > > and it does not fit with well with much of the Wikidata practice.
> >
> > Gerard, the outcome will be very actionable. We will have the
> > groundwork needed to identify individual items and sets of items that
> > need improvement. If it for example turns out that our items related
> > to railroads are particularly lacking then that is something we can
> > concentrate on if we so chose. We can do editathons, data
> > partnerships, quality drives and and and.
> >
> >
> > Cheers
> > Lydia
> >
> > --
> > Lydia Pintscher - http://about.me/lydia.pintscher
> > Product Manager for Wikidata
> >
> > Wikimedia Deutschland e.V.
> > Tempelhofer Ufer 23-24
> > 10963 Berlin
> > www.wikimedia.de
> >
> > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> >
> > Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> > unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> > Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
> >
> > ___
> > Wikidata mailing list
> > wikid...@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>

Re: [Wiki-research-l] [Wikidata] The basis for Wikidata quality

2017-03-22 Thread Gerard Meijssen
Hoi,
What I have read is that it will be individual items that are graded. That
is not what helps you determine what items are lacking in something. When
you want to determine if something is lacking you need a relational
approach. When you approach an award like this one [1], it was added to make
the award for a person [2] more complete. No real importance is given to
this award; just a few more people were added because they are part of a
group that gets more attention from me [3]. For yet another award [4], I
added all the people who received the award because I was told, on someone's
expert opinion, that they were all notable (in the Wikipedia sense of the
word). I added several of these people in Wikidata. Arguably, the Wikidata
quality for the item for the award is great; it has no article
associated with it in Wikipedia, but that has nothing to do with the quality
of the information it provides. One level deeper it is easy and obvious to
recognise that quality issues arise; the info for several people is
meagre at best. You cannot deny their relevance though; removing them
destroys the quality for the award.

The point is that in relations you can describe quality; in the grading
that is proposed there is nothing really that is actionable.

When you add links to the mix, these same links have no bearing on the
quality of the Wikidata item. Why would they? Links only become interesting
when you compare the statements in Wikidata with the links to other
articles in the same Wikipedia. This is not what this approach brings.

Really, how will the grades given to items make a difference? How will it help us
understand that "items relating to railroads are lacking"? It does not.

When you want to have indicators for quality, here is one: an author (and
its subclasses) should have a VIAF identifier. An artist with objects in
the Getty Museum should have an ULAN number. The lack of such information
is actionable. The number of interwiki links is not, the number of
statements is not, and even references are not that convincing.
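
To show what "actionable" means here, a check of that kind can be run directly
against the Wikidata Query Service; an untested sketch follows (property and
item ids from memory: P31/Q5 human, P106/Q36180 writer, P214 VIAF; it is
simplified to occupation = writer rather than author and its subclasses):

import requests

QUERY = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P31 wd:Q5 ;             # a human
          wdt:P106 wd:Q36180 .        # occupation: writer
  FILTER NOT EXISTS { ?person wdt:P214 ?viaf . }   # but no VIAF identifier
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
"""

resp = requests.get("https://query.wikidata.org/sparql",
                    params={"query": QUERY, "format": "json"}, timeout=60)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["person"]["value"], row.get("personLabel", {}).get("value", ""))
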
Thanks,
  GerardM

[1] https://tools.wmflabs.org/reasonator/?&q=29000734
[2] https://tools.wmflabs.org/reasonator/?&q=7315382
[3] https://tools.wmflabs.org/reasonator/?&q=3308284
[4] https://tools.wmflabs.org/reasonator/?&q=28934266

On 22 March 2017 at 11:56, Lydia Pintscher 
wrote:

> On Wed, Mar 22, 2017 at 10:03 AM, Gerard Meijssen
>  wrote:
> > In your reply I find little argument why this approach is useful. I do
> not
> > find a result that is actionable. There is little point to this approach
> > and it does not fit with well with much of the Wikidata practice.
>
> Gerard, the outcome will be very actionable. We will have the
> groundwork needed to identify individual items and sets of items that
> need improvement. If it for example turns out that our items related
> to railroads are particularly lacking then that is something we can
> concentrate on if we so chose. We can do editathons, data
> partnerships, quality drives and and and.
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
>
> ___
> Wikidata mailing list
> wikid...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>


Re: [Wiki-research-l] The basis for Wikidata quality

2017-03-22 Thread Lydia Pintscher
On Wed, Mar 22, 2017 at 10:03 AM, Gerard Meijssen
 wrote:
> In your reply I find little argument why this approach is useful. I do not
> find a result that is actionable. There is little point to this approach
> and it does not fit with well with much of the Wikidata practice.

Gerard, the outcome will be very actionable. We will have the
groundwork needed to identify individual items and sets of items that
need improvement. If it for example turns out that our items related
to railroads are particularly lacking, then that is something we can
concentrate on if we so choose. We can do editathons, data
partnerships, quality drives and and and.


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.



Re: [Wiki-research-l] The basis for Wikidata quality

2017-03-22 Thread Gerard Meijssen
Hoi,
When you consider the "collaroborative dimension", it is utterly different
for Wikidata. An example: I just added a few statements to Dorothy Tarrant
[1].For several of those statements I added hundreds of similar statements
on other items. In order to add the award I had to add the award first. I
updated the date of birth and death. When you consider the statements on
most items, they are typically done by bot or by processes like I use. This
"collaborative dimension" is relevant to Wikipedia, not to Wikidata.

You state that previously developed criteria for Wikipedia are important.
Ok, how?

My problem with this approach is that it establishes Wikipedia-think that
is not appropriate for Wikidata. You are imho correct where you say that
links are of more relevance. They are, because they allow us to compare links
between Wikipedia articles and links between Wikidata items. This is
actionable information because the two should largely be the same. I have
described [2] how Wikidata can help in improving the quality of any
Wikipedia and in the process improve its own quality. This can be done by
associating wiki links and red links with Wikidata items. Tools can be of
service in pointing out probable issues.
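
As a very rough, untested sketch of what such a tool could start from (API
parameter names are from memory, and the article title is only the example
mentioned above), compare the items an article links to with the items its
Wikidata item points to:

import requests

WP_API = "https://en.wikipedia.org/w/api.php"

def qids_for_titles(titles):
    """Map article titles to Wikidata item ids via the wikibase_item page prop."""
    qids = set()
    for i in range(0, len(titles), 50):
        data = requests.get(WP_API, params={
            "action": "query", "format": "json", "prop": "pageprops",
            "ppprop": "wikibase_item", "titles": "|".join(titles[i:i + 50])},
            timeout=30).json()
        for page in data["query"]["pages"].values():
            qid = page.get("pageprops", {}).get("wikibase_item")
            if qid:
                qids.add(qid)
    return qids

def linked_titles(title):
    """Article-namespace wikilinks of one article (first 500 only, for brevity)."""
    data = requests.get(WP_API, params={
        "action": "query", "format": "json", "prop": "links",
        "plnamespace": 0, "pllimit": "500", "titles": title}, timeout=30).json()
    page = next(iter(data["query"]["pages"].values()))
    return [link["title"] for link in page.get("links", [])]

def statement_targets(qid):
    """Item ids used as statement values on a Wikidata item."""
    entity = requests.get(
        "https://www.wikidata.org/wiki/Special:EntityData/{}.json".format(qid),
        timeout=30).json()["entities"][qid]
    targets = set()
    for claims in entity["claims"].values():
        for claim in claims:
            value = claim["mainsnak"].get("datavalue", {})
            if value.get("type") == "wikibase-entityid":
                targets.add(value["value"]["id"])
    return targets

article = "Dorothy Tarrant"   # the example item mentioned earlier
item = qids_for_titles([article]).pop()
# items linked from the article but absent from the statements, and vice versa
print(qids_for_titles(linked_titles(article)) - statement_targets(item))
print(statement_targets(item) - qids_for_titles(linked_titles(article)))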

In your reply I find little argument as to why this approach is useful. I do not
find a result that is actionable. There is little point to this approach
and it does not fit well with much of the Wikidata practice.
Thanks,
  GerardM


[1] https://tools.wmflabs.org/reasonator/?q=Q18783615
[2]
http://ultimategerardm.blogspot.nl/2016/01/wikipedia-lowest-hanging-fruit-from.html

On 22 March 2017 at 09:45, Piscopo A.  wrote:

> Hi GerardM
>
> I don’t know if I am one of the researchers you mention in your email, but
> I have indeed carried out research around Wikidata quality and still am.
> I asked the community to help me gather different point of views over what
> data quality means on Wikidata in a RfC a couple of months ago.
>
> If you wonder what happened to that and whether I published anything using
> that material without sharing any results with the community, well, the
> answer is no.
> What we collected is still there and still valuable, waiting to be
> properly analysed once we have enough time to dedicate more to it (which
> will happen later on this year).
>
> As for your questions, I agree with you that looking at Wikidata quality
> with the eyes of (only one) Wikipedia may not be helpful in understanding
> its quality and appreciate its peculiarities. I have tried until now to
> rely more on more general quality dimensions and metrics, and on Linked
> Data-related ones.
> This does not mean that the quality criteria previously developed for
> Wikipedia are not important. They take into account the collaborative
> dimension of the project and are definitely helpful to assess Wikidata as a
> community product.
>
> A short note in favour of my fellow researcher: an evaluation of Wikidata
> by item has been made already by identifying showcase items. Regardless of
> whether we think that evaluating quality by item is the most correct
> approach, I think it is definitely useful to show how good single units of
> information (Items) are. It is just ‘a’ measure for quality, not ‘the’
> measure for quality. As with all research, it may either prove itself to be
> an incredibly valuable contribution to Wikidata (I believe it will) or to
> be useless after all. Whatever the outcome, students/researchers working on
> Wikidata help raise and focus on issues that are important for the
> community and for the project itself, imho of course.
>
> Thanks,
> Alessandro
>
> –––
> Alessandro Piscopo
> Web and Internet Science Group
> School of Electronics and Computer Science
> University of Southampton
> email: a.pisc...@soton.ac.uk
>
> On 22 Mar 2017, at 06:27, Gerard Meijssen <gerard.meijs...@gmail.com> wrote:
>
> Hoi,
> A student is going to start some work on Wikidata quality based on a model
> of quality that is imho seriously suspect. It is item based and assumes
> that the more interwiki links there are, the more statements there are and
> the more references there are, the item will be of a higher quality.
>
> I did protest against this approach and I did call into question that this
> work will help us achieve better quality at Wikidata. I did indicate what
> we should do to approach quality at Wikidata and I was indignantly told
> that research shows that I am wrong.
>
> The research is about Wikipedia not Wikidata and the paper quoted does not
> mention Wikidata at all. As far as I am concerned we have been quite happy
> to only see English Wikipedia based research and consequently I doubt there
> is Wikimedia based research that is truly applicable.
>
> At a previous time a student started work on a quality project for
> Wikidata; comparisons were to be made with external sources so that we
> could deduce quality. The student finished his or her research, 

Re: [Wiki-research-l] The basis for Wikidata quality

2017-03-22 Thread Piscopo A .
Hi GerardM

I don’t know if I am one of the researchers you mention in your email, but I 
have indeed carried out research around Wikidata quality and still am.
I asked the community to help me gather different points of view on what data 
quality means on Wikidata in an RfC a couple of months ago.

If you wonder what happened to that and whether I published anything using that 
material without sharing any results with the community, well, the answer is no.
What we collected is still there and still valuable, waiting to be properly 
analysed once we have enough time to dedicate more to it (which will happen 
later on this year).

As for your questions, I agree with you that looking at Wikidata quality with 
the eyes of (only one) Wikipedia may not be helpful in understanding its 
quality and appreciating its peculiarities. I have so far tried to rely more 
on more general quality dimensions and metrics, and on Linked Data-related ones.
This does not mean that the quality criteria previously developed for Wikipedia 
are not important. They take into account the collaborative dimension of the 
project and are definitely helpful to assess Wikidata as a community product.

A short note in favour of my fellow researcher: an evaluation of Wikidata by 
item has been made already by identifying showcase items. Regardless of whether 
we think that evaluating quality by item is the most correct approach, I think 
it is definitely useful to show how good single units of information (Items) 
are. It is just ‘a’ measure for quality, not ‘the’ measure for quality. As with 
all research, it may either prove itself to be an incredibly valuable 
contribution to Wikidata (I believe it will) or to be useless after all. 
Whatever the outcome, students/researchers working on Wikidata help raise and 
focus on issues that are important for the community and for the project 
itself, imho of course.

Thanks,
Alessandro

–––
Alessandro Piscopo
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton
email: a.pisc...@soton.ac.uk

On 22 Mar 2017, at 06:27, Gerard Meijssen <gerard.meijs...@gmail.com> wrote:

Hoi,
A student is going to start some work on Wikidata quality based on a model
of quality that is imho seriously suspect. It is item based and assumes
that the more interwiki links there are, the more statements there are and
the more references there are, the higher the quality of the item will be.

I did protest against this approach and I did call into question that this
work will help us achieve better quality at Wikidata. I did indicate what
we should do to approach quality at Wikidata and I was indignantly told
that research shows that I am wrong.

The research is about Wikipedia not Wikidata and the paper quoted does not
mention Wikidata at all. As far as I am concerned we have been quite happy
to only see English Wikipedia based research and consequently I doubt there
is Wikimedia based research that is truly applicable.

At a previous time a student started work on a quality project for
Wikidata; comparisons were to be made with external sources so that we
could deduce quality. The student finished his or her research, I assume
wrote a paper, and left us with no working functionality. It is left at
that. So the model where a student can do vital work for Wikidata is also
very much in doubt.

I wrote in an e-mail to user:Epochfail:

Hoi,
You refer to a publication, the basis for quality and it is NOT about
Wikidata but about Wikipedia. What is discussed is quality for Wikidata
where other assumptions are needed. My point to data is that its quality is
in the connections that are made.

To some extent Wikidata reflects Wikipedia, but not one Wikipedia: all
Wikipedias. In addition there is a large and growing set of data with no
links to Wikipedia or any of the other Wikimedia projects.

When you consider the current dataset, there are hardly any relevant
sources. They do exist by inference - items based on Wikipedia are likely
to have a source - items on an award are documented on the official website
for the award - etc.

Quality is therefore in statements being the same on items that are
identified as such.

When you consider Wikidata, it often has more items relating to a
university or an award than a Wikipedia does, and often it does not link to
items representing articles in a specific Wikipedia. When you consider this
alone you have an actionable difference of at least 2%.

Sure enough plenty of scope of looking at Wikidata in its own context and
NOT quoting studies that have nothing to do with Wikidata.
Thanks,
GerardM


My question to both researchers and Wikidata people is: Why would this
Wikipedia model for quality apply to Wikidata? What research is it based
on, is that research applicable, and to what extent? Will the alternative
approach to quality for WIKIDATA not provide us with more and better
quality that will also be of relevance to W