Re: [Wikidata] [Wiki-research-l] The basis for Wikidata quality

2017-03-22 Thread John Erling Blad
Using sitelinks only as a weak indication of quality seems correct to me.
So does the idea that some languages are more important than others, and
that some large languages are more important than others. I would really
like it if the reasoning behind the classes and the features could be
spelled out.

I have serious issues with the ORES training sets, but that is another
discussion. ;/ (There are a lot of similar bot edits in the sets, and that
will train a bot-detector, which is not what we need! Grumpf…)

On Wed, Mar 22, 2017 at 3:33 PM, Aaron Halfaker 
wrote:

> Hey wiki-research-l folks,
>
> Gerard didn't actually link you to the quality criteria he takes issue
> with.  See https://www.wikidata.org/wiki/Wikidata:Item_quality  I think
> Gerard's argument basically boils down to Wikidata != Wikipedia, but it's
> unclear how that is relevant to the goal of measuring the quality of
> items.  This is something I've been talking to Lydia about for a long
> time.  It's been great for the few Wikis where we have models deployed in
> ORES[1] (English, French, and Russian Wikipedia).  So we'd like to have the
> same for Wikidata.   As Lydia said, we do all sorts of fascinating things
> with a model like this.  Honestly, I think the criteria is coming together
> quite nicely and we're just starting a pilot labeling campaign to work
> through a set of issues before starting the primary labeling drive.
>
> 1. https://ores.wikimedia.org
>
> -Aaron
>
>
>
> On Wed, Mar 22, 2017 at 6:39 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com> wrote:
>
>> Hoi,
>> What I have read is that it will be individual items that are graded. That
>> is not what helps you determine what items are lacking in something. When
>> you want to determine if something is lacking you need a relational
>> approach. Take an award like this one [1]: it was added to make
>> the award for a person [2] more complete. No real importance is given to
>> this award, just a few more people were added because they are part of a
>> group that gets more attention from me [3]. For yet another award [4], I
>> added all the people who received the award because I was told by
>> someone's expert opinion that they were all notable (in the Wikipedia
>> sense of the word). I added several of these people in Wikidata. Arguably,
>> the Wikidata quality of the item for the award is great, but it has no
>> article associated with it in Wikipedia; that has nothing to do with the
>> quality of the information it provides. It is easy and obvious to
>> recognise that quality issues arise one level deeper; the info for several
>> people is meagre at best. You cannot deny their relevance though; removing
>> them destroys the quality for the award.
>>
>> The point is that in relations you can describe quality, in the grading
>> that is proposed there is nothing really that is actionable.
>>
>> When you add links to the mix, these same links have no bearing on the
>> quality of the Wikidata item. Why would it? Links only become interesting
>> when you compare the statements in Wikidata with the links to other
>> articles in the same Wikipedia. This is not what this approach brings.
>>
>> Really, how will the grades given to items make a difference? How will it
>> help us understand that "items relating to railroads are lacking"? It does
>> not.
>>
>> When you want to have indicators for quality, here is one: an author (and
>> its subclasses) should have a VIAF identifier. An artist with objects in
>> the Getty Museum should have a ULAN number. The lack of such information
>> is actionable. The number of interwiki links is not, the number of
>> statements is not, and even references are not that convincing.
>> Thanks,
>>   GerardM
>>
>> [1] https://tools.wmflabs.org/reasonator/?&q=29000734
>> [2] https://tools.wmflabs.org/reasonator/?&q=7315382
>> [3] https://tools.wmflabs.org/reasonator/?&q=3308284
>> [4] https://tools.wmflabs.org/reasonator/?&q=28934266
>>
>> On 22 March 2017 at 11:56, Lydia Pintscher 
>> wrote:
>>
>> > On Wed, Mar 22, 2017 at 10:03 AM, Gerard Meijssen
>> >  wrote:
>> > > In your reply I find little argument why this approach is useful. I do
>> > > not find a result that is actionable. There is little point to this
>> > > approach and it does not fit well with much of the Wikidata practice.
>> >
>> > Gerard, the outcome will be very actionable. We will have the
>> > groundwork needed to identify individual items and sets of items that
>> > need improvement. If, for example, it turns out that our items related
>> > to railroads are particularly lacking, then that is something we can
>> > concentrate on if we so choose. We can do editathons, data
>> > partnerships, quality drives, and so on.
>> >
>> >
>> > Cheers
>> > Lydia
>> >
>> > --
>> > Lydia Pintscher - http://about.me/lydia.pintscher
>> > Product Manager for Wikidata
>> >
>> > Wikimedia Deutschland e.V.
>> > Tempelhofer Ufer 23-24
>> > 10963 Berlin
>> > www.wikimedia.de
>> >
>> > 

Re: [Wikidata] [wikicite-discuss] Re: Wikipedia Task Lists for Editathons using Wikidata

2017-03-22 Thread Gerard Meijssen
Hoi,
What we use is "catalog" "Black Lunch Table". It has qualifiers for the
place of the editathon. This allows us to subset the data. We only maintain
the data once. We have subsets for women only and for the people near
Toronto.

As to the data: it is maintained in Wikidata and it is shown in any
Wikipedia. So the same data can live in multiple Wikipedias and will be
updated daily or manually, as and when required. One set of data for all
purposes.

PS We could have a subset for BLT people who are also a member of "Alpha
Kappa Alpha", or who studied at Harvard. That is the power of query.
Thanks,
  GerardM
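
The subsetting described above could be sketched roughly as follows. This is
only an illustration under assumptions: it takes "catalog" to be property
P972, assumes the editathon place is stored as a "location" (P276) qualifier
on that statement, and uses a placeholder instead of the real Black Lunch
Table item ID. The helper name blt_subset_query is made up for this example.

```python
# Placeholder only: the real Wikidata item ID for Black Lunch Table
# is not given in this thread.
BLT_ITEM = "Q_BLT_PLACEHOLDER"


def blt_subset_query(catalog_item, place_item=None, extra_patterns=()):
    """Build a SPARQL query for items listed in a catalog (P972).

    place_item, if given, keeps only catalog statements whose location
    qualifier (assumed here to be P276) matches it.
    extra_patterns are further triple patterns on ?item, for example
    "?item wdt:P21 wd:Q6581072 ." to keep women only.
    """
    lines = [
        "SELECT DISTINCT ?item ?itemLabel WHERE {",
        "  ?item p:P972 ?st .",
        f"  ?st ps:P972 wd:{catalog_item} .",
    ]
    if place_item:
        lines.append(f"  ?st pq:P276 wd:{place_item} .")
    lines.extend(f"  {pattern}" for pattern in extra_patterns)
    lines.append(
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }'
    )
    lines.append("}")
    return "\n".join(lines)


# Women in the catalog whose editathon place is Toronto (Q172).
print(blt_subset_query(BLT_ITEM, "Q172", ["?item wdt:P21 wd:Q6581072 ."]))

# Catalog members who studied at Harvard (educated at, P69 = Q13371).
print(blt_subset_query(BLT_ITEM, extra_patterns=["?item wdt:P69 wd:Q13371 ."]))
```

The data stays in one place; each subset is just another set of patterns
added to the same base query.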

On 21 March 2017 at 13:45, Vladimir Alexiev  wrote:

> >> Listeria makes auto-updating lists on pages, based on a SPARQL query.
> >> The question is whether you have a common characteristic to catch all
> your items, since eg Category is not it (no such prop on WD).
> > I'm confused about the example that the category is not a property of
> Wikidata. Is it not a query-able property in SPARQL to generate this type
> of output?
>
> Right: there’s no property Category.  See these discussions
> https://www.wikidata.org/wiki/Wikidata:Property_proposal/
> Archive/30#category
> https://www.wikidata.org/wiki/Wikidata:Property_proposal/
> Archive/30#useful_for
>
> > After creating the Wikidata item for Visual artists of the African
> diaspora, I started adding that category to artist Wikidata items –
> > - as well as a Commons category if they had media.
> > So if I run a SPARQL query using category it won't generate results?
>
> Did you add it as “item’s main category”? That’s incorrect, since that’s
> inverse of “category’s main item”,
> and a category is supposed to have a single main item (e.g. page France vs
> category France).
>
> > I'm confused because the other suggestion was to tag items as of
> interest to Black Lunch Table. The category seems to be functioning in the
> same way, not very different.
>
> How did you tag “of interest to”?
>
> > One other question: is the task list on Listeria usable on Wikipedia
> pages, or does it need to live in Wikidata's area?
>
> You can put it on any wiki page, eg a Wikidata discussion or project page.
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wiki-research-l] The basis for Wikidata quality

2017-03-22 Thread Gerard Meijssen
Hoi,
When you have a system that reports on what needs only a simple response, you
do not report, you add a label. The lack of such considerations is why this
is a Wikipedia approach and not a Wikidata approach. The tool will rate
items and it will be largely meaningless.

When the idea is that we have at least something: fine. But do not mistake
it for quality, meaningful quality.
Thanks,
   GerardM


On Wed, 22 Mar 2017 at 18:05, Amir Ladsgroup wrote:

> I was mentioned as "the developer of ORES". So I comment on that. Aaron
> Halfaker is the creator of ORES.  It's been his work night and day for a
> few years now. I've contributed around 20% of the code base.  But let's be
> clear, ORES is his brainchild.  There is an army of other developers who
> have contributed.  E.g. He7d3r, Jonas.agx, Aetilley, Danilo, Yuvipanda,
> Awight, Kenrick95, NealMCB, and countless translators.  The idea that a
> single person can develop something like a production machine learning
> service.  Yikes.
>
> See:
> https://github.com/wiki-ai/revscoring/graphs/contributors (the modeling
> library)
> https://github.com/wiki-ai/ores/graphs/contributors (the hosting service)
> https://github.com/wiki-ai/ores-wmflabs-deploy/graphs/contributors (our
> server configuration)
> https://github.com/wiki-ai/wikilabels/graphs/contributors (the labeling
> system)
> https://github.com/wiki-ai/editquality/graphs/contributors (the set of
> damage/vandalism detection models)
> https://github.com/wikimedia/mediawiki-extensions-ORES/graphs/contributors
> (mediawiki extension that highlights based on ORES predictions)
>
> Also, I fail to see the relation of running a labeling script to what's
> ORES is doing.
>
> Best
>
> On Wed, Mar 22, 2017 at 8:51 PM John Erling Blad  wrote:
>
> Only using sitelinks as a weak indication of quality seems correct to me.
> Also the idea that some languages are more important than other, and some
> large languages are more important than other. I would really like it if
> the reasoning behind the classes and the features could be spelled out.
>
> I have serious issues with the ORES training sets, but that is another
> discussion. ;/ (There is a lot of similar bot edits in the sets, and that
> will train a bot-detector, which is not what we need! Grumpf…)
>
> On Wed, Mar 22, 2017 at 3:33 PM, Aaron Halfaker 
> wrote:
>
> Hey wiki-research-l folks,
>
> Gerard didn't actually link you to the quality criteria he takes issue
> with.  See https://www.wikidata.org/wiki/Wikidata:Item_quality  I think
> Gerard's argument basically boils down to Wikidata != Wikipedia, but it's
> unclear how that is relevant to the goal of measuring the quality of
> items.  This is something I've been talking to Lydia about for a long
> time.  It's been great for the few Wikis where we have models deployed in
> ORES[1] (English, French, and Russian Wikipedia).  So we'd like to have the
> same for Wikidata.   As Lydia said, we do all sorts of fascinating things
> with a model like this.  Honestly, I think the criteria is coming together
> quite nicely and we're just starting a pilot labeling campaign to work
> through a set of issues before starting the primary labeling drive.
>
> 1. https://ores.wikimedia.org
>
> -Aaron
>
>
>
> On Wed, Mar 22, 2017 at 6:39 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com> wrote:
>
> Hoi,
> What I have read is that it will be individual items that are graded. That
> is not what helps you determine what items are lacking in something. When
> you want to determine if something is lacking you need a relational
> approach. When you approach a award like this one [1], it was added to make
> the award for a person [2] more complete. No real importance is given to
> this award, just a few more people were added because they are part of a
> group that gets more attention from me [3]. For yet another award [4], I
> added all the people who received the award because I was told by someone's
> expert opinion that they were all notable (in the Wikipedia sense of the
> word). I added several of these people in Wikidata. Arguably, the Wikidata
> the quality for the item for the award is great but it has no article
> associated to it in Wikipedia but that has nothing to do with the quality
> of the information it provides. It is easy and obvious to recognise in one
> level deeper that quality issues arise; the info for several people is
> meagre at best.You cannot deny their relevance though; removing them
> destroys the quality for the award.
>
> The point is that in relations you can describe quality, in the grading
> that is proposed there is nothing really that is actionable.
>
> When you add links to the mix, these same links have no bearing on the
> quality of the Wikidata item. Why would it? Links only become interesting
> when you compare the statements in Wikidata with the links to other
> articles in the same Wikipedia. This is not what this approach brings.
>
> Really, how will the grades to items make a differen

[Wikidata] Problems installing Wikibase repository and client in the same site

2017-03-22 Thread Iván Hernández Cazorla

Hi everyone!

I am trying to configure Wikibase on a wiki that I have installed. I 
want to use both the repository and the client. I followed the 
installation instructions, but I think I did something wrong, because 
I now get several errors on my wiki.


If I access any page of the site I see "Internal error", followed by the 
error message and the backtrace. For example, on the main page:


[e837b2d7a6fcc7e1edaead2a] /Portada Error from line 240 of 
/[PATH]/extensions/Wikibase/client/includes/Store/Sql/DirectSqlStore.php: 
Class 'Wikimedia\Rdbms\SessionConsistentConnectionManager' not found


It also happens when I try to edit any page in the main namespace. 
Another error appears when I visit the page Special:CreateItem:


[1d904eb6d5d155e0d0c710ae] /Especial:CreateItem TypeError from line 
32 of 
/[PATH]/extensions/Wikibase/repo/includes/Store/Sql/SqlIdGenerator.php: 
Argument 1 passed to Wikibase\SqlIdGenerator::__construct() must be an 
instance of Wikimedia\Rdbms\LoadBalancer, instance of LoadBalancer 
given, called in 
/[PATH]/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php on 
line 298


You can check my settings for Wikibase in that gist. 
What I have done is configure both extensions in the same file, with the 
namespaces set up as the advanced configuration explains. 
I am running PHP 7.


Any idea? Thanks in advance!

Regards, Iván
--
Iván Hernández Cazorla.
History undergraduate at the *Universidad de Las Palmas de 
Gran Canaria*.

Member of *Wikimedia España*.
Personal website .


Re: [Wikidata] [Wiki-research-l] The basis for Wikidata quality

2017-03-22 Thread Amir Ladsgroup
I was mentioned as "the developer of ORES". So I comment on that. Aaron
Halfaker is the creator of ORES.  It's been his work night and day for a
few years now. I've contributed around 20% of the code base.  But let's be
clear, ORES is his brainchild.  There is an army of other developers who
have contributed.  E.g. He7d3r, Jonas.agx, Aetilley, Danilo, Yuvipanda,
Awight, Kenrick95, NealMCB, and countless translators.  The idea that a
single person can develop something like a production machine learning
service.  Yikes.

See:
https://github.com/wiki-ai/revscoring/graphs/contributors (the modeling
library)
https://github.com/wiki-ai/ores/graphs/contributors (the hosting service)
https://github.com/wiki-ai/ores-wmflabs-deploy/graphs/contributors (our
server configuration)
https://github.com/wiki-ai/wikilabels/graphs/contributors (the labeling
system)
https://github.com/wiki-ai/editquality/graphs/contributors (the set of
damage/vandalism detection models)
https://github.com/wikimedia/mediawiki-extensions-ORES/graphs/contributors
(mediawiki extension that highlights based on ORES predictions)

Also, I fail to see how running a labeling script relates to what
ORES is doing.

Best

On Wed, Mar 22, 2017 at 8:51 PM John Erling Blad  wrote:

> Only using sitelinks as a weak indication of quality seems correct to me.
> Also the idea that some languages are more important than other, and some
> large languages are more important than other. I would really like it if
> the reasoning behind the classes and the features could be spelled out.
>
> I have serious issues with the ORES training sets, but that is another
> discussion. ;/ (There is a lot of similar bot edits in the sets, and that
> will train a bot-detector, which is not what we need! Grumpf…)
>
> On Wed, Mar 22, 2017 at 3:33 PM, Aaron Halfaker 
> wrote:
>
> Hey wiki-research-l folks,
>
> Gerard didn't actually link you to the quality criteria he takes issue
> with.  See https://www.wikidata.org/wiki/Wikidata:Item_quality  I think
> Gerard's argument basically boils down to Wikidata != Wikipedia, but it's
> unclear how that is relevant to the goal of measuring the quality of
> items.  This is something I've been talking to Lydia about for a long
> time.  It's been great for the few Wikis where we have models deployed in
> ORES[1] (English, French, and Russian Wikipedia).  So we'd like to have the
> same for Wikidata.   As Lydia said, we do all sorts of fascinating things
> with a model like this.  Honestly, I think the criteria is coming together
> quite nicely and we're just starting a pilot labeling campaign to work
> through a set of issues before starting the primary labeling drive.
>
> 1. https://ores.wikimedia.org
>
> -Aaron
>
>
>
> On Wed, Mar 22, 2017 at 6:39 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com> wrote:
>
> Hoi,
> What I have read is that it will be individual items that are graded. That
> is not what helps you determine what items are lacking in something. When
> you want to determine if something is lacking you need a relational
> approach. When you approach a award like this one [1], it was added to make
> the award for a person [2] more complete. No real importance is given to
> this award, just a few more people were added because they are part of a
> group that gets more attention from me [3]. For yet another award [4], I
> added all the people who received the award because I was told by someone's
> expert opinion that they were all notable (in the Wikipedia sense of the
> word). I added several of these people in Wikidata. Arguably, the Wikidata
> the quality for the item for the award is great but it has no article
> associated to it in Wikipedia but that has nothing to do with the quality
> of the information it provides. It is easy and obvious to recognise in one
> level deeper that quality issues arise; the info for several people is
> meagre at best.You cannot deny their relevance though; removing them
> destroys the quality for the award.
>
> The point is that in relations you can describe quality, in the grading
> that is proposed there is nothing really that is actionable.
>
> When you add links to the mix, these same links have no bearing on the
> quality of the Wikidata item. Why would it? Links only become interesting
> when you compare the statements in Wikidata with the links to other
> articles in the same Wikipedia. This is not what this approach brings.
>
> Really, how will the grades to items make a difference. How will it help us
> understand that "items relating to railroads are lacking"? It does not.
>
> When you want to have indicators for quality; here is one.. an author (and
> its subclasses) should have a VIAF identifier. An artist with objects in
> the Getty Museum should have an ULAN number. The lack of such information
> is actionable. The number of interwiki links is not, the number of
> statements are not and even references are not that convincing.
> Thanks,
>   GerardM
>
> [1] https://tools.w

Re: [Wikidata] [wikicite-discuss] Re: Wikipedia Task Lists for Editathons using Wikidata

2017-03-22 Thread Vladimir Alexiev
>> Listeria makes auto-updating lists on pages, based on a SPARQL query. 
>> The question is whether you have a common characteristic to catch all your 
>> items, since eg Category is not it (no such prop on WD).
> I'm confused about the example that the category is not a property of 
> Wikidata. Is it not a query-able property in SPARQL to generate this type of 
> output? 

Right: there’s no property Category.  See these discussions
https://www.wikidata.org/wiki/Wikidata:Property_proposal/Archive/30#category
https://www.wikidata.org/wiki/Wikidata:Property_proposal/Archive/30#useful_for 

> After creating the Wikidata item for Visual artists of the African diaspora, 
> I started adding that category to artist Wikidata items –
> - as well as a Commons category if they had media.
> So if I run a SPARQL query using category it won't generate results?

Did you add it as “item’s main category”? That’s incorrect, since that’s 
inverse of “category’s main item”, 
and a category is supposed to have a single main item (e.g. page France vs 
category France).

> I'm confused because the other suggestion was to tag items as of interest to 
> Black Lunch Table. The category seems to be functioning in the same way, not 
> very different. 

How did you tag “of interest to”?

> One other question: is the task list on Listeria usable on Wikipedia pages, 
> or does it need to live in Wikidata's area?

You can put it on any wiki page, eg a Wikidata discussion or project page.
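
As a rough sketch of what goes on such a page: Listeria is driven by a
template pair wrapped around a SPARQL query that selects ?item. The helper
below only assembles that wikitext. The function name is made up for this
example, the catalog item ID is a placeholder, and the "sparql" and "columns"
parameters should be checked against the Listeria documentation before use.

```python
def listeria_block(sparql, columns=("label", "description")):
    """Return wikitext asking ListeriaBot to maintain a list on a page.

    The query is collapsed onto one line. Queries that contain a pipe
    character would need escaping in wikitext, which this sketch does
    not attempt.
    """
    one_line = " ".join(sparql.split())
    return (
        "{{Wikidata list\n"
        f"|sparql={one_line}\n"
        f"|columns={','.join(columns)}\n"
        "}}\n"
        "{{Wikidata list end}}"
    )


# Select items by a shared statement rather than by a category.
# Q_BLT_PLACEHOLDER stands in for the real catalog item ID.
query = """
SELECT ?item WHERE {
  ?item wdt:P972 wd:Q_BLT_PLACEHOLDER .
}
"""
print(listeria_block(query))
```

The bot then fills in and refreshes whatever sits between the two templates.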




Re: [Wikidata] [Wiki-research-l] The basis for Wikidata quality

2017-03-22 Thread Aaron Halfaker
Hey wiki-research-l folks,

Gerard didn't actually link you to the quality criteria he takes issue
with.  See https://www.wikidata.org/wiki/Wikidata:Item_quality  I think
Gerard's argument basically boils down to Wikidata != Wikipedia, but it's
unclear how that is relevant to the goal of measuring the quality of
items.  This is something I've been talking to Lydia about for a long
time.  It's been great for the few wikis where we have models deployed in
ORES[1] (English, French, and Russian Wikipedia).  So we'd like to have the
same for Wikidata.  As Lydia said, we can do all sorts of fascinating things
with a model like this.  Honestly, I think the criteria are coming together
quite nicely and we're just starting a pilot labeling campaign to work
through a set of issues before starting the primary labeling drive.

1. https://ores.wikimedia.org

-Aaron



On Wed, Mar 22, 2017 at 6:39 AM, Gerard Meijssen 
wrote:

> Hoi,
> What I have read is that it will be individual items that are graded. That
> is not what helps you determine what items are lacking in something. When
> you want to determine if something is lacking you need a relational
> approach. Take an award like this one [1]: it was added to make
> the award for a person [2] more complete. No real importance is given to
> this award, just a few more people were added because they are part of a
> group that gets more attention from me [3]. For yet another award [4], I
> added all the people who received the award because I was told by someone's
> expert opinion that they were all notable (in the Wikipedia sense of the
> word). I added several of these people in Wikidata. Arguably, the Wikidata
> quality of the item for the award is great, but it has no article
> associated with it in Wikipedia; that has nothing to do with the quality
> of the information it provides. It is easy and obvious to recognise that
> quality issues arise one level deeper; the info for several people is
> meagre at best. You cannot deny their relevance though; removing them
> destroys the quality for the award.
>
> The point is that in relations you can describe quality, in the grading
> that is proposed there is nothing really that is actionable.
>
> When you add links to the mix, these same links have no bearing on the
> quality of the Wikidata item. Why would it? Links only become interesting
> when you compare the statements in Wikidata with the links to other
> articles in the same Wikipedia. This is not what this approach brings.
>
> Really, how will the grades given to items make a difference? How will it
> help us understand that "items relating to railroads are lacking"? It does
> not.
>
> When you want to have indicators for quality, here is one: an author (and
> its subclasses) should have a VIAF identifier. An artist with objects in
> the Getty Museum should have a ULAN number. The lack of such information
> is actionable. The number of interwiki links is not, the number of
> statements is not, and even references are not that convincing.
> Thanks,
>   GerardM
>
> [1] https://tools.wmflabs.org/reasonator/?&q=29000734
> [2] https://tools.wmflabs.org/reasonator/?&q=7315382
> [3] https://tools.wmflabs.org/reasonator/?&q=3308284
> [4] https://tools.wmflabs.org/reasonator/?&q=28934266
>
> On 22 March 2017 at 11:56, Lydia Pintscher 
> wrote:
>
> > On Wed, Mar 22, 2017 at 10:03 AM, Gerard Meijssen
> >  wrote:
> > > In your reply I find little argument why this approach is useful. I do
> > > not find a result that is actionable. There is little point to this
> > > approach and it does not fit well with much of the Wikidata practice.
> >
> > Gerard, the outcome will be very actionable. We will have the
> > groundwork needed to identify individual items and sets of items that
> > need improvement. If, for example, it turns out that our items related
> > to railroads are particularly lacking, then that is something we can
> > concentrate on if we so choose. We can do editathons, data
> > partnerships, quality drives, and so on.
> >
> >
> > Cheers
> > Lydia
> >
> > --
> > Lydia Pintscher - http://about.me/lydia.pintscher
> > Product Manager for Wikidata
> >
> > Wikimedia Deutschland e.V.
> > Tempelhofer Ufer 23-24
> > 10963 Berlin
> > www.wikimedia.de
> >
> > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> >
> > Registered in the register of associations of the Amtsgericht
> > Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
> > organization by the Finanzamt für Körperschaften I Berlin, tax number
> > 27/029/42207.
> >
> >
> ___
> Wiki-research-l mailing list
> wiki-researc...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>

[Wikidata] Does Wikidata use a property store or a RDF triplestore?

2017-03-22 Thread 70y2eb+3x9alb6ieihv8
^^








Re: [Wikidata] The basis for Wikidata quality

2017-03-22 Thread John Erling Blad
Forgot to mention: this is not really about _quality_ (Gerard says "model
of quality"); it is about _trust_ and _reputation_. Something can have low
quality and high trust (think of cheap cell phones), and the reputation
might not reflect the actual quality.

You (usually) measure reputation and calculate trust, but I have seen it
the other way around. The end result is the same anyhow.

On Wed, Mar 22, 2017 at 3:31 PM, John Erling Blad  wrote:

> Sitelinks to an item are an approximation of the number of views of the
> data from an item, and as such give an approximation of the likelihood of
> detecting an error. Few views imply a longer time span before an error is
> detected. It is really about estimating quality as a function of the age of
> the item measured in page views, but approximated through sitelinks.
>
> The problem is that the number of sitelinks is not a good approximation.
> Yes, it is a simple approximation, but it is still pretty bad.
>
> References are another way to verify the data, but that is not a valid
> argument against measuring the age of the data.
>
> I've been toying for some time with an idea that uses statistical inference
> to try to identify questionable facts, but it will probably not be done -
> it is way too much work to do in spare time.
>


Re: [Wikidata] The basis for Wikidata quality

2017-03-22 Thread John Erling Blad
Sitelinks to an item are an approximation of the number of views of the
data from an item, and as such give an approximation of the likelihood of
detecting an error. Few views imply a longer time span before an error is
detected. It is really about estimating quality as a function of the age of
the item measured in page views, but approximated through sitelinks.

The problem is that the number of sitelinks is not a good approximation.
Yes, it is a simple approximation, but it is still pretty bad.

References are another way to verify the data, but that is not a valid
argument against measuring the age of the data.

I've been toying for some time with an idea that uses statistical inference
to try to identify questionable facts, but it will probably not be done -
it is way too much work to do in spare time.
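
To make the views argument concrete, here is a toy calculation. It is not
the ORES model or anything proposed in this thread; it simply assumes that
every page view independently has the same small chance of spotting a given
error, so the number of views until detection follows a geometric
distribution. The function name and both parameters are made up for the
illustration.

```python
def expected_days_until_detection(views_per_day, p_detect_per_view):
    """Expected days before an error is noticed, under a toy model.

    Assumes each view independently detects the error with probability
    p_detect_per_view, so the expected number of views needed is
    1 / p_detect_per_view, spread over views_per_day views each day.
    """
    if views_per_day <= 0:
        raise ValueError("views_per_day must be positive")
    if not 0 < p_detect_per_view <= 1:
        raise ValueError("p_detect_per_view must be in (0, 1]")
    return 1.0 / (views_per_day * p_detect_per_view)


# Same error, same per-view detection chance, different traffic.
for views in (10, 100, 1000):
    days = expected_days_until_detection(views, 0.01)
    print(f"{views} views/day: about {days:g} days")
```

Under these assumptions an item with a tenth of the views takes ten times as
long to have the error noticed, which is why a poor proxy for views (such as
the sitelink count) gives a poor estimate.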


Re: [Wikidata] The basis for Wikidata quality

2017-03-22 Thread Erika Herzog aka BrillLyle
I am not a fan of quality scales for Wikipedia -- and what Gerard
describes doesn't sound generally impactful or helpful as it relates to
Wikidata.

As someone more firmly in the Wikipedia camp, I think that my perception of a
well done Wikipedia page -- which would mean beautifully curated citations and
as complete metadata as possible (infoboxes, works & publications, further
reading, authority control, etc.) -- is very different from a perception
of a well done Wikidata entry. They seem to be separate beasts entirely.

I also think it is probable that, due to the youth of the Wikidata project,
research about Wikidata still needs to find its space and focus. To
conflate Wikipedia methodology and evaluation systems with Wikidata is in my
opinion not going to work because they are so different. And when it comes to
notability especially, Wikidata's constraints are very different from
Wikipedia's. The two projects obviously work in service to and with each
other, but it seems clear that different evaluation systems would be called
for.

I'm a nascent Wikidatan but am learning and using Wikidata more and more. So I
probably don't understand the concerns here fully. But I do understand
Wikipedia -- and how it can relate to and interface with Wikidata. And
notability and quality of information are a constant concern. So

- Erika

> On Mar 22, 2017, at 2:27 AM, Gerard Meijssen  
> wrote:
> 
> Hoi,
> A student is going to start some work on Wikidata quality based on a model of 
> quality that is imho seriously suspect. It is item based and assumes that the 
> more interwiki links there are, the more statements there are and the more 
> references there are, the item will be of a higher quality.
> 
> I did protest against this approach and I did call into question that this 
> work will help us achieve better quality at Wikidata. I did indicate what we 
> should do to approach quality at Wikidata and I was indignantly told that 
> research shows that I am wrong.
> 
> The research is about Wikipedia not Wikidata and the paper quoted does not 
> mention Wikidata at all. As far as I am concerned we have been quite happy to 
> only see English Wikipedia based research and consequently I doubt there is 
> Wikimedia based research that is truly applicable.
> 
> At a previous time a student started work on a quality project for Wikidata; 
> comparisons were to be made with external sources so that we could deduce 
> quality. The student finished his or her research, I assume wrote a paper and 
> left us with no working functionality. It was left at that. So the model where 
> a student can do vital work for Wikidata is also very much in doubt.
> 
> I wrote in an e-mail to user:Epochfail: 
> Hoi,
> You refer to a publication, the basis for quality and it is NOT about 
> Wikidata but about Wikipedia. What is discussed is quality for Wikidata where 
> other assumptions are needed. My point about data is that its quality lies in 
> the connections that are made.
> 
> To some extent Wikidata reflects Wikipedia but not one Wikipedia, all 
> Wikipedias. In addition there is a large and growing set of data with no 
> links to Wikipedia or any of the other Wikimedia projects.
> 
> When you consider the current dataset, there are hardly any relevant sources. 
> They do exist by inference - items based on Wikipedia are likely to have a 
> source - items on an award are documented on the official website for the 
> award - etc.
> 
> Quality is therefore in statements being the same on items that are 
> identified as such.
> 
> When you consider Wikidata, it often has more items relating to a university 
> or an award than a Wikipedia does, and often it does not link to items 
> representing articles in a specific Wikipedia. When you consider this alone 
> you have an actionable difference of at least 2%.
> 
> Surely there is plenty of scope for looking at Wikidata in its own context and 
> NOT quoting studies that have nothing to do with Wikidata.
> Thanks,
>  GerardM
> 
> My question to both researchers and Wikidata people is: Why would this 
> Wikipedia model for quality apply to Wikidata? What research is it based on, 
> is that research applicable and to what extent? Will the alternative approach 
> to quality for WIKIDATA not provide us with more and better quality that will 
> also be of relevance to Wikipedia quality?
> Thanks,
>GerardM
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wiki-research-l] The basis for Wikidata quality

2017-03-22 Thread Gerard Meijssen
Hoi,
What I have read is that it will be individual items that are graded. That
is not what helps you determine what items are lacking in something. When
you want to determine if something is lacking you need a relational
approach. When you approach an award like this one [1], it was added to make
the award for a person [2] more complete. No real importance is given to
this award, just a few more people were added because they are part of a
group that gets more attention from me [3]. For yet another award [4], I
added all the people who received the award because I was told by someone's
expert opinion that they were all notable (in the Wikipedia sense of the
word). I added several of these people in Wikidata. Arguably, the quality of
the Wikidata item for the award is great. It has no article associated with
it in Wikipedia, but that has nothing to do with the quality of the
information it provides. It is easy to recognise that quality issues arise
one level deeper; the info for several people is meagre at best. You cannot
deny their relevance though; removing them destroys the quality for the
award.

The point is that in relations you can describe quality; in the grading
that is proposed there is nothing really that is actionable.

When you add links to the mix, these same links have no bearing on the
quality of the Wikidata item. Why would it? Links only become interesting
when you compare the statements in Wikidata with the links to other
articles in the same Wikipedia. This is not what this approach brings.

Really, how will grading items make a difference? How will it help us
understand that "items relating to railroads are lacking"? It does not.

When you want to have indicators for quality, here is one: an author (and
its subclasses) should have a VIAF identifier. An artist with objects in
the Getty Museum should have a ULAN number. The lack of such information
is actionable. The number of interwiki links is not, the number of
statements is not and even references are not that convincing.
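
A minimal sketch of the kind of identifier check described above, to show
why a missing identifier is actionable. P214 (VIAF ID) and P245 (ULAN ID)
are the Wikidata properties for these identifiers; the item layout, the
rule table and the sample items are assumptions made up for illustration,
and subclasses of "author" are not resolved here.

```python
# P214 and P245 are the Wikidata properties for VIAF ID and ULAN ID.
# Everything else (item layout, rule table, sample items) is hypothetical.
VIAF_ID = "P214"
ULAN_ID = "P245"

# Each rule: (description, predicate deciding whether the rule applies,
# property the item is then expected to have).
RULES = [
    ("author without VIAF identifier",
     lambda item: "author" in item["occupations"],
     VIAF_ID),
    ("artist in the Getty Museum without ULAN number",
     lambda item: "artist" in item["occupations"]
     and "Getty Museum" in item["collections"],
     ULAN_ID),
]


def missing_identifiers(items):
    """Return (item id, description) pairs for every rule an item fails."""
    findings = []
    for item in items:
        for description, applies, prop in RULES:
            if applies(item) and prop not in item["claims"]:
                findings.append((item["id"], description))
    return findings


# Hypothetical items; the ids and claim values are placeholders.
SAMPLE_ITEMS = [
    {"id": "item-A", "occupations": {"author"}, "collections": set(),
     "claims": {"P214": "placeholder"}},
    {"id": "item-B", "occupations": {"author"}, "collections": set(),
     "claims": {}},
    {"id": "item-C", "occupations": {"artist"},
     "collections": {"Getty Museum"}, "claims": {}},
]

if __name__ == "__main__":
    for item_id, description in missing_identifiers(SAMPLE_ITEMS):
        print(item_id, "-", description)
```

Each finding names one item and one missing statement, so it can be
turned directly into a work list, which a per-item grade cannot.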
Thanks,
  GerardM

[1] https://tools.wmflabs.org/reasonator/?&q=29000734
[2] https://tools.wmflabs.org/reasonator/?&q=7315382
[3] https://tools.wmflabs.org/reasonator/?&q=3308284
[4] https://tools.wmflabs.org/reasonator/?&q=28934266

On 22 March 2017 at 11:56, Lydia Pintscher 
wrote:

> On Wed, Mar 22, 2017 at 10:03 AM, Gerard Meijssen
>  wrote:
> > In your reply I find little argument why this approach is useful. I do not
> > find a result that is actionable. There is little point to this approach
> > and it does not fit well with much of the Wikidata practice.
>
> Gerard, the outcome will be very actionable. We will have the
> groundwork needed to identify individual items and sets of items that
> need improvement. If, for example, it turns out that our items related
> to railroads are particularly lacking, then that is something we can
> concentrate on if we so choose. We can do editathons, data
> partnerships, quality drives and so on.
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wiki-research-l] The basis for Wikidata quality

2017-03-22 Thread Lydia Pintscher
On Wed, Mar 22, 2017 at 10:03 AM, Gerard Meijssen
 wrote:
> In your reply I find little argument why this approach is useful. I do not
> find a result that is actionable. There is little point to this approach
> and it does not fit well with much of the Wikidata practice.

Gerard, the outcome will be very actionable. We will have the
groundwork needed to identify individual items and sets of items that
need improvement. If, for example, it turns out that our items related
to railroads are particularly lacking, then that is something we can
concentrate on if we so choose. We can do editathons, data
partnerships, quality drives and so on.


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] What kind of bot "wiktionary in wikidata" needs?

2017-03-22 Thread Amirouche



On 21/03/2017 at 09:14, Léa Lacroix wrote:

Hello,

No, there is nothing from our side regarding extracting data from 
Wiktionary. This is not in the plans of the development team. By the 
way, we think that this decision (to extract or not), and the ways to 
possibly do it, should be made by both communities (Wikidata 
and Wiktionary).


If you have any experiments or demo, feel free to share :)


My understanding is that the CC-BY-SA license of Wiktionary (and Wikipedia) 
is incompatible with the CC0 license of Wikidata.


see 
https://meta.wikimedia.org/wiki/Talk:Wikidata#Is_CC_the_right_license_for_data.3F




On 20 March 2017 at 21:57, Amirouche wrote:


Héllo all!


On 02/03/2017 at 10:34, Léa Lacroix wrote:

Hello Amirouche,

Thanks a lot for your interest in this project and your
proposal to help.
Currently, the development team is still working on the new
datatype structure for lexemes, and we don't have something to
demo yet.


I don't need Wikibase support for L, F and S right now.

What I am wondering is whether any work has already been done on the
Wikimedia side regarding the *extraction* of Lexeme, Form and Sense
from Wiktionary pages.

I started scraping English Wiktionary. I will have a demo ready by
the end of the week. But I'd like to avoid duplicate work and
focus on other stuff if Wikimedia already plans to do this.
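
To make the extraction question concrete, here is a rough sketch of
pulling part-of-speech sections and their definition lines out of
Wiktionary-style wikitext. It assumes the usual layout (a level-2 heading
per language, deeper headings for sections such as Noun or Verb, and
definitions on lines starting with "# "). The Entry class, the function
name and the sample text are made up for illustration; forms are not
extracted, and wiki markup inside definitions is left as-is.

```python
import re
from dataclasses import dataclass, field

# Matches wikitext headings such as "==English==" or "===Noun===".
HEADING = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")


@dataclass
class Entry:
    """Hypothetical container: one part-of-speech section and its senses."""
    language: str
    part_of_speech: str
    senses: list = field(default_factory=list)


def extract_entries(wikitext):
    """Return the sections that contain at least one definition line."""
    entries = []
    language = None
    current = None
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2)
            if level == 2:
                # A level-2 heading starts a new language section.
                language = title
                current = None
            elif language is not None:
                # Deeper headings (Noun, Verb, Etymology, ...) start a section.
                current = Entry(language, title)
                entries.append(current)
            continue
        # Definition lines start with "# "; "#:" and "#*" lines are skipped.
        if current is not None and line.startswith("# "):
            current.senses.append(line[2:].strip())
    # Sections without definitions (e.g. Etymology) are dropped.
    return [entry for entry in entries if entry.senses]


# Made-up sample text in the layout assumed above.
SAMPLE = """\
==English==
===Etymology===
From Middle English.
===Noun===
{{en-noun}}
# A domesticated animal.
#: Example usage line.
# A person (slang).
===Verb===
# To do something.
==French==
===Noun===
# Un exemple.
"""

if __name__ == "__main__":
    for entry in extract_entries(SAMPLE):
        print(entry.language, entry.part_of_speech, entry.senses)
```

Real pages are far less regular than this sample, so a sketch like this
is only a starting point for a prototype.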

As soon as we can provide a viable structure to test, we will
announce it here and on the talk page of the project.

Cheers,

On 1 March 2017 at 22:43, f...@imm.dtu.dk wrote:



Hi,


It is my understanding that Wikidata for Wiktionary requires new
data structures or at least new namespaces (L, F and S), and that
is what is holding people back.


What would be interesting to have is a prototype (not
necessarily built with MediaWiki+Wikibase) to see if the
suggested scheme is OK.



On 03/01/2017 10:16 PM, Amirouche wrote:

Héllo,


I have been lurking around for some months now. I stumbled upon the
Wiktionary in Wikidata project via, for instance, this PDF:

https://upload.wikimedia.org/wikipedia/commons/6/60/Wikidata_for_Wiktionary_announcement.pdf

Now I'd like to help. For that I want to build a bot to
achieve that goal.


My understanding is that a proof of concept of page 11 of the above
PDF would be good. But I have never really done any site scraping. Is
there any abstraction that helps in this regard?


___
Wikidata mailing list
Wikidata@lists.wikimedia.org 
https://lists.wikimedia.org/mailman/listinfo/wikidata





--
Léa Lacroix
Project Manager Community Communication for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de 

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg 
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das 
Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.



___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wiki-research-l] The basis for Wikidata quality

2017-03-22 Thread Gerard Meijssen
Hoi,
When you consider the "collaborative dimension", it is utterly different
for Wikidata. An example: I just added a few statements to Dorothy Tarrant
[1]. For several of those statements I added hundreds of similar statements
on other items. In order to add the award I had to add the award first. I
updated the date of birth and death. When you consider the statements on
most items, they are typically done by bot or by processes like I use. This
"collaborative dimension" is relevant to Wikipedia, not to Wikidata.

You state that previously developed criteria for Wikipedia are important.
Ok, how?

My problem with this approach is that it establishes Wikipedia thinking that
is not appropriate for Wikidata. You are imho correct where you say that
links are of more relevance. They are, because they make it possible to compare
links between Wikipedia articles with links between Wikidata items. This is
actionable information because the two should largely be the same. I have
described [2] how Wikidata can help in improving the quality of any
Wikipedia and in the process improve its own quality. This can be done by
associating wiki links and red links with Wikidata items. Tools can be of
service in pointing out probable issues.
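
A minimal sketch of such a comparison, assuming the wiki links (and any
red links already associated with an item) of one article have been
resolved to item identifiers, and that the items referenced by statements
on the corresponding Wikidata item are known. The function name and the
sample identifiers are made up for illustration.

```python
def compare_links(article_links, item_links):
    """Compare items linked from an article with items linked from its item.

    Both arguments are sets of item identifiers. Whatever appears on only
    one side is a candidate issue for a person or a tool to look at.
    """
    return {
        "shared": sorted(article_links & item_links),
        "only_in_wikipedia": sorted(article_links - item_links),
        "only_in_wikidata": sorted(item_links - article_links),
    }


# Hypothetical identifiers for an award article and its Wikidata item.
ARTICLE_LINKS = {"winner-1", "winner-2", "winner-3"}
ITEM_LINKS = {"winner-2", "winner-3", "winner-4"}

if __name__ == "__main__":
    for kind, ids in compare_links(ARTICLE_LINKS, ITEM_LINKS).items():
        print(kind, ids)
```

Entries under "only_in_wikipedia" point at statements that may be missing
in Wikidata, and entries under "only_in_wikidata" at links or red links
that may be missing in the article.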

In your reply I find little argument why this approach is useful. I do not
find a result that is actionable. There is little point to this approach
and it does not fit well with much of the Wikidata practice.
Thanks,
  GerardM


[1] https://tools.wmflabs.org/reasonator/?q=Q18783615
[2]
http://ultimategerardm.blogspot.nl/2016/01/wikipedia-lowest-hanging-fruit-from.html

On 22 March 2017 at 09:45, Piscopo A.  wrote:

> Hi GerardM
>
> I don’t know if I am one of the researchers you mention in your email, but
> I have indeed carried out research around Wikidata quality and still am.
> I asked the community to help me gather different points of view over what
> data quality means on Wikidata in a RfC a couple of months ago.
>
> If you wonder what happened to that and whether I published anything using
> that material without sharing any results with the community, well, the
> answer is no.
> What we collected is still there and still valuable, waiting to be
> properly analysed once we have enough time to dedicate more to it (which
> will happen later on this year).
>
> As for your questions, I agree with you that looking at Wikidata quality
> with the eyes of (only one) Wikipedia may not be helpful in understanding
> its quality and appreciate its peculiarities. I have tried until now to
> rely more on more general quality dimensions and metrics, and on Linked
> Data-related ones.
> This does not mean that the quality criteria previously developed for
> Wikipedia are not important. They take into account the collaborative
> dimension of the project and are definitely helpful to assess Wikidata as a
> community product.
>
> A short note in favour of my fellow researcher: an evaluation of Wikidata
> by item has been made already by identifying showcase items. Regardless of
> whether we think that evaluating quality by item is the most correct
> approach, I think it is definitely useful to show how good single units of
> information (Items) are. It is just ‘a’ measure for quality, not ‘the’
> measure for quality. As with all research, it may either prove itself to be
> an incredibly valuable contribution to Wikidata (I believe it will) or to
> be useless after all. Whatever the outcome, students/researchers working on
> Wikidata help raise and focus on issues that are important for the
> community and for the project itself, imho of course.
>
> Thanks,
> Alessandro
>
> –––
> Alessandro Piscopo
> Web and Internet Science Group
> School of Electronics and Computer Science
> University of Southampton
> email: a.pisc...@soton.ac.uk
>
> On 22 Mar 2017, at 06:27, Gerard Meijssen <gerard.meijs...@gmail.com> wrote:
>
> Hoi,
> A student is going to start some work on Wikidata quality based on a model
> of quality that is imho seriously suspect. It is item based and assumes
> that the more interwiki links, statements and references an item has, the
> higher its quality will be.
>
> I did protest against this approach and I did call into question that this
> work will help us achieve better quality at Wikidata. I did indicate what
> we should do to approach quality at Wikidata and I was indignantly told
> that research shows that I am wrong.
>
> The research is about Wikipedia not Wikidata and the paper quoted does not
> mention Wikidata at all. As far as I am concerned we have been quite happy
> to only see English Wikipedia based research and consequently I doubt there
> is Wikimedia based research that is truly applicable.
>
> At a previous time a student started work on a quality project for
> Wikidata; comparisons were to be made with external sources so that we
> could deduce quality. The student finished his or her research,