[Wikidata] Quality issues

2015-11-19 Thread Gerard Meijssen
Hoi,
At Wikidata we often find issues with data imported from a Wikipedia. Lists
of these issues have been produced on the Wikipedias involved, and arguably
they point to quality problems in Wikipedia, or in Wikidata for that
matter. So far, hardly anything has come of such outreach.

When Wikipedia is a black box that does not communicate with the outside
world, at some stage the situation becomes toxic. Already there are those
at Wikidata who argue that we should not bother with Wikipedia's quality
because, in their view, Wikipedians do not care about it themselves.

Arguably, known quality issues are the easiest to solve.

There are many ways to approach this subject. It is indeed a quality issue
for both Wikidata and Wikipedia. It can also be seen as a research question:
how do we deal with quality, and how do such mechanisms function, if at all?

I blogged about it:
Thanks,
 GerardM

http://ultimategerardm.blogspot.nl/2015/11/what-kind-of-box-is-wikipedia.html


Re: [Wikidata] WDQS updates have stopped

2015-11-19 Thread Lydia Pintscher
On Thu, Nov 19, 2015 at 1:10 PM, Lukas Benedix wrote:
> Is there any evidence that the quality of bot edits is higher than that
> of edits by humans?

Let's please not treat bot edits as something magical that comes down
from the sky. Every bot has a human operating it. If that person knows
what they are doing, the edits are good; if not, they are not. The only
difference is the scale and impact a single person can have.


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognised as charitable by
the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.



Re: [Wikidata] WDQS updates have stopped

2015-11-19 Thread Lukas Benedix
Is there any evidence that the quality of bot edits is higher than that of
edits by humans?

LB

> Hoi,
> Because once it is a requirement and not a recommendation, it will be
> impossible to reverse. The insidious creep of ever more rules and
> requirements will make Wikidata increasingly less of a wiki. Arguably,
> most of the edits done by bot are of a higher quality than those done by
> hand. It is up to the people maintaining the SPARQL environment to ensure
> that it can handle the load, as it does not affect Wikidata itself.
>
> I think each of these arguments holds its own. Together they are hopefully
> potent enough to prevent such silliness.
>
> Thanks,
>  GerardM





Re: [Wikidata] WDQS updates have stopped

2015-11-19 Thread Gerard Meijssen
Hoi,
Markus, we agree. Given that the update lag is measurable, it would be good
to have an algorithm that lets bots negotiate their speed and thereby
maximise throughput. Once such an algorithm is built into bot frameworks
such as pywikibot, any and all bots can safely go wild and do the good they
are known for.
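
Such speed negotiation could look roughly like the sketch below, in the
style of TCP's additive-increase/multiplicative-decrease. It is only an
illustration: fetch_replication_lag, perform_edit, the thresholds, and the
starting rate are my own placeholders, not an existing pywikibot API, and
the standard siprop=dbrepllag query is used as a stand-in for the dispatch
lag discussed in this thread.

    import time
    import requests

    API = "https://www.wikidata.org/w/api.php"

    def fetch_replication_lag(session):
        # Standard MediaWiki siteinfo query; used here as a stand-in
        # for the Wikibase dispatch lag, which would need its own source.
        r = session.get(API, params={"action": "query", "meta": "siteinfo",
                                     "siprop": "dbrepllag", "format": "json"})
        return r.json()["query"]["dbrepllag"][0]["lag"]

    def perform_edit(session, edit):
        # Placeholder for the actual edit call (e.g. wbeditentity).
        pass

    def edit_loop(edits, target_lag=5.0):
        # Additive increase, multiplicative decrease: the edit rate creeps
        # up while the lag stays low and is halved whenever it spikes, so
        # bots converge on the maximum rate the site can currently absorb.
        session = requests.Session()
        rate = 10.0  # edits per minute; conservative starting guess
        for edit in edits:
            if fetch_replication_lag(session) > target_lag:
                rate = max(1.0, rate / 2.0)  # back off hard
            else:
                rate += 1.0                  # gently probe for more
            perform_edit(session, edit)
            time.sleep(60.0 / rate)
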
Thanks,
  GerardM

On 19 November 2015 at 11:06, Markus Krötzsch  wrote:

> On 19.11.2015 10:40, Gerard Meijssen wrote:
>
>> Hoi,
>> Because once it is a requirement and not a recommendation, it will be
>> impossible to reverse. The insidious creep of ever more rules and
>> requirements will make Wikidata increasingly less of a wiki. Arguably,
>> most of the edits done by bot are of a higher quality than those done
>> by hand. It is up to the people maintaining the SPARQL environment to
>> ensure that it can handle the load, as it does not affect Wikidata itself.
>>
>> I think each of these arguments holds its own. Together they are
>> hopefully potent enough to prevent such silliness.
>>
>
> Maybe it would not be that bad. I actually think that many bots right now
> are slower than they could be because they are afraid of overloading the
> site. If bots checked the lag, they could operate close to the maximum
> load that the site can currently handle, which is probably more than most
> bots are doing now.
>
> The "requirement" vs. "recommendation" distinction is maybe not so
> relevant, since bot rules (mandatory or not) are currently not enforced
> in any strong way. Basically, the whole system is based on mutual trust,
> and this is how it should stay.
>
> Markus


Re: [Wikidata] WDQS updates have stopped

2015-11-19 Thread Markus Krötzsch

On 19.11.2015 10:40, Gerard Meijssen wrote:

Hoi,
Because once it is a requirement and not a recommendation, it will be
impossible to reverse this. The insidious creep of more rules and
requirements will make Wikidata increasingly less of a wiki. Arguably
most of the edits done by bot are of a higher quality than those done by
hand. It is for the people maintaining the SPARQL environment to ensure
that it is up to the job as it does not affect Wikidata itself.

I think each of these argument holds its own. Together they are
hopefully potent enough to prevent such silliness.


Maybe it would not be that bad. I actually think that many bots right now
are slower than they could be because they are afraid of overloading the
site. If bots checked the lag, they could operate close to the maximum
load that the site can currently handle, which is probably more than most
bots are doing now.
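
Checking the lag is already supported at the API level: any MediaWiki API
request can carry a maxlag parameter, and the server rejects the request
with a "maxlag" error (and a Retry-After header) whenever replication lag
exceeds that value. Here is a minimal sketch of a request wrapper that
honours this; the function name and retry policy are my own, and note that
the Wikibase dispatch lag is a separate statistic from the replication lag
that maxlag covers:

    import time
    import requests

    API = "https://www.wikidata.org/w/api.php"

    def polite_post(session, params, maxlag=5, retries=10):
        # Send an API request that backs off whenever the site reports
        # replication lag above `maxlag` seconds.
        params = dict(params, maxlag=maxlag, format="json")
        for _ in range(retries):
            r = session.post(API, data=params)
            data = r.json()
            if data.get("error", {}).get("code") != "maxlag":
                return data
            # The site is lagging: wait as advised, then retry.
            time.sleep(int(r.headers.get("Retry-After", 5)))
        raise RuntimeError("giving up: the site stayed lagged")

Pywikibot, for what it is worth, applies this mechanism automatically
through its maxlag setting.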


The "requirement" vs. "recommendation" thing is maybe not so relevant, 
since bot rules (mandatory or not) are currently not enforced in any 
strong way. Basically, the whole system is based on mutual trust and 
this is how it should stay.


Markus



Re: [Wikidata] WDQS updates have stopped

2015-11-19 Thread Gerard Meijssen
Hoi,
Because once it is a requirement and not a recommendation, it will be
impossible to reverse this. The insidious creep of more rules and
requirements will make Wikidata increasingly less of a wiki. Arguably most
of the edits done by bot are of a higher quality than those done by hand.
It is for the people maintaining the SPARQL environment to ensure that it
is up to the job as it does not affect Wikidata itself.

I think each of these argument holds its own. Together they are hopefully
potent enough to prevent such silliness.

Thanks,
 GerardM


On 19 November 2015 at 08:55, Tom Morris  wrote:

> So, the page that Markus points to describes heeding the replication lag
> limit as a recommendation.  Since running a bot is a privilege, not a
> right, why isn't the "recommendation" a requirement instead of a
> recommendation?
>
> Tom
>
> On Wed, Nov 18, 2015 at 3:30 PM, Markus Krötzsch <
> mar...@semantic-mediawiki.org> wrote:
>
>> On 18.11.2015 19:40, Federico Leva (Nemo) wrote:
>>
>>> Andra Waagmeester, 18/11/2015 19:03:
>>>
>>>> How do you add "hundreds (if not thousands)" of items per minute?

>>>
>>> Usually
>>> 1) concurrency,
>>> 2) low latency.
>>>
>>
>> In fact, it is not hard to get this. I guess Andra is getting speeds of
>> 20-30 items per minute because their bot framework is throttling the
>> speed on purpose. If I don't throttle WDTK, I can easily do well over
>> 100 edits per minute in a single thread (I did not try the maximum ;-).
>>
>> Already a few minutes of fast editing might push up the median dispatch
>> lag enough for a bot to have to stop and wait. While the slow edit rate
>> is a rough guideline (not a strict rule), respecting the dispatch stats
>> is mandatory for Wikidata bots, so things will eventually slow down (or
>> your bot will be blocked ;-). See [1].
>>
>> Markus
>>
>> [1] https://www.wikidata.org/wiki/Wikidata:Bots