[Wikidata] Quality issues
Hoi,

At Wikidata we often find issues with data imported from a Wikipedia. Lists of these issues have been produced on the Wikipedias involved, and arguably they point to quality problems in Wikipedia, or in Wikidata for that matter. So far hardly anything has resulted from such outreach.

When Wikipedia is a black box, not communicating with the outside world, at some stage the situation becomes toxic. At this moment there are already those at Wikidata who argue not to bother with Wikipedia quality because, in their view, Wikipedians do not care about their own quality. Arguably, known issues with quality are the easiest to solve.

There are many ways to approach this subject. It is a quality issue for both Wikidata and Wikipedia. It can also be seen as a research question: how to deal with quality, and how such mechanisms function, if at all. I blogged about it.

Thanks,
GerardM

http://ultimategerardm.blogspot.nl/2015/11/what-kind-of-box-is-wikipedia.html
Re: [Wikidata] WDQS updates have stopped
On Thu, Nov 19, 2015 at 1:10 PM, Lukas Benedix wrote:
> Is there any evidence that the quality of bot edits is higher than that
> of edits by humans?

Let's please not treat bot edits as something magical that comes down from the sky. Bots have a human operating them. If that person knows what they are doing, the edits are good; if not, not. The difference is just the scale and impact a single person can have.

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Re: [Wikidata] WDQS updates have stopped
Is there any evidence that the quality of bot edits is higher than that of edits by humans?

LB

> Hoi,
> Because once it is a requirement and not a recommendation, it will be
> impossible to reverse. The insidious creep of more rules and requirements
> will make Wikidata increasingly less of a wiki. Arguably most of the edits
> done by bot are of a higher quality than those done by hand. It is for the
> people maintaining the SPARQL environment to ensure that it is up to the
> job, as it does not affect Wikidata itself.
>
> I think each of these arguments holds on its own. Together they are
> hopefully potent enough to prevent such silliness.
>
> Thanks,
> GerardM
Re: [Wikidata] WDQS updates have stopped
Hoi,

Markus, we agree. Given that the lag of updates is measurable, it would be good to have an algorithm that allows bots to negotiate their speed and thereby maximise throughput. Once such an algorithm is built into bot frameworks such as pywikibot, any and all pywikibot-based bots can safely go wild and do the good they are known for.

Thanks,
GerardM

On 19 November 2015 at 11:06, Markus Krötzsch wrote:
> Maybe it would not be that bad. I actually think that many bots right now
> are slower than they could be because they are afraid to overload the
> site. If bots checked the lag, they could operate close to the maximum
> load the site can currently handle, which is probably more than most bots
> are doing now.
>
> The "requirement" vs. "recommendation" question is maybe not so relevant,
> since bot rules (mandatory or not) are currently not enforced in any
> strong way. Basically, the whole system is based on mutual trust, and this
> is how it should stay.
>
> Markus
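[Editor's note: a minimal sketch of what such speed negotiation could look like. The get_median_dispatch_lag() and do_edit() callbacks are hypothetical placeholders, not existing pywikibot functions, and the additive-increase/multiplicative-decrease pacing is borrowed from TCP congestion control rather than from any bot framework discussed in the thread.]

    import time

    TARGET_LAG = 5.0            # seconds of dispatch lag we tolerate causing
    MIN_DELAY, MAX_DELAY = 0.2, 60.0

    def negotiate_speed(get_median_dispatch_lag, do_edit, edits):
        """Pace `edits` adaptively: speed up gently while lag is low,
        back off hard as soon as it exceeds TARGET_LAG."""
        delay = 1.0  # seconds between edits, adjusted as we go
        for edit in edits:
            lag = get_median_dispatch_lag()  # hypothetical helper (assumption)
            if lag > TARGET_LAG:
                delay = min(delay * 2, MAX_DELAY)    # multiplicative decrease
            else:
                delay = max(delay - 0.1, MIN_DELAY)  # additive increase
            do_edit(edit)
            time.sleep(delay)

With a scheme like this, every bot converges toward the highest edit rate the site currently tolerates instead of a fixed conservative throttle.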
Re: [Wikidata] WDQS updates have stopped
On 19.11.2015 10:40, Gerard Meijssen wrote:
> Hoi,
> Because once it is a requirement and not a recommendation, it will be
> impossible to reverse. The insidious creep of more rules and requirements
> will make Wikidata increasingly less of a wiki. Arguably most of the edits
> done by bot are of a higher quality than those done by hand. It is for the
> people maintaining the SPARQL environment to ensure that it is up to the
> job, as it does not affect Wikidata itself.
>
> I think each of these arguments holds on its own. Together they are
> hopefully potent enough to prevent such silliness.

Maybe it would not be that bad. I actually think that many bots right now are slower than they could be because they are afraid to overload the site. If bots checked the lag, they could operate close to the maximum load the site can currently handle, which is probably more than most bots are doing now.

The "requirement" vs. "recommendation" question is maybe not so relevant, since bot rules (mandatory or not) are currently not enforced in any strong way. Basically, the whole system is based on mutual trust, and this is how it should stay.

Markus
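[Editor's note: checking the lag is supported by MediaWiki itself via the documented maxlag request parameter: when the median replication lag exceeds the value sent, the API rejects the request with error code "maxlag" and suggests a wait via the Retry-After header. A minimal sketch follows; the params dict and retry policy are illustrative, and replication lag is not necessarily the same thing as Wikidata's dispatch lag.]

    import time
    import requests

    API = "https://www.wikidata.org/w/api.php"

    def api_post_with_maxlag(params, maxlag=5):
        """POST an API request; back off and retry while the servers are lagged."""
        payload = dict(params, maxlag=maxlag, format="json")
        while True:
            r = requests.post(API, data=payload)
            data = r.json()
            if data.get("error", {}).get("code") != "maxlag":
                return data
            # Server is lagged; honour its suggested wait, defaulting to 5 s.
            time.sleep(int(r.headers.get("Retry-After", "5")))

A read query such as api_post_with_maxlag({"action": "query", "meta": "siteinfo"}) works as-is; an actual edit would additionally need an edit token and an authenticated session, omitted here.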
Re: [Wikidata] WDQS updates have stopped
Hoi,

Because once it is a requirement and not a recommendation, it will be impossible to reverse. The insidious creep of more rules and requirements will make Wikidata increasingly less of a wiki. Arguably most of the edits done by bot are of a higher quality than those done by hand. It is for the people maintaining the SPARQL environment to ensure that it is up to the job, as it does not affect Wikidata itself.

I think each of these arguments holds on its own. Together they are hopefully potent enough to prevent such silliness.

Thanks,
GerardM

On 19 November 2015 at 08:55, Tom Morris wrote:
> So, the page that Markus points to describes heeding the replication lag
> limit as a recommendation. Since running a bot is a privilege, not a
> right, why isn't the "recommendation" a requirement?
>
> Tom
>
> On Wed, Nov 18, 2015 at 3:30 PM, Markus Krötzsch
> <mar...@semantic-mediawiki.org> wrote:
>
>> On 18.11.2015 19:40, Federico Leva (Nemo) wrote:
>>
>>> Andra Waagmeester, 18/11/2015 19:03:
>>>> How do you add "hundreds (if not thousands)" of items per minute?
>>>
>>> Usually
>>> 1) concurrency,
>>> 2) low latency.
>>
>> In fact, it is not hard to get this. I guess Andra is getting speeds of
>> 20-30 items per minute because their bot framework is throttling the
>> speed on purpose. If I don't throttle WDTK, I can easily do well over 100
>> edits per minute in a single thread (I did not try the maximum ;-).
>>
>> Already a few minutes of fast editing might push up the median dispatch
>> lag enough for a bot to stop and wait. While the slow edit rate is a
>> rough guideline (not a strict rule), respecting the dispatch stats is
>> mandatory for Wikidata bots, so things will eventually slow down (or your
>> bot will be blocked ;-). See [1].
>>
>> Markus
>>
>> [1] https://www.wikidata.org/wiki/Wikidata:Bots
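[Editor's note: a rough sense of the numbers behind Nemo's two levers. A bot that issues one request at a time and waits for each round trip can do at most throughput ≈ concurrency × 60 / latency edits per minute. At an illustrative (not measured) ~0.5 s per API call, a single thread caps out near 120 edits per minute, consistent with Markus's "well over 100"; ten concurrent connections at the same latency would allow on the order of 1,200. Hundreds to thousands per minute therefore require lower latency, more concurrency, or both.]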