Hoi,
I make use of the SourceMD environment; it is well behaved, allows for
throttling, and when I have multiple jobs it runs only one at a time. I
do understand that my jobs are put on hold when the situation warrants
it; I even put them on hold myself when I think of it.

However, when someone else puts my jobs on hold, I cannot release them at
a better time, and I now have seven jobs doing nothing while a new job
progresses normally. My point is that management is fine, but given that
what I do is well behaved, I expect my jobs to run and, when held, to be
released at a later time. When I cannot depend on jobs finishing, my work
is not finished, and I do not know whether I should run more jobs, or
which jobs, to get the data to a finished state.
Thanks,
        GerardM

On Tue, 18 Jun 2019 at 06:35, Stas Malyshev <smalys...@wikimedia.org> wrote:

> Hi!
>
> > We are currently dealing with a bot overloading the Wikidata Query
> > Service. This bot does not look actively malicious, but it creates
> > enough load to disrupt the service. As a stopgap measure, we had to
> > deny access to all bots using the python-requests user agent.
> >
> > As a reminder, any bot should use a user agent that allows it to be
> > identified [1]. If you have trouble accessing WDQS, please check that
> > you are following those guidelines.
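For example, with python-requests a bot can identify itself along these lines; the bot name, version, and contact address below are placeholders, not real ones:

```python
# Build a descriptive User-Agent in the spirit of the Wikimedia
# User-Agent guideline: tool name/version, a contact URL or e-mail,
# and the underlying HTTP library. All values here are placeholders.
def bot_user_agent(name, version, contact):
    return "%s/%s (%s) python-requests" % (name, version, contact)

HEADERS = {
    "User-Agent": bot_user_agent(
        "ExampleWikidataBot", "1.0",
        "https://example.org/bot; ops@example.org")
}

# Pass headers=HEADERS to every requests.get()/requests.post() call
# so the operator can be identified and contacted if needed.
```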
>
> To add to this, we have had this trouble because two events that WDQS
> currently does not deal well with have coincided:
>
> 1. An edit bot that was editing at 200+ edits per minute. This is too
> much; over 60 edits per minute is almost always too much. It would also
> be good to consider, if your bot makes multiple changes (e.g. adds
> multiple statements), doing them in one call instead of several, since
> WDQS currently runs an update for each change separately, which can be
> expensive. We're looking into various improvements here, but that is
> the current state.
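One way to batch several statements into a single edit is the `wbeditentity` API action, which accepts a `data` payload containing many claims at once, instead of one `wbcreateclaim` call per statement. A minimal sketch of building such a payload; the properties and item IDs are made-up examples:

```python
import json

def batch_claims(property_values):
    """Build a wbeditentity 'data' payload that adds several item-valued
    statements in one API call instead of one call per claim.
    property_values is a list of (property, numeric item id) pairs."""
    claims = []
    for prop, numeric_id in property_values:
        claims.append({
            "mainsnak": {
                "snaktype": "value",
                "property": prop,
                "datavalue": {
                    "value": {"entity-type": "item", "numeric-id": numeric_id},
                    "type": "wikibase-entityid",
                },
            },
            "type": "statement",
            "rank": "normal",
        })
    return json.dumps({"claims": claims})

# Two statements added in a single edit (example property/value pairs):
data = batch_claims([("P31", 5), ("P106", 82955)])
```

The resulting string would be sent as the `data` parameter of a single `action=wbeditentity` POST, so WDQS sees one change instead of several.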
>
> 2. Several bots have been flooding the service query endpoint with
> requests. There has recently been a growth in bots that a) completely
> ignore both regular limits and throttling hints, b) do not have a
> proper identifying user agent, and c) use distributed hosts, so our
> throttling system has trouble dealing with them automatically. We
> intend to crack down more and more on such clients, because they look a
> lot like a DDoS and ruin the service experience for everyone.
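As one concrete example of honoring throttling hints: WDQS signals throttling with HTTP 429 responses carrying a Retry-After header, and a well-behaved client waits that long before retrying. A minimal sketch (the helper name is mine, and only the delta-seconds form of Retry-After is handled, not the HTTP-date form):

```python
def retry_after_seconds(headers, default=60):
    """Read a Retry-After hint (delta-seconds form) from the headers of
    a throttled (HTTP 429) response; fall back to a default wait when
    the header is absent or not a number."""
    value = headers.get("Retry-After")
    try:
        return max(0, int(value))
    except (TypeError, ValueError):
        return default

# A throttled response telling the client to back off for 120 seconds:
wait = retry_after_seconds({"Retry-After": "120"})
```

A client would sleep for `wait` seconds before issuing its next request, rather than hammering the endpoint with retries.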
>
> I will probably write down more detailed rules a bit later, but for
> now, see:
>
> https://www.mediawiki.org/wiki/Wikidata_Query_Service/Implementation#Usage_constraints
> Additionally, having a distinct User-Agent if you're running a bot is a
> good idea.
>
> And for people who think it's a good idea to launch a
> max-requests-I-can-stuff-into-the-pipe bot, put it on several Amazon
> machines so that throttling has a hard time detecting it, and then,
> once throttling does detect it, neglect to check for a week that all
> the bot is doing is fetching 403s from the service and wasting
> everybody's time - please think again. If you want to do something
> non-trivial querying WDQS and the limits get in the way, please talk to
> us (and if you know somebody who isn't reading this list but is
> considering wiring up a bot interfacing with WDQS, please educate them
> and refer them to us for help; we really prefer to help rather than
> ban). Otherwise, we'd be forced to put more limitations in place that
> will affect everyone.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
