Hoi, I use the SourceMD environment. It is well behaved: it allows for throttling, and when I have multiple jobs it only runs one at a time. I understand that my jobs are put on hold when the situation warrants it; I even put them on hold myself when I think of it.
When someone else puts my jobs on hold, I cannot release them at a better time, and I now have seven jobs doing nothing. A new job progresses normally. The point is that management is fine, but given that what I do is well behaved, I expect my jobs to run and, when held, to be released at a later time. When I cannot depend on jobs to finish, my work is not finished, and I do not know whether I should run more jobs, or which jobs, to get the data to a finished state.

Thanks,
GerardM

On Tue, 18 Jun 2019 at 06:35, Stas Malyshev <smalys...@wikimedia.org> wrote:
> Hi!
>
> > We are currently dealing with a bot overloading the Wikidata Query
> > Service. This bot does not look actively malicious, but does create
> > enough load to disrupt the service. As a stop-gap measure, we had to
> > deny access to all bots using the python-requests user agent.
> >
> > As a reminder, any bot should use a user agent that allows it to be
> > identified [1]. If you have trouble accessing WDQS, please check
> > that you are following those guidelines.
>
> To add to this, we have had this trouble because two events that WDQS
> currently does not deal well with have coincided:
>
> 1. An edit bot that edited at 200+ edits per minute. This is too much.
> Over 60/min is almost always too much. It would also be a good idea,
> if your bot makes multiple changes (e.g. adds multiple statements),
> to do them in one call instead of several, since WDQS currently does
> an update for each change separately, and this can be expensive. We
> are looking into various improvements to this, but that is the
> current state.
>
> 2. Several bots have been flooding the service query endpoint with
> requests. There has recently been a growth in bots that a) completely
> ignore both regular limits and throttling hints, b) do not have a
> proper identifying user agent, and c) use distributed hosts, so our
> throttling system has trouble dealing with them automatically.
> We intend to crack down more and more on such clients, because they
> look a lot like a DDoS and ruin the service experience for everyone.
>
> I will probably write down more detailed rules a bit later, but for
> now, these:
>
> https://www.mediawiki.org/wiki/Wikidata_Query_Service/Implementation#Usage_constraints
>
> Additionally, having a distinct User-Agent if you are running a bot
> is a good idea.
>
> And for people who think it is a good idea to launch a
> max-requests-I-can-stuff-into-the-pipe bot, put it on several Amazon
> machines so that throttling has a hard time detecting it, and then,
> when throttling does detect it, neglect to check for a week that all
> the bot is doing is fetching 403s from the service and wasting
> everybody's time - please think again. If you want to do something
> non-trivial querying WDQS and the limits get in the way, please talk
> to us (and if you know somebody who is not reading this list but is
> considering writing a bot interfacing with WDQS, please educate them
> and refer them to us for help; we really would prefer to help than to
> ban). Otherwise, we would be forced to put more limitations in place
> that would affect everyone.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
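[A sketch of the "one call instead of several" advice in point 1 above: instead of one wbcreateclaim request per statement, bundle several statements into a single wbeditentity edit, so WDQS processes one change rather than many. The item ID, property IDs, and values below are placeholders, and a real bot would also need authentication and an edit token, omitted here.]

```python
import json

def batched_edit_payload(item_id, statements):
    """Build the parameters for ONE wbeditentity API call that adds
    every (property, string value) statement at once."""
    claims = []
    for prop, value in statements:
        claims.append({
            "mainsnak": {
                "snaktype": "value",
                "property": prop,
                # Simplified: assumes string-datatype properties only.
                "datavalue": {"value": value, "type": "string"},
            },
            "type": "statement",
            "rank": "normal",
        })
    return {
        "action": "wbeditentity",
        "id": item_id,
        "data": json.dumps({"claims": claims}),
    }

# One API call carrying three statements, instead of three separate edits
# (and three separate WDQS updates). IDs here are illustrative.
params = batched_edit_payload(
    "Q4115189",
    [("P2093", "Author A"), ("P2093", "Author B"), ("P2093", "Author C")],
)
```

[Since WDQS currently re-processes the item on each change, three edits bundled into one call is roughly a threefold reduction in update load for the same result.]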