Hi!

> We are currently dealing with a bot overloading the Wikidata Query
> Service. This bot does not look actively malicious, but does create
> enough load to disrupt the service. As a stop gap measure, we had to
> deny access to all bots using python-request user agent.
>
> As a reminder, any bot should use a user agent that allows to identify
> it [1]. If you have trouble accessing WDQS, please check that you are
> following those guidelines.
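The block above targets the generic default User-Agent that HTTP libraries send. Here is a minimal sketch of a query request that identifies its bot instead. It uses Python's standard-library urllib so that it runs without extra packages. With requests, the same header can be passed as `requests.get(url, params=..., headers={"User-Agent": ...})`. The bot name, version, URL and contact address are hypothetical placeholders. The "name/version (contact) library" layout roughly follows the convention suggested by the Wikimedia User-Agent policy.

```python
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical bot identity: replace name, version, URL and e-mail with your own.
USER_AGENT = (
    "ExampleWikidataBot/0.1 "
    "(https://example.org/example-bot; bot-operator@example.org) python-urllib"
)


def build_wdqs_request(sparql: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to WDQS with an identifying User-Agent."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "User-Agent": USER_AGENT,
            "Accept": "application/sparql-results+json",
        },
    )


if __name__ == "__main__":
    req = build_wdqs_request("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5")
    # urllib normalises header names, so the key is looked up as "User-agent".
    print(req.get_header("User-agent"))
    # Sending it is then: urllib.request.urlopen(req, timeout=60)
```

A distinct, contactable User-Agent lets operators tell one bot from another, so a single misbehaving client can be blocked without blocking every client that uses the same library.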
To add to the message quoted above: we have had this trouble because two things that WDQS currently does not handle well happened at the same time:

1. An edit bot was making 200+ edits per minute. This is too much; anything over 60 edits per minute is almost always too much. Also, if your bot makes multiple changes to the same item (e.g. adds several statements), please consider doing them in one call instead of several. WDQS currently runs a separate update for each change, and this may be expensive. We're looking into various improvements here, but that is the current state.

2. Several bots have been flooding the query endpoint with requests. Recently there has been a growth in bots that:
   a) completely ignore both the regular limits and throttling hints,
   b) do not have a proper identifying User-Agent, and
   c) use distributed hosts, so our throttling system has trouble dealing with them automatically.
   We intend to crack down more and more on such clients, because they look a lot like a DDoS and ruin the service for everyone.

I will probably write down more detailed rules a bit later, but for now these apply: https://www.mediawiki.org/wiki/Wikidata_Query_Service/Implementation#Usage_constraints . Additionally, if you're running a bot, having a distinct User-Agent is a good idea.

Some people think it's a good idea to launch a max-requests-I-can-stuff-into-the-pipe bot and put it on several Amazon machines so that throttling has a hard time detecting it. Then, once throttling does detect it, they neglect to check for a week and don't notice that all the bot is doing is fetching 403s from the service and wasting everybody's time. If that's you, please think again.

If you want to do something non-trivial with WDQS and the limits get in the way, please talk to us. And if you know somebody who isn't reading this list but is considering writing a bot that talks to WDQS, please educate them and refer them to us for help; we would really rather help than ban.
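Point 2 above describes bots that ignore limits and throttling hints. Here is a minimal Python sketch of a client that paces itself and backs off when asked. It assumes that throttled responses arrive as HTTP 429 (or 503) with a `Retry-After` header. That is an assumption here, so check the usage constraints page for the actual behaviour. `Throttle`, `retry_after_seconds` and `polite_fetch` are hypothetical names, not part of any WDQS client library. A 403 is deliberately not retried.

```python
import email.utils
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Mapping, Optional


class Throttle:
    """Enforces a minimum interval between calls (hypothetical helper)."""

    def __init__(self, max_per_minute: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self._next: Optional[float] = None  # earliest time the next call may start

    def wait(self) -> float:
        """Sleep until the next call is allowed; return the seconds slept."""
        now = self.clock()
        delay = 0.0 if self._next is None else max(0.0, self._next - now)
        if delay > 0:
            self.sleep(delay)
        self._next = now + delay + self.interval
        return delay

    def pause(self, seconds: float) -> None:
        """Push the next allowed call back, e.g. after a Retry-After hint."""
        resume = self.clock() + seconds
        self._next = resume if self._next is None else max(self._next, resume)


def retry_after_seconds(headers: Mapping[str, str], default: float = 60.0,
                        now: Optional[datetime] = None) -> float:
    """Parse Retry-After (delta-seconds or HTTP-date); fall back to `default`."""
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    if value.isdecimal():
        return float(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError, IndexError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def polite_fetch(req: urllib.request.Request, throttle: Throttle,
                 max_attempts: int = 5) -> bytes:
    """Fetch `req`, honouring throttling hints; other HTTP errors are raised."""
    for _ in range(max_attempts):
        throttle.wait()
        try:
            with urllib.request.urlopen(req, timeout=60) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Assumption: throttling is signalled by 429 (or 503) + Retry-After.
            if err.code not in (429, 503):
                raise  # e.g. 403: the bot is blocked, so stop instead of retrying
            throttle.pause(retry_after_seconds(err.headers))
    raise RuntimeError("still throttled after %d attempts" % max_attempts)
```

For example, `Throttle(max_per_minute=60)` keeps a single-process edit bot at or below the 60 edits per minute mentioned in point 1. For queries, pick a rate well within the published usage constraints. The throttle only paces one process. Running the same bot on many machines multiplies the rate, which is exactly the pattern point 2 asks people to avoid.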
If we can't work these problems out with bot authors, we'd be forced to put more limitations on the service, and those would affect everyone.

-- 
Stas Malyshev
smalys...@wikimedia.org

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata