Hi!

> We are currently dealing with a bot overloading the Wikidata Query
> Service. This bot does not look actively malicious, but it does create
> enough load to disrupt the service. As a stopgap measure, we had to
> deny access to all bots using the python-requests user agent.
> 
> As a reminder, any bot should use a user agent that allows it to be
> identified [1]. If you have trouble accessing WDQS, please check that
> you are following those guidelines.

To add to this, we ran into this trouble because two things that WDQS
currently does not handle well happened at the same time:

1. An edit bot that made 200+ edits per minute. This is too much;
more than 60 edits per minute is almost always too much. Also, if your
bot makes multiple changes to one item (e.g. adds several statements),
please consider making them in one call instead of several, since WDQS
currently performs a separate update for each change, and this can be
expensive. We're looking into various improvements here, but this is
the current state.
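A minimal Python sketch of both suggestions: keep edits at or below 60 per
minute, and send several new statements in one call. It assumes edits go
through the Wikibase `wbeditentity` API module; the helper names
(`EditThrottle`, `item_claim`, `batched_edit_params`) are hypothetical, and
authentication and the CSRF `token` parameter are left out on purpose.

```python
import json
import time


class EditThrottle:
    """Hypothetical helper: spaces calls so at most `max_per_minute` happen per minute."""

    def __init__(self, max_per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute  # seconds between consecutive edits
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next edit may start

    def wait(self):
        """Block until the next edit is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


def item_claim(prop, numeric_id):
    """One statement 'prop -> Q<numeric_id>' in Wikibase JSON (hypothetical helper)."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "value": {"entity-type": "item", "numeric-id": numeric_id},
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def batched_edit_params(entity_id, claims, summary):
    """Parameters for ONE wbeditentity call that adds several statements at once.

    Authentication and the CSRF `token` parameter are deliberately omitted.
    """
    return {
        "action": "wbeditentity",
        "id": entity_id,
        "data": json.dumps({"claims": claims}),
        "summary": summary,
        "format": "json",
    }
```

Calling `throttle.wait()` before POSTing each `batched_edit_params(...)` dict
to the wiki's `api.php` means that three new statements cost one edit, and
therefore one WDQS update, instead of three.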

2. Several bots have been flooding the service's query endpoint with
requests. Recently there has been a growth in bots that a) completely
ignore both the regular limits and the throttling hints, b) do not have
a proper identifying user agent, and c) use distributed hosts, so our
throttling system has trouble dealing with them automatically. We
intend to crack down more and more on such clients, because they look a
lot like a DDoS attack and ruin the service for everyone.
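A minimal sketch of a client that honors throttling hints instead of
ignoring them. It assumes WDQS signals throttling with HTTP 429 (or 503),
possibly with a `Retry-After` header giving a number of seconds, and answers
blocked clients with 403. `retry_delay`, `fetch_with_backoff` and
`BannedError` are hypothetical names, and `send` stands for whatever function
actually performs the HTTP request.

```python
import time


class BannedError(Exception):
    """The service answered 403: stop the bot and investigate instead of retrying."""


def retry_delay(status, headers, attempt, base=1.0, cap=300.0):
    """Seconds to wait before retrying, or None if this status is not a throttling signal."""
    if status == 403:
        raise BannedError("got 403: stop and check your User-Agent and request rate")
    if status in (429, 503):
        retry_after = headers.get("Retry-After", "").strip()
        if retry_after.isdigit():
            # The server told us how long to wait: obey it.
            return float(retry_after)
        # No usable hint (missing, or an HTTP date we don't parse here):
        # fall back to capped exponential backoff.
        return min(cap, base * 2 ** attempt)
    return None


def fetch_with_backoff(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` -> (status, headers, body) until it is not throttled; return (status, body)."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        delay = retry_delay(status, headers, attempt)
        if delay is None:
            return status, body
        sleep(delay)
    raise RuntimeError("still throttled after %d attempts; giving up" % max_attempts)
```

Treating 403 as a reason to stop rather than to retry is deliberate: a bot
that keeps hammering the service after being blocked only adds load.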

I will probably write down more detailed rules a bit later, but for now
the rules are these:
https://www.mediawiki.org/wiki/Wikidata_Query_Service/Implementation#Usage_constraints
In addition, if you're running a bot, it is a good idea to give it a
distinct User-Agent.
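A minimal sketch of a query request with an identifying User-Agent, using
only the Python standard library and following the pattern of the Wikimedia
User-Agent policy (tool name/version, then contact information in
parentheses, then the library). The bot name, URL and e-mail address are
placeholders, and the request is only built here, not sent.

```python
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Placeholder bot name and contact details: replace them with your own.
USER_AGENT = (
    "ExampleWDQSBot/0.1 "
    "(https://example.org/examplebot; examplebot@example.org) "
    "python-urllib"
)


def build_query_request(sparql):
    """Build (but do not send) a GET request to WDQS carrying an identifying User-Agent."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": sparql, "format": "json"})
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": USER_AGENT,
            "Accept": "application/sparql-results+json",
        },
    )
```

`urllib.request.urlopen(build_query_request(...))` would send it. With
python-requests, the equivalent is setting `session.headers["User-Agent"]` on
a `requests.Session`, so that every request the bot makes carries the header.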

And to people who think it's a good idea to launch a
max-requests-I-can-stuff-into-the-pipe bot, put it on several Amazon
machines so that throttling has a hard time detecting it, and then, once
throttling does detect it, neglect to notice for a week that all the bot
is doing is fetching 403s from the service and wasting everybody's
time: please think again. If you want to do something non-trivial with
WDQS and the limits get in the way, please talk to us. And if you know
somebody who isn't reading this list but is considering building a bot
that talks to WDQS, please educate them and refer them to us for help;
we much prefer helping to banning. Otherwise, we'll be forced to put
more limitations on the service, and those will affect everyone.

-- 
Stas Malyshev
smalys...@wikimedia.org

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
