Hi!

> Forgive my ignorance. I don't know much about infrastructure of WDQS and
> how it works. I just want to mention how application servers do it. In
> appservers, there are dedicated nodes both for apache and the replica
> database. So if a bot overdo things in Wikipedia (which happens quite a
> lot), users won't feel anything but the other bots take the hit. Routing
> based on UA seems hard though while it's easy in mediawiki (if you hit
> api.php, we assume it's a bot).

We have two clusters - public and internal, with the latter serving only
Wikimedia tasks and thus isolated from outside traffic. However, we do
not have a practical way right now to separate bot and non-bot traffic,
and I don't think we currently have resources for another cluster.

> Routing based on UA seems hard though while it's easy in mediawiki

I don't think our current LB setup can route based on user agent. There
could be a gateway that does that, but since we don't have resources
for another cluster right now, developing something like that would not
be a good use of time.

Even if we did separate browser and bot traffic, we'd still have the
problem on the bot cluster: most bots are benign and low-traffic, and we
want to do our best to enable them to function smoothly. But for this to
work, we need ways to weed out outliers that consume too many
resources. In a way, the bucketing policy is a version of what you
described: if you use proper identification, you are judged on your own
traffic. If you use generic identification, you are bucketed with other
generic agents, and thus may be denied if that bucket is full. This is
not an ideal final solution, but experience so far shows it has reduced
the incidence of problems. Further ideas on how to improve it are of
course welcome.
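
To make the bucketing idea concrete, here is a minimal sketch in Python.
It is an illustration only, not the actual WDQS throttling code: the
list of "generic" user agents, the bucket key (user agent plus IP), the
budget value and all names below are assumptions, and a real throttle
would also reset the budgets over time.

```python
import re

# Hypothetical list of generic client signatures (an assumption for
# this sketch, not the real policy).
GENERIC_UA = re.compile(r"^(python-requests|curl|java|wget|okhttp)\b",
                        re.IGNORECASE)


def bucket_for(user_agent, client_ip):
    """Return the bucket a request is charged to."""
    ua = (user_agent or "").strip()
    if not ua or GENERIC_UA.match(ua):
        # Missing or generic identification: everyone shares one bucket.
        return "generic"
    # Proper identification: the client is judged on its own traffic.
    return f"{ua}|{client_ip}"


class Throttle:
    """Per-bucket budget. Time-based resets are omitted for brevity."""

    def __init__(self, budget):
        self.budget = budget
        self.used = {}

    def allow(self, user_agent, client_ip, cost=1.0):
        key = bucket_for(user_agent, client_ip)
        spent = self.used.get(key, 0.0)
        if spent + cost > self.budget:
            return False  # this bucket is full, deny the request
        self.used[key] = spent + cost
        return True
```

With a budget of 2, two requests from different generic clients fill
the shared bucket and a third generic client is denied, while a client
with a descriptive user agent is still served from its own bucket.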

-- 
Stas Malyshev
smalys...@wikimedia.org

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
