RKemper added a comment.
Alright, I had an initial meeting with Traffic team (Brandon & Valentin). Traffic team meeting summary ---------------------------- The primary concern they had was related to the potential impact on the ATS side of things; in a scenario where Blazegraph is taking a consistently long time to respond, ATS would be impacted due to the large numbers of dangling sockets, which could theoretically impact the rest of production infrastructure (mediawiki, etc). This isn't necessarily a new problem, it sounds like this is a potential concern they've had with WDQS in general for some time now. One of the possibilities we discussed was to bypass the caching layer entirely and just use LVS, so with respect to these net-new services backed by a single backend host each, it would look like a single backend host behind LVS but avoiding the ATS/caching layer entirely. That eliminates the concern around ATS but does introduce a few drawbacks: - (primary drawback) **We lose `requestctl`** which is a tremendously useful tool when managing WDQS outages. We'd presumably be going back to the old way of doing things where we'd manually ban at the nginx level when necessary. - Some extra latency would be introduced since we wouldn't be terminating TLS as close to the user. This probably isn't the hugest deal; adding up to 100ms of latency to the user end likely wouldn't break existing usecases. - There's some changes to puppet automation, etc that we'd have to make. It sounds like the main one is that tls certs would have to go thru acmechief rather than relying on the cdn. This generates some work on our (Search team) end in creating the corresponding puppet patch(es) but wouldn't be a showstopper. Of the above 3 drawbacks the most painful one is losing requestctl; it's a really great tool. But it might be a worthwhile tradeoff to entirely avoid the possibility of a misbehaving query service impacting non-WDQS production infrastructure like MediaWiki itself. 
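To make the tradeoff concrete, the two request paths can be sketched as simple hop chains. This is just a toy illustration: the current chain is taken from my hastily transcribed notes below, and the LVS-only chain (including whether envoy would stay in the path) is my assumption, not something Traffic confirmed.

```python
# Current WDQS request flow, per my (unvalidated) notes:
current_flow = ["haproxy", "varnish", "ats", "envoy", "nginx", "blazegraph"]

# Hypothetical LVS-only flow for the new endpoints: CDN/caching layer
# bypassed, TLS terminated at the backend (certs via acme-chief).
lvs_flow = ["lvs", "nginx", "blazegraph"]

# Layers a slow/misbehaving Blazegraph could no longer back up:
bypassed = [hop for hop in current_flow if hop not in lvs_flow]
print(bypassed)  # prints ['haproxy', 'varnish', 'ats', 'envoy']
```

The point of the sketch is just that every hop removed from in front of nginx is one less shared piece of infrastructure that dangling sockets can pile up in.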
I'd note that I'm not aware of us specifically having encountered that problem (ATS backing up and impacting the rest of prod) in previous WDQS outages, but it's also not something we were going out of our way to look for. So, we'll need to discuss amongst the Search team and see what the consensus is, then bring the discussion back to the Traffic team for further feedback.

Other context
-------------

The existing request flow for WDQS is `haproxy [traffic team manage certs] -> varnish -> ats -> envoy -> nginx -> blazegraph` (this is from hastily transcribed notes, and I filled in the missing gaps on the righthand side [nginx -> blazegraph], so I'll want to follow up and validate that the above flow is correct).

As for how things would look after spinning up the new endpoints (sidestepping the question of whether to bypass the caching layer or not): `query.wikidata.org` would still get the vast majority of the traffic, with `wdqs-scholarly-articles` getting just a few % of total traffic at most. We expect actual usage of these new endpoints to be quite low - basically only the WDQS powerusers will try them out, at least initially - but given it'd still be a production service exposed to the outside world, there is of course always the potential for a malicious attacker, hence the concerns about ATS getting backed up.

TASK DETAIL
https://phabricator.wikimedia.org/T351650
_______________________________________________
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org