BBlack added a comment.

In https://phabricator.wikimedia.org/T126730#2034900, @Christopher wrote:
> I may be wrong, but the headers that are returned from a request to the nginx server wdqs1002 say that varnish 1.1 is already being used there.

It's varnish 3.0.6 currently (4.x is coming down the road).

> And, for whatever reason, **it misses**, because repeating the same query gives the same response time.

It misses because the response is sent with `Transfer-Encoding: chunked`. If it were sent un-chunked with a `Content-Length`, varnish would have a chance at caching it. However, the next thing you'd run into is that the response doesn't contain any caching-relevant headers (e.g. `Expires`, `Cache-Control`, `Age`). Lacking these, varnish would cache it with our configured default_ttl, which on the misc cluster where `query.wikidata.org` is currently hosted is only 120 seconds.

> Even though Varnish cache **should work** to proxy nginx for optimizing delivery of static query results, it lacks several important features of an object broker. Namely, client control of object expiration (TTL) and retrieval of "named query results" from persistent storage.
>
> A WDQS service use case may in fact be to compare results from several days ago with current results. Thus, assuming the latest results state is what the client wants may actually not be true.

I think all of this is doable. Named query results is something we talked about in the previous discussion re `GET` length restrictions: `POST`ing (and/or server-side configuring, either way!) a complex query and saving it as a named query through a separate query-setup interface, then executing the query for results with a `GET` on just the query name.

I don't think we really want client control of object expiration (at least, not "varnish cache object expiration"), but what we want is the ability to parameterize named queries based on time, right? e.g. a named query that gives a time-series graph might have parameters for start time and duration.
You might initially post the complex SPARQL template and save it as `fooquery`, then later have a client get it as `/sparql?saved_query=fooquery&start=201601011234&duration=1w`. Varnish would have the chance to cache those based on the query args as separate results, and you could limit the time resolution if you want to enhance cacheability.

If it's for inclusion from a page that wants to graph that data and always show a "current" graph rather than a hardcoded start/duration (and I could see use-cases for both in articles), you could support a start time of `now` with an optional resolution specifier that defaults to 1 day, like `&start=now/1d`. The response to such a query would set cache-control headers that allow caching at varnish for up to 24h (based on the `now/1d` resolution), which means everyone executing that query gets new results about once a day and they all share a single cached result per day.

The important thing here is that there's no need for a client to have control over result object expiration if the query encodes everything that's relevant to expiration, and the maximum cache lifetime is set small enough that other effects (e.g. data updates to existing historical data) are negligible in the big picture.

> Possibly, the optimal solution would use the varnish-api-engine (http://info.varnish-software.com/blog/introducing-varnish-api-engine) in conjunction with a WDQS REST API (provided with a modified RESTBase?). Is the varnish-api-engine being used anywhere in WMF? Also, delegating query requests to an API could allow POSTs. Simply with Varnish cache, the POST problem would remain unresolved.

We're not using the Varnish API Engine, and I don't see us pursuing that anytime soon. Most of what it does can be done in other ways, and more importantly it's commercial software. There seems to be some confusion as to whether `POST` is or isn't still an issue here...
Also, a whole separate issue is that WDQS is currently mapped through our `cache_misc` cluster. That cluster is for small, lightweight, miscellaneous infrastructure. WDQS was probably always a poor match for that, but we put it there because at the time it was seen as a lightweight / low-rate service that would mostly be used directly by humans to execute one-off complicated queries. The plans in this ticket sound nothing like that, and `cache_misc` probably isn't an appropriate home for a complex query service that's going to backend serious query load from wikis and the rest of the world...

TASK DETAIL
https://phabricator.wikimedia.org/T126730