Our WDQS backend servers (in CODFW only) currently have very patchy
availability.

As a result, a sizeable portion of queries made to query.wikidata.org are
failing or taking unusually long.

We're doing our best to isolate the cause (basically one or more users
submitting particularly expensive or error-generating queries). Until we
succeed, service availability is likely to be quite poor.

Note that we have a mitigation in place: we're restarting Blazegraph
across the affected hosts (codfw) hourly, but so far that mitigation is
proving insufficient.

You can see the current status of WDQS backend server availability here:
https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&from=now-1h&to=now&refresh=1m

^ This is a graph of our total triple count (i.e. not explicitly a graph of
service availability), but servers affected by the Blazegraph deadlock
issue that we're experiencing fail to report metrics while they're
affected. So the presence or absence of RDF triple counts for a given host
corresponds to its uptime.
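
For anyone who wants to apply the same reasoning outside of Grafana, here is
a minimal sketch of the idea: a host whose most recent triple-count sample
is too old is treated as unavailable. The function name, the host names and
the 120-second threshold are all assumptions made for illustration; this is
not how our dashboards or alerts are actually implemented.

```python
from typing import Dict, List


def hosts_missing_metrics(
    last_seen: Dict[str, float],
    now: float,
    max_age_s: float = 120.0,  # assumed threshold, not a real setting
) -> List[str]:
    """Return hosts whose newest metric sample is older than max_age_s.

    last_seen maps a host name to the Unix timestamp of the last
    triple-count sample received from that host.
    """
    return sorted(
        host for host, ts in last_seen.items() if now - ts > max_age_s
    )


# Hypothetical host names and timestamps, for illustration only.
samples = {"host-a": 100.0, "host-b": 10.0}
stale = hosts_missing_metrics(samples, now=200.0)
print(stale)
```

With these made-up samples, only host-b is reported, since its last sample
is 190 seconds old while host-a reported 100 seconds ago.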
_______________________________________________
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org