Hello all!

Here are a few updates from the Wikidata Query Service:

* We are getting close to full functional coverage of our Flink-based
Streaming Updater [1]. We are starting to work on productionizing it and
defining a deployment strategy. The current goal is to deploy on top of
Kubernetes.
* We've been reviewing how we log queries and we've been adding some
context to the logs. In particular, we are adding CPU load and query
concurrency [2], with the hope that we can normalize our analysis: a query
that takes time because the server is overloaded does not have the same
meaning as a query that takes time because it is intrinsically expensive.
* We've been exploring our assumption that expensive queries are more
likely to be human-generated (via the UI) than bot-generated [3]. That
assumption seems to be wrong.
* We are looking into upgrading to JDK11. We are currently running on JDK8,
and we have some time before it is truly end of life. Blazegraph itself has a
number of issues with JDK11.
* We had a few issues with data reloads on the Wikimedia Commons Query
Service (WCQS). We are now doing those data reloads without interruption, so
future issues
should not result in any downtime, but just delays in getting the new data.
The data size of WCQS is growing faster than we expected. We are
tentatively planning to work on a streaming updater for WCQS in early 2021.
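
The log-normalization idea in the second item can be sketched roughly as
follows. This is only an illustration, not the team's actual analysis: the
record fields (`duration_ms`, `cpu_load`, `concurrency`), the `classify`
helper, and all thresholds are hypothetical. It shows how the added context
lets a slow query on a busy server be told apart from one that is slow on
its own.

```python
from dataclasses import dataclass


@dataclass
class QueryLog:
    """One logged query. All field names are assumptions for illustration."""
    query_id: str
    duration_ms: float
    cpu_load: float    # assumed: server CPU load (0.0 to 1.0) while the query ran
    concurrency: int   # assumed: number of queries running at the same time


def classify(entry, slow_ms=10_000, load_limit=0.8, concurrency_limit=20):
    """Label a query as fast, slow because of server load, or slow by itself.

    The thresholds are arbitrary placeholders, not measured values.
    """
    if entry.duration_ms < slow_ms:
        return "fast"
    # A slow query on a busy server says little about the query itself.
    if entry.cpu_load >= load_limit or entry.concurrency >= concurrency_limit:
        return "slow_under_load"
    # Slow on an otherwise idle server: likely intrinsically expensive.
    return "slow_intrinsic"


logs = [
    QueryLog("q1", duration_ms=120, cpu_load=0.30, concurrency=3),
    QueryLog("q2", duration_ms=45_000, cpu_load=0.95, concurrency=5),
    QueryLog("q3", duration_ms=45_000, cpu_load=0.20, concurrency=2),
]

labels = {entry.query_id: classify(entry) for entry in logs}
```

Here q2 and q3 take the same time, but only q3 would count as an expensive
query in an analysis normalized this way.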

Have fun!

   Guillaume

[1] https://phabricator.wikimedia.org/T244590
[2] https://phabricator.wikimedia.org/T261937
[3] https://phabricator.wikimedia.org/T261841#6532765

-- 
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+1 / CET
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
