Gehel created this task.
Gehel triaged this task as "High" priority.
Gehel added projects: Wikidata-Query-Service, Discovery-Search (Current work), Operations.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.

TASK DESCRIPTION

We've had a number of cases where wdqs-updater was either lagging because of load on blazegraph or causing issues of its own, affecting blazegraph or at least the shared servers. A number of operations done by the updater could be shared between servers, reducing both the processing power needed and the load on other services.

At a high level, the updater process is:

  1. get a stream of wikidata changes (either from the Recent Changes API or by filtering Kafka events)
  2. deduplicate those events over a period of time
  3. enrich them with the actual data changed by querying the Wikidata API
  4. batch the enriched changes and apply them to blazegraph
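
A minimal sketch of steps 2) and 4), assuming each change is reduced to an entity ID, a revision ID and a timestamp, that the stream arrives in timestamp order, and that deduplication uses fixed (tumbling) time windows. `Change`, `deduplicate` and `batches` are hypothetical names for illustration, not the current updater's code; step 3) is left out because it calls the Wikidata API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Change:
    # Hypothetical event shape; real Recent Changes / Kafka events carry more fields.
    entity_id: str    # e.g. "Q42"
    revision: int     # revision id produced by the edit
    timestamp: float  # seconds since the epoch


def deduplicate(changes: Iterable[Change], window_s: float) -> Iterator[List[Change]]:
    """Step 2: split a timestamp-ordered stream into tumbling windows of
    `window_s` seconds, keeping only the latest revision of each entity per window."""
    window_start: Optional[float] = None
    latest: Dict[str, Change] = {}
    for change in changes:
        if window_start is None:
            window_start = change.timestamp
        elif change.timestamp >= window_start + window_s:
            # Window closed: emit its surviving changes in revision order.
            yield sorted(latest.values(), key=lambda c: c.revision)
            latest = {}
            window_start = change.timestamp
        current = latest.get(change.entity_id)
        if current is None or change.revision > current.revision:
            latest[change.entity_id] = change
    if latest:
        yield sorted(latest.values(), key=lambda c: c.revision)


def batches(items: List[T], size: int) -> Iterator[List[T]]:
    """Step 4: cut (enriched) changes into fixed-size batches for the triple store."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


# Usage: Q1 is edited twice within the first 10 s window, so only revision 12 survives.
events = [
    Change("Q1", 10, 0.0),
    Change("Q2", 11, 1.0),
    Change("Q1", 12, 2.0),
    Change("Q3", 13, 12.0),  # opens the next window
]
for window in deduplicate(events, window_s=10.0):
    # Step 3 (fetching the changed data from the Wikidata API) is omitted here.
    for batch in batches(window, size=100):
        print([(c.entity_id, c.revision) for c in batch])
# prints [('Q2', 11), ('Q1', 12)] then [('Q3', 13)]
```

The longer the window, the more edits to the same entity collapse into one update, at the cost of added latency.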

All this is a fairly standard event sourcing pattern.

The event stream is the same for all servers, so steps 1), 2) and 3) could be shared; none of them depends directly on blazegraph. Only step 4) needs to be done for each wdqs blazegraph instance.

Additional constraints:

  • we need to be able to replay events over some period of time (~2 weeks) during data load: data is loaded from a wikidata dump, and then the updater process is used to catch up on events that occurred after the dump
  • some level of ordering is required
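
One way to meet both constraints, sketched in memory: retain events for about two weeks so a freshly loaded node can replay from the dump timestamp, and relax global ordering to per-entity ordering by only applying a revision newer than the one already loaded. The per-entity rule is an assumption, not something decided here; with Kafka, ordering is only guaranteed within a partition, so keying messages by entity ID would give this kind of ordering. `RetainedLog` and `apply_newer` are hypothetical names, and the sketch ignores page deletions, which a real updater has to handle.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed retention period, matching the ~2 weeks replay requirement.
RETENTION_S = 14 * 24 * 3600

Event = Tuple[str, int]  # (entity_id, revision); hypothetical minimal shape


@dataclass
class RetainedLog:
    """In-memory stand-in for a retained event log, kept in timestamp order."""
    timestamps: List[float] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)

    def append(self, ts: float, event: Event) -> None:
        # Assumes events are appended in non-decreasing timestamp order.
        self.timestamps.append(ts)
        self.events.append(event)

    def expire(self, now: float) -> None:
        # Drop everything older than the retention period.
        cut = bisect.bisect_left(self.timestamps, now - RETENTION_S)
        del self.timestamps[:cut]
        del self.events[:cut]

    def replay_from(self, ts: float) -> List[Event]:
        # Return every retained event at or after `ts` (e.g. the dump timestamp).
        start = bisect.bisect_left(self.timestamps, ts)
        return self.events[start:]


def apply_newer(events: List[Event], applied: Dict[str, int]) -> List[Event]:
    """Apply an event only if its revision is newer than the one already loaded
    for that entity; late or duplicate events are skipped, so replay is idempotent."""
    done = []
    for entity_id, revision in events:
        if revision > applied.get(entity_id, -1):
            applied[entity_id] = revision
            done.append((entity_id, revision))
    return done


# Usage: the dump was taken at t=100 and already contains Q1 at revision 5.
log = RetainedLog()
log.append(50.0, ("Q1", 5))    # before the dump, not replayed
log.append(100.0, ("Q1", 7))
log.append(150.0, ("Q2", 8))
log.append(160.0, ("Q1", 6))   # arrives late, older than revision 7
state = {"Q1": 5}
print(apply_newer(log.replay_from(100.0), state))  # [('Q1', 7), ('Q2', 8)]
```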

It looks like k8s would be a reasonable place to run such a service. A single instance of the service would be needed, as some shared state is required for deduplication and ordering. After step 3), the enriched events could be sent to another kafka topic. Step 4) would then be a simplified updater running on each wdqs node.
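
A minimal in-memory sketch of that split, assuming the shared service publishes each enriched change once and every wdqs node runs its own simplified updater that tracks an independent offset. `Topic`, `NodeUpdater` and the record fields are hypothetical stand-ins for illustration, not the Kafka client API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Topic:
    """In-memory stand-in for the kafka topic of enriched events."""
    records: List[dict] = field(default_factory=list)

    def publish(self, record: dict) -> None:
        self.records.append(record)

    def read(self, offset: int, max_records: int) -> List[dict]:
        return self.records[offset:offset + max_records]


@dataclass
class NodeUpdater:
    """Simplified per-node updater (step 4): it only reads already-enriched
    events and applies them in batches, tracking its own offset."""
    name: str
    apply_batch: Callable[[List[dict]], None]
    offset: int = 0

    def poll(self, topic: Topic, batch_size: int) -> int:
        batch = topic.read(self.offset, batch_size)
        if batch:
            self.apply_batch(batch)    # e.g. a SPARQL UPDATE against the local blazegraph
            self.offset += len(batch)  # advance only after the batch was applied
        return len(batch)


# Usage: the shared service (steps 1-3) publishes each enriched change once...
topic = Topic()
for entity_id, revision in [("Q1", 12), ("Q2", 11), ("Q3", 13)]:
    topic.publish({"entity": entity_id, "revision": revision, "triples": []})

# ...and each wdqs node consumes the same topic at its own pace.
store_a: List[dict] = []
store_b: List[dict] = []
node_a = NodeUpdater("node-a", store_a.extend)
node_b = NodeUpdater("node-b", store_b.extend)
while node_a.poll(topic, batch_size=2):
    pass
node_b.poll(topic, batch_size=10)
print(node_a.offset, node_b.offset)  # 3 3
```

Because each node keeps its own offset, a node reloaded from a dump can start from an older position while the others keep going; with real Kafka this roughly corresponds to giving each node its own consumer group.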

I'm probably missing a few things; feedback on the proposal is welcome!


TASK DETAIL
https://phabricator.wikimedia.org/T207837

To: Gehel
Cc: Aklapper, Joe, Gehel, Nandana, thifranc, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Zppix, Jonas, Xmlizer, Wong128hk, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi