Smalyshev created this task.
Smalyshev added projects: MediaWiki-API, Wikidata-Query-Service, Epic, Wikimedia-Stream, EventBus.
Herald added a subscriber: Aklapper.
Herald added projects: Wikidata, Discovery, Analytics.

TASK DESCRIPTION

Requirements:

  • Real-time - I can get changes from the wiki within a short time (<30 seconds) of when they happened, where that moment is defined as the time the changes were committed to the database and became visible to users on the wiki.
  • Reliable - if I consume every individual change message in the stream in the sequence the stream provides, I will know about all the changes in the wiki content.
  • Seekable - I can connect to the stream at a predictable point in wiki history (either by timestamp or by RC ID) and get all the messages from that point on. At least 14 days of past messages should be available; longer retention is not required.
  • Resumable - I can disconnect from the stream, reconnect later, and resume consumption from the point where I left off. The service should not require a constant connection to deliver updates, and the stream from a disconnected and resumed connection should be the same as if the connection had never been interrupted (except for obvious differences such as message delivery times).
  • Scalable - there is no hard limit on the number of clients connecting, within reasonable limits of infrastructure, networking, etc.
  • Stateless - the server does not keep per-client state, and the client always has all the information needed to continue consuming the stream from the point where it stopped (this may not be very important as long as scalability is preserved).
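
The Seekable, Resumable and Stateless requirements can be illustrated with a toy in-memory model. This is only a sketch under assumptions: the names ChangeLog, Change, read and seek are hypothetical and do not refer to any existing service, and the log is assumed to be ordered by commit time. The point is that the client carries its own position, so the server keeps nothing per client.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Change:
    offset: int     # position in the log (hypothetical; could be an RC ID)
    timestamp: int  # commit time, seconds since epoch
    title: str


class ChangeLog:
    """Append-only log of changes; holds no per-client state."""

    def __init__(self) -> None:
        self._events: List[Change] = []

    def append(self, timestamp: int, title: str) -> Change:
        change = Change(len(self._events), timestamp, title)
        self._events.append(change)
        return change

    def read(self, offset: int, limit: int = 100) -> Tuple[List[Change], int]:
        """Return up to `limit` changes from `offset`, plus the next offset."""
        batch = self._events[offset:offset + limit]
        return batch, offset + len(batch)

    def seek(self, timestamp: int) -> int:
        """Return the offset of the first change at or after `timestamp`."""
        for change in self._events:
            if change.timestamp >= timestamp:
                return change.offset
        return len(self._events)


log = ChangeLog()
for ts, title in [(100, "Q1"), (105, "Q2"), (110, "Q3"), (120, "Q4")]:
    log.append(ts, title)

# A consumer that never disconnects.
full, _ = log.read(0)

# A consumer that stops after two changes and resumes later from its cursor.
first, cursor = log.read(0, limit=2)
rest, cursor = log.read(cursor)
```

Because the cursor is the only state and it lives on the client, the resumed consumer receives exactly the same sequence as the uninterrupted one, and any number of clients can read the same log independently.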

Current use case:

Supplying an update stream for the Wikidata Query Service.

Delta for existing services:

API:Recentchanges

  • Not reliable - messages can be backfilled into the stream with timestamps many seconds in the past, which means that reading the stream sequentially by timestamp would miss them (T161342). Even if reading by RCID is implemented, parallel commits could still lead to backfilling and thus to an unreliable stream.
  • Does not have events for page props updates (T145712), which happen asynchronously from the main article update.
  • May miss some deletion events when a deletion is combined with revision hiding.
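
The backfilling problem in the first point above can be reproduced with a toy model. This is a simplified sketch, not the actual API: the feed is a plain list of (timestamp, title) pairs, and poll_since is a hypothetical helper standing in for a consumer that polls by timestamp.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp, title)


def poll_since(feed: List[Event], cursor: int) -> Tuple[List[Event], int]:
    """Return events with timestamp > cursor, and the advanced cursor."""
    new = sorted(e for e in feed if e[0] > cursor)
    if new:
        cursor = new[-1][0]
    return new, cursor


feed: List[Event] = [(100, "A"), (103, "C")]
seen, cursor = poll_since(feed, 0)        # consumer reads A and C

feed.append((101, "B"))                   # a late commit is backfilled at 101
later, cursor = poll_since(feed, cursor)  # nothing is newer than the cursor

# Events present in the feed that the consumer never received.
missed = [e for e in feed if e not in seen + later]
```

In this model the consumer's cursor has already moved past 101 when "B" appears, so "B" ends up in `missed` even though the consumer read everything it was offered in order.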

EventStreams

  • Not consumable per-wiki due to the lack of filtering (T152731)
  • Not seekable
  • Does not retain data for more than 7 days
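
Without server-side filtering, a client interested in one wiki has to receive every event and discard the rest. The sketch below shows that workaround on sample server-sent-event lines. It assumes each event is a JSON object on a `data:` line and carries a `wiki` field naming the source wiki; both the sample payloads and the helper name filter_wiki are illustrative assumptions.

```python
import json
from typing import Dict, Iterable, Iterator


def filter_wiki(lines: Iterable[str], wiki: str) -> Iterator[Dict]:
    """Yield parsed events from `data:` lines whose `wiki` field matches."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip `event:` lines, comments and blank separators
        event = json.loads(line[len("data:"):])
        if event.get("wiki") == wiki:
            yield event


# Made-up sample input; a real client would read these lines from the network.
sample = [
    "event: message",
    'data: {"wiki": "enwiki", "title": "Foo"}',
    "",
    "event: message",
    'data: {"wiki": "wikidatawiki", "title": "Q42"}',
]
kept = list(filter_wiki(sample, "wikidatawiki"))
```

The filtering itself is cheap, but the client still pays the bandwidth and parsing cost of the full stream, which is the concern behind T152731.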

TASK DETAIL
https://phabricator.wikimedia.org/T161731

To: Smalyshev
Cc: Aklapper, Smalyshev, QZanden, EBjune, merbst, Sethakill, Avner, debt, dg711, Gehel, D3r1ck01, Jonas, FloNight, Xmlizer, Izno, JAllemandou, mobrovac, jkroll, Wikidata-bugs, Jdouglas, aude, GWicke, Deskana, Manybubbles, jayvdb, Anomie, Mbch331, Krenair, jeremyb, Legoktm