dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.

TASK DESCRIPTION
  As a wdqs maintainer I would like to be able to backfill without human 
intervention the revision-create topics when it contains data produced from 
multiple DCs.
  
  During a DC switch change-prop will switch to another DC and will produce its 
data to another kafka topic. In normal situation this is not a problem as the 
backlog should be pretty low and the ability to mark sources as "idle" and the 
work done in T258687 <https://phabricator.wikimedia.org/T258687> will make the 
flink pipeline works flawlessly during DC switches.
  The problem arises when the backlog is large (which happens if you want to 
backfill, e.g. when starting the pipeline for the first time or if it was 
stopped for multiple days).
  
  In situation after a switch from `eqiad` to `codfw` the topic 
`eqiad.mediawiki.revision-create` may contain data needed for a backfill that 
is several days older than the first event available on 
`codfw.mediawiki.revision-create`. The events from 
`codfw.mediawiki.revision-create` will be held in their respective time window 
while waiting for the `eqiad.mediawiki.revision-create` to be aligned with 
`codfw.mediawiki.revision-create` (fully drained and marked idle or on the 
first event newer than the ones read from `codfw`). When this happens all the 
windows created while buffering `codfw.mediawiki.revision-create` events will 
be fired at the same time.
  This also may create a relatively large state on the event reordering 
operator (up to 200Mb on a backlog of 2.65M from eqiad + 22M from codfw). With 
our current settings checkpointing fails after our 15mins time out. Progress is 
still made so it's possible that the backfill may succeed after sufficient 
retries.
  
  The problem here is that flink is not aware of our DC switch strategy and 
there is no need to start reading data from `codfw.mediawiki.revision-create` 
if `eqiad.mediawiki.revision-create` is not fully drained (and vice versa).
  
  Note that this problem is only relevant if we want to backfill a large 
backlog created during a DC switch and it might not be worth 
investigating/implementing possible solutions if we do not want to backfill in 
such scenarios.
  
  One possible solution could be to write a custom source that consumer based 
on the event timestamp only one DC queue at a time.

TASK DETAIL
  https://phabricator.wikimedia.org/T262676

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, CBogen, Akuckartz, darthmon_wmde, Nandana, Namenlos314, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to