dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  As a maintainer of the wdqs streaming updater I want to understand why the 
checkpoint _metadata file has grown to 70 MB (which requires bumping Flink 
memory limits) so that I can prevent it from happening again.
  
  The Flink application died around `2021-07-17T13:20:00`, the time at which a 
switch in row A (codfw) died.
  
  It is unclear whether the growth of the _metadata file is related to the 
failure, or whether it prevented the pipeline from restarting (the default 
`akka.framesize` being too small).
  
  A copy of the checkpoint _metadata file has been kept in 
`stat1004:/home/dcausse/flink-1.12.1-wdqs/wdqs_streaming_updater/checkpoints/b4d1cd3eb1ab4002a63b7c229a8c3542/chk-140815`.
  
  The pipeline was able to restart after raising `akka.framesize` to 100 MB and 
giving it more heap.
  A savepoint was then taken, but it produced an even bigger metadata file 
(800 MB). It is available at 
`swift://updater.thanos-swift/wdqs_streaming_updater/savepoints/savepoint-f6a960-fdd300f4e05b`.
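
The sizes mentioned above can be put side by side with a rough sketch. It reads the reported sizes as mebibytes and assumes the whole metadata payload has to fit in a single `akka.framesize` frame; both are assumptions, and whether the frame limit applies to the metadata at all is what this task is trying to establish. `framesize_setting` and `fits_in_frame` are hypothetical helpers, not part of Flink. The default of `10485760b` is the one documented by Flink.

```python
MIB = 1024 * 1024

# Flink's documented default for akka.framesize is 10485760b (10 MiB).
DEFAULT_FRAMESIZE_BYTES = 10 * MIB


def framesize_setting(mebibytes: int) -> str:
    """Format a size as a byte-suffixed string such as '10485760b'."""
    return f"{mebibytes * MIB}b"


def fits_in_frame(payload_bytes: int, framesize_bytes: int) -> bool:
    """Hypothetical check: assumes the whole payload must fit in one frame."""
    return payload_bytes <= framesize_bytes


# Sizes reported in this task, read as MiB (an assumption).
checkpoint_metadata = 70 * MIB
savepoint_metadata = 800 * MIB
tuned_framesize = 100 * MIB

print(framesize_setting(100))
print(fits_in_frame(checkpoint_metadata, DEFAULT_FRAMESIZE_BYTES))
print(fits_in_frame(checkpoint_metadata, tuned_framesize))
print(fits_in_frame(savepoint_metadata, tuned_framesize))
```

Under this simplified model the checkpoint metadata only fits once the limit is raised, and the savepoint metadata would exceed even the raised limit, so tuning the frame size looks like a stopgap rather than a fix for the growth itself.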
  
  AC:
  
  - understand what caused the growth of the _metadata file
  - fix the underlying issue

TASK DETAIL
  https://phabricator.wikimedia.org/T286890

To: dcausse
Cc: dcausse, Aklapper, MPhamWMF, CBogen, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles