dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
As a maintainer of the WDQS streaming updater, I want to understand why the checkpoint _metadata file has grown to 70 MB (which requires bumping Flink memory limits) so that I can prevent it from happening again.

The Flink application died around `2021-07-17T13:20:00`, the time at which a switch in row A (codfw) died. It is unclear whether the growth of the _metadata file is related to the failure, or whether it prevented the pipeline from restarting (default `akka.framesize` too small).

A copy of the checkpoint _metadata file has been kept in `stat1004:/home/dcausse/flink-1.12.1-wdqs/wdqs_streaming_updater/checkpoints/b4d1cd3eb1ab4002a63b7c229a8c3542/chk-140815`.

The pipeline was able to restart after tuning `akka.framesize` to 100 MB and giving it more heap. A savepoint was then taken, but it produced an even bigger metadata file (800 MB). It is available at `swift://updater.thanos-swift/wdqs_streaming_updater/savepoints/savepoint-f6a960-fdd300f4e05b`.

AC:
- understand what caused the growth of the _metadata file
- fix the underlying issue

TASK DETAIL
https://phabricator.wikimedia.org/T286890

To: dcausse
Cc: dcausse, Aklapper, MPhamWMF, CBogen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles
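
As a possible aid for the first acceptance criterion, here is a minimal sketch of how one might spot oversized `_metadata` files in a local copy of a checkpoint directory. The function name `oversized_metadata` and the directory layout (one `_metadata` file per `chk-*` directory) are assumptions made for illustration. The 10 MiB threshold is Flink's documented default for `akka.framesize`; whether the frame size is actually the relevant bound is the open question in this task, so treat the threshold as a parameter to adjust.

```python
from pathlib import Path

# Flink's documented default for akka.framesize is 10485760 bytes (10 MiB).
# Using it as the threshold here is an assumption, not a confirmed limit
# on the size of the _metadata file.
DEFAULT_FRAMESIZE_BYTES = 10 * 1024 * 1024


def oversized_metadata(checkpoint_root, limit_bytes=DEFAULT_FRAMESIZE_BYTES):
    """Return (path, size_in_bytes) pairs for every _metadata file under
    checkpoint_root that is larger than limit_bytes, largest first."""
    hits = []
    for meta in Path(checkpoint_root).glob("**/_metadata"):
        if not meta.is_file():
            continue
        size = meta.stat().st_size
        if size > limit_bytes:
            hits.append((str(meta), size))
    # Sort by size so the worst offender is listed first.
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the path to a local copy of the checkpoints.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in oversized_metadata(root):
        print(f"{size / (1024 * 1024):.1f} MiB  {path}")
```

Running this periodically against retained checkpoints would show when the metadata started to grow, which could then be compared against the time of the switch failure.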