dcausse added a comment.

  The 3 tasks above should be the follow-ups to this incident.
  I think the root cause of the incident is a combination of the poor `swift` 
client used by the flink H/A component and, possibly, the instability of 
thanos-fe2001, which exacerbated the poor behavior of this swift client.
  The checkpoints were stored properly, but the flink H/A component was not 
able to fully acknowledge that a checkpoint was successful. The job is 
configured to keep only the last successful checkpoints, but the H/A issue 
caused this cleanup to fail, so old checkpoints were not removed.
  
  Moving forward we will:
  
  - stop using the presto-swift client in favor of an S3 connector.
  - clean up the `rdf-streaming-updater-codfw` container
  - monitor and alert on the space usage of these containers (if there's also 
a way to implement a quota per container, I'd be in favor of doing so)
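  
  As a rough illustration of the monitoring item, here is a minimal sketch 
that works on the headers of a Swift container HEAD response. It assumes the 
standard `X-Container-Bytes-Used` and `X-Container-Object-Count` headers, plus 
`X-Container-Meta-Quota-Bytes` when the container quota middleware is enabled. 
The function names, the `max_bytes` fallback and the 80% threshold are 
hypothetical and not part of any existing tooling.

```python
from typing import Mapping


def container_usage(headers: Mapping[str, str]) -> dict:
    """Extract usage figures from the headers of a Swift container HEAD."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    quota_raw = h.get("x-container-meta-quota-bytes")
    return {
        "bytes_used": int(h["x-container-bytes-used"]),
        "object_count": int(h["x-container-object-count"]),
        # None when no quota has been set on the container.
        "quota_bytes": int(quota_raw) if quota_raw else None,
    }


def should_alert(usage: dict, max_bytes: int, warn_ratio: float = 0.8) -> bool:
    """Return True once usage reaches warn_ratio of the applicable limit."""
    # Prefer the container quota if one is set; otherwise fall back to the
    # (hypothetical) operator-chosen max_bytes budget.
    limit = usage["quota_bytes"] or max_bytes
    return usage["bytes_used"] >= warn_ratio * limit
```

  The headers would come from whatever client performs the HEAD request 
against the container; a check like this could then feed the alerting system 
already in use.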
  
  Unless I missed something, or we want to continue tracking some work with 
this task, I believe we can close it.

TASK DETAIL
  https://phabricator.wikimedia.org/T314835

To: dcausse
Cc: EBernhardson, MatthewVernon, gmodena, elukey, bking, Ottomata, dcausse, 
LSobanski, dr0ptp4kt, fgiunchedi, Aklapper, Hellket777, Raineydaz, 
LisafBia6531, Astuthiodit_1, AWesterinen, 786, BTullis, Biggs657, 
karapayneWMDE, joanna_borun, Invadibot, Lalamarie69, MPhamWMF, Devnull, 
maantietaja, Juan90264, Muchiri124, Alter-paule, Beast1978, CBogen, ItamarWMDE, 
Un1tY, Akuckartz, Di3sel1975, Hook696, Kent7301, Chambersjay, RhinosF1, 
joker88john, Legado_Shulgin, ReaperDawn, CucyNoiD, Nandana, Namenlos314, 
Gaboe420, Giuliamocci, Conradrock, Davinaclare77, Cpaulf30, Techguru.pc, Lahi, 
Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Hfbn0, 
QZanden, EBjune, merbst, LawExplorer, Lewizho99, Zppix, Maathavan, _jensen, 
rosalieper, Neuronton, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g