dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  As a maintainer of the W[DC]QS Streaming Updater I want to monitor and be 
alerted when the space usage of these flink jobs reaches a certain threshold so 
that I can act quickly to investigate any issues and perform the required 
cleanups.
  
  (This is a followup of T314835 <https://phabricator.wikimedia.org/T314835>)
  
  Today we use 3 containers in thanos:
  
  - rdf-streaming-updater-codfw
  - rdf-streaming-updater-eqiad
  - rdf-streaming-updater-staging
  
  Given that:
  
  - a wikidata savepoint is 3Gb
  - a commons savepoint is 2Gb
  - incremental checkpoints consume less than 1.5Gb for each job
  - flink_ha_storage should not need more than 200Mb
  
  If we keep a couple of savepoints per job we should be able to operate with 50Gb.
  The number of objects should be relatively small as well: 12 per savepoint and 
a bit more per checkpoint, so consuming more than 500 objects might require 
some investigation.
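
  The estimate above can be checked with a rough back-of-the-envelope sketch. 
It assumes (the task does not say so explicitly) that both jobs write to the 
same container and that "a couple" means 2 savepoints per job; the function 
and constant names are made up for illustration.

```python
# Rough storage budget for one container, using the sizes quoted in the task.
# Assumptions (not stated in the task): both jobs share a container, and
# "a couple" of savepoints means 2 per job.
SAVEPOINT_GB = {"wikidata": 3.0, "commons": 2.0}
CHECKPOINT_GB_PER_JOB = 1.5   # upper bound quoted for incremental checkpoints
HA_STORAGE_GB = 0.2           # flink_ha_storage upper bound (200Mb)
OBJECTS_PER_SAVEPOINT = 12


def estimate_container_usage(savepoints_per_job=2):
    """Return (estimated space in Gb, number of savepoint objects)."""
    savepoints = sum(size * savepoints_per_job for size in SAVEPOINT_GB.values())
    checkpoints = CHECKPOINT_GB_PER_JOB * len(SAVEPOINT_GB)
    total_gb = savepoints + checkpoints + HA_STORAGE_GB
    savepoint_objects = OBJECTS_PER_SAVEPOINT * savepoints_per_job * len(SAVEPOINT_GB)
    return total_gb, savepoint_objects


total_gb, savepoint_objects = estimate_container_usage()
print(f"estimated space: {total_gb:.1f} Gb, savepoint objects: {savepoint_objects}")
```

  Under these assumptions the expected usage is roughly 13Gb and 48 savepoint 
objects, which leaves the 50Gb and 500 object limits with comfortable headroom. 
Checkpoint objects are left out because the task only says they are "a bit 
more" than for a savepoint.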
  
  AC:
  
  - update the dashboard at 
https://grafana-rw.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater?orgId=1 and 
add a graph showing the space and object usage of these containers
  - create an alert if the space usage is above 50Gb per container
  - create an alert if the number of objects is above 500 per container
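
  The two alert conditions could be sketched as below. This only illustrates 
the thresholds; the real alerts would presumably be defined in the monitoring 
stack rather than in Python. The function name is hypothetical, 1Gb is assumed 
to mean 10^9 bytes, and the usage figures in the example call are invented.

```python
# Thresholds from the acceptance criteria.
SPACE_LIMIT_BYTES = 50 * 10**9  # assumption: 1Gb is taken as 10^9 bytes
OBJECT_LIMIT = 500


def check_container(name, used_bytes, object_count):
    """Return a list of alert messages for one container (empty if healthy)."""
    alerts = []
    if used_bytes > SPACE_LIMIT_BYTES:
        alerts.append(f"{name}: space usage {used_bytes} bytes is above 50Gb")
    if object_count > OBJECT_LIMIT:
        alerts.append(f"{name}: {object_count} objects is above {OBJECT_LIMIT}")
    return alerts


# Invented figures, for illustration only.
for message in check_container("rdf-streaming-updater-staging", 62 * 10**9, 120):
    print(message)
```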

TASK DETAIL
  https://phabricator.wikimedia.org/T316005

To: dcausse
Cc: fgiunchedi, bking, dcausse, Aklapper, AWesterinen, MPhamWMF, CBogen, 
Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, 
jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles