dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  As a maintainer of a flink session cluster, I want to stop using the presto 
client for swift bundled in the flink image so that I can migrate to a newer 
version of flink, from which it was removed.
  
  This is a followup of T302494 <https://phabricator.wikimedia.org/T302494> 
where we dropped this dependency from the jobs running in the flink session 
cluster. This task is about dropping this swift client from the image.
  
  Existing flink session clusters rely on this swift client to store their H/A 
related data (e.g. job jars). This means we must explicitly migrate existing 
clusters to s3, since a simple drop-in replacement is unlikely to work.
  
  Suggested migration procedure:
  
  - For codfw
    - route wdqs & wcqs to eqiad only
    - adapt the wikidata maxlag to poll eqiad only
    - stop (with a savepoint) all the jobs (WDQS & WCQS) running on the codfw 
k8s wikikube cluster
    - undeploy all the k8s deployments under the `rdf-streaming-updater` 
namespace (dropping all flink-generated configmaps might be necessary, e.g. by 
recreating the k8s namespace)
    - delete the flink_ha_storage folder on the corresponding s3 bucket 
(`rdf-streaming-updater-codfw`)
    - drop presto-swift from 
https://gerrit.wikimedia.org/g/wikidata/query/flink-rdf-streaming-updater and 
create a new image
    - adapt the patch generated by PipelineLib when merging the patch above and 
remove all mentions of swift from `deployment-charts` (possibly adapting the 
existing patch: 
https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/766123)
    - deploy the chart to the rdf-streaming-updater namespace in codfw (which 
should be empty)
    - deploy the flink jobs (WCQS & WDQS) from their corresponding savepoints
    - repool codfw & resume polling codfw for wikidata maxlag calculation
  - For eqiad (do all the above replacing eqiad with codfw and vice versa)
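
  The per-datacenter procedure above can be sketched as a small checklist 
generator, which makes the codfw/eqiad swap explicit. This is only an 
illustrative sketch: `Step`, `STEPS`, `OTHER` and `migration_checklist` are 
hypothetical names, not existing tooling, and the eqiad bucket name is assumed 
to follow the same pattern as `rdf-streaming-updater-codfw`.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    text: str
    # True for steps that only matter when live traffic is involved
    # (routing and wikidata maxlag polling).
    live_traffic: bool = False


# Step wording paraphrases the task description; {dc} is the datacenter being
# migrated and {other} is the one that keeps serving traffic meanwhile.
STEPS = [
    Step("route wdqs & wcqs to {other} only", live_traffic=True),
    Step("adapt the wikidata maxlag to poll {other} only", live_traffic=True),
    Step("stop (with a savepoint) all the jobs (WDQS & WCQS) running in {dc}"),
    Step("undeploy all k8s deployments under the rdf-streaming-updater namespace in {dc}"),
    # Assumption: the bucket for each datacenter is rdf-streaming-updater-<dc>.
    Step("delete the flink_ha_storage folder in bucket rdf-streaming-updater-{dc}"),
    Step("drop presto-swift from flink-rdf-streaming-updater and create a new image"),
    Step("remove all mentions of swift from deployment-charts"),
    Step("deploy the chart to the rdf-streaming-updater namespace in {dc}"),
    Step("deploy the flink jobs (WCQS & WDQS) from their savepoints in {dc}"),
    Step("repool {dc} & resume polling {dc} for wikidata maxlag", live_traffic=True),
]

OTHER = {"codfw": "eqiad", "eqiad": "codfw"}


def migration_checklist(dc: str, live_traffic: bool = True) -> List[str]:
    """Return the ordered steps for one datacenter.

    Pass live_traffic=False to drop the routing and maxlag steps, as one
    would when rehearsing the procedure without live traffic.
    """
    other = OTHER[dc]
    return [
        step.text.format(dc=dc, other=other)
        for step in STEPS
        if live_traffic or not step.live_traffic
    ]


if __name__ == "__main__":
    for number, line in enumerate(migration_checklist("codfw"), start=1):
        print(f"{number}. {line}")
```

  Calling `migration_checklist("eqiad")` yields the same steps with the two 
datacenters swapped.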
  
  Note that most of this procedure can be tested against the staging cluster 
(omitting the parts about routing live traffic and wikidata maxlag).
  
  AC:
  
  - none of the flink session clusters are using the presto swift client

TASK DETAIL
  https://phabricator.wikimedia.org/T304914

To: dcausse
Cc: dcausse, Aklapper, MPhamWMF, CBogen, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles