Hello,

We are currently using Apache Flink 1.12.0, deployed on a Kubernetes (k8s)
1.18 cluster with ZooKeeper (zk) for HA. Because of vulnerabilities in the
container related to a few jars (such as netty-* and mesos), we are forced
to upgrade.

While upgrading Flink to 1.14.0, I ran into an NPE, described here:
https://issues.apache.org/jira/browse/FLINK-23901?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17402570#comment-17402570

To address it, I followed these steps:

   1. Create a savepoint.
   2. Stop the job.
   3. Restore from the savepoint. This is where I am facing a challenge.
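
For reference, the three steps above can be sketched with the Flink CLI as
follows. This is only a minimal sketch: the job id, savepoint directory,
savepoint path and jar name are hypothetical placeholders, and the script
defaults to a dry run (DRY_RUN=1) that prints each command instead of
running it against a cluster.

```shell
#!/bin/sh
# Sketch of the savepoint / stop / restore steps with the Flink CLI.
# All ids, paths and the jar name below are hypothetical placeholders.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.

FLINK_BIN="${FLINK_BIN:-bin/flink}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: trigger a savepoint for a running job ($1 = job id, $2 = target dir).
create_savepoint() { run "$FLINK_BIN" savepoint "$1" "$2"; }

# Step 2: stop the job by cancelling it ($1 = job id).
stop_job() { run "$FLINK_BIN" cancel "$1"; }

# Step 3: resubmit the jar, restoring from the savepoint
# ($1 = savepoint path, $2 = job jar).
restore_job() { run "$FLINK_BIN" run -s "$1" "$2"; }

create_savepoint "JOB_ID" "s3://example-bucket/savepoints"
stop_job "JOB_ID"
restore_job "s3://example-bucket/savepoints/savepoint-xyz" "job.jar"
```

Step 3 in this form assumes a jar that can be resubmitted through the CLI,
which does not match our Docker-based deployment.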

For step 3 above, the documented way to restore from a savepoint is:
"bin/flink run -s :savepointPath [:runArgs] "
This is mainly about resubmitting an uploaded jar file. Because our
application is deployed on k8s and runs in Docker, I was not able to restore
it this way. As a result, the state of the variables in the accumulator got
corrupted and I lost the data in one of our environments.

My question is: what is the preferred way to restore from a savepoint when
the application is running on k8s using Docker?

We are using the following command to run the job manager:
 /docker-entrypoint.sh "standalone-job" \
   "-Ds3.access-key=${AWS_ACCESS_KEY_ID}" \
   "-Ds3.secret-key=${AWS_SECRET_ACCESS_KEY}" \
   "-Ds3.endpoint=${AWS_S3_ENDPOINT}" \
   "-Dhigh-availability.zookeeper.quorum=${ZOOKEEPER_CLUSTER}" \
   "--job-classname" "<class-name>" ${args}
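
For illustration, here is a sketch of how this command might carry a
savepoint path. It assumes the --fromSavepoint option of standalone-job
described in the Flink Docker documentation applies to our setup, which I
have not verified. SAVEPOINT_PATH is a hypothetical variable, and the helper
only prints the argument list rather than starting anything.

```shell
#!/bin/sh
# Sketch: build the standalone-job argument list, optionally with a savepoint.
# Assumption: --fromSavepoint is accepted by standalone-job in this image.
# SAVEPOINT_PATH is a hypothetical variable, not part of the current setup.

build_job_args() {
    # $1 = job class name, $2 = savepoint path (empty means a fresh start)
    args="standalone-job --job-classname $1"
    if [ -n "$2" ]; then
        args="$args --fromSavepoint $2"
    fi
    echo "$args"
}

# Prints the arguments that would be passed to /docker-entrypoint.sh.
build_job_args "<class-name>" "${SAVEPOINT_PATH:-}"
```

If that assumption holds, the savepoint path would also have to be reachable
from every container in the cluster.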

Thank you in advance!

-- 
Regards,
Parag Surajmal Somani.
