Hello,

We are currently running Apache Flink 1.12.0 on a Kubernetes 1.18 cluster, with ZooKeeper for HA. Because of vulnerabilities reported in the container for a few jars (such as netty-* and mesos), we are forced to upgrade.
While upgrading Flink to 1.14.0, we hit an NPE; see https://issues.apache.org/jira/browse/FLINK-23901?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17402570#comment-17402570

To address it, I followed these steps:

1. Create a savepoint.
2. Stop the job.
3. Restore from the savepoint.

Step 3 is where I am facing a challenge. The way to restore from a savepoint that I found is:

    bin/flink run -s :savepointPath [:runArgs]

This is mainly about resubmitting an uploaded jar file. As our application runs on k8s in Docker containers, I was not able to restore it this way. Because of that, the state of the variables in the accumulator got corrupted and I lost the data in one of our environments.

My question is: what is the preferred way to restore from a savepoint when the application runs on k8s using Docker?

We use the following command to run the job manager:

    /docker-entrypoint.sh "standalone-job" "-Ds3.access-key=${AWS_ACCESS_KEY_ID}" "-Ds3.secret-key=${AWS_SECRET_ACCESS_KEY}" "-Ds3.endpoint=${AWS_S3_ENDPOINT}" "-Dhigh-availability.zookeeper.quorum=${ZOOKEEPER_CLUSTER}" "--job-classname" "<class-name>" ${args}

Thank you in advance!

--
Regards,
Parag Surajmal Somani.