Re: Recover from savepoints with Kubernetes HA

2021-07-23 Thread Austin Cawley-Edwards
Great, glad it was an easy fix :) Thanks for following up! On Fri, Jul 23, 2021 at 3:54 AM Thms Hmm wrote: > Finally I found the mistake. I put the "--host 10.1.2.3" param as one > argument. I think the savepoint argument was not interpreted correctly or > ignored. Might be that the "-s" param

Re: Recover from savepoints with Kubernetes HA

2021-07-23 Thread Thms Hmm
Finally I found the mistake. I put the "--host 10.1.2.3" param as one argument. I think the savepoint argument was not interpreted correctly or ignored. Might be that the "-s" param was used as value for "--host 10.1.2.3" and "s3p://…" as new param and because these are not valid arguments they were
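
For readers hitting the same symptom, here is a minimal sketch of the mistake described above: a flag and its value fused into a single argument-list element. It only compares the two argument vectors; it does not reproduce how Flink's own entrypoint parses them, and the variable names and the "s3p://..." placeholder are illustrative assumptions, not taken from the actual deployment.

```python
import shlex

# Hypothetical argument list with the flag and its value fused into ONE element,
# as in the mistake described in the message above.
wrong_args = ["--host 10.1.2.3", "-s", "s3p://..."]

# Same intent, but every flag and every value is its own element.
right_args = ["--host", "10.1.2.3", "-s", "s3p://..."]

# A shell would split the equivalent command line into four tokens.
shell_tokens = shlex.split("--host 10.1.2.3 -s s3p://...")

print(len(wrong_args), len(right_args))   # element counts differ
print(shell_tokens == right_args)         # shell splitting matches the correct list
print("--host" in wrong_args)             # the bare flag never appears in the fused list
```

In the fused list the process never receives a standalone `--host` token, so whatever parser reads the arguments cannot pair flags with values the way the author intended.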

Re: Recover from savepoints with Kubernetes HA

2021-07-22 Thread Yang Wang
Please note that when the job is canceled, the HA data (including the checkpoint pointers) stored in the ConfigMap/ZNode will be deleted. But it is strange that the "-s/--fromSavepoint" does not take effect when redeploying the Flink application. The JobManager logs could help a lot to find the

Re: Recover from savepoints with Kubernetes HA

2021-07-22 Thread Austin Cawley-Edwards
Hey Thomas, Hmm, I see no reason why you should not be able to update the checkpoint interval at runtime, and don't believe that information is stored in a savepoint. Can you share the JobManager logs of the job where this is ignored? Thanks, Austin On Wed, Jul 21, 2021 at 11:47 AM Thms Hmm

Re: Recover from savepoints with Kubernetes HA

2021-07-21 Thread Austin Cawley-Edwards
Hi Thomas, I've got a few questions that will hopefully help us find an answer: What job properties are you trying to change? Something like parallelism? What mode is your job running in? i.e., Session, Per-Job, or Application? Can you also describe how you're redeploying the job? Are you

Recover from savepoints with Kubernetes HA

2021-07-21 Thread Thms Hmm
Hey, we have some application clusters running on Kubernetes and are exploring the HA mode, which is working as expected. When we try to upgrade a job, e.g. trigger a savepoint, cancel the job and redeploy, Flink is not restarting from the savepoint we provide using the -s parameter. So all state is