Hi Vishal, you can also keep the same cluster id when cancelling a job with a savepoint and then resuming a new job from it. Terminating the job should clean up all state in ZooKeeper.
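To make that cancel-with-savepoint / resume cycle concrete, here is a sketch (not from the thread itself): take the savepoint with `flink cancel -s <targetDir> <jobId>`, then relaunch the job cluster with the same cluster id, pointing its container at the savepoint. The image name, job class, and savepoint path below are placeholders, and the `--fromSavepoint` / `--allowNonRestoredState` arguments assume a Flink version whose standalone job cluster entrypoint supports them:

```yaml
# Hypothetical container-spec fragment for the relaunched job cluster
# (same cluster id and HA config as before; all names/paths are placeholders).
containers:
  - name: flink-job-cluster
    image: my-registry/my-flink-job:2.0          # image with the changed job
    args:
      - "job-cluster"
      - "--job-classname"
      - "com.example.MyJob"                      # hypothetical job class
      - "--fromSavepoint"
      - "s3://my-bucket/savepoints/savepoint-abc123"
      - "--allowNonRestoredState"                # tolerate state the new job no longer uses
```

Because the cluster id (and hence the ZooKeeper chroot) is unchanged, the relaunched cluster reuses the same HA namespace; terminating the previous job first should have cleaned up its ZooKeeper state as noted above.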
Cheers,
Till

On Fri, Feb 8, 2019 at 11:26 PM Vishal Santoshi <vishal.santo...@gmail.com> wrote:

> In one case, however, we do want to retain the same cluster id (think
> ingress on k8s and thus SLAs with external touch points), but it is
> essentially a new job (it contains an incompatible change but retains the
> same contract at the interface level). The only way seems to be to remove
> the chroot/subcontext from ZK and relaunch, essentially deleting any
> vestiges of the previous incarnation. And that is fine if that is indeed
> the process.
>
> On Fri, Feb 8, 2019 at 7:58 AM Till Rohrmann <trohrm...@apache.org> wrote:
>
>> If you keep the same cluster id, the upgraded job should pick up
>> checkpoints from the completed checkpoint store. However, I would
>> recommend taking a savepoint and resuming from it, because then you can
>> also specify that you allow non-restored state, for example.
>>
>> Cheers,
>> Till
>>
>> On Fri, Feb 8, 2019 at 11:20 AM Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>>
>>> Is the rationale for using the job id 000000* roughly the same? As in,
>>> a Flink job cluster runs a single job, and thus a single job id
>>> suffices? I am wondering more about the case where we make a compatible
>>> change to a job and want to resume (given we are in HA mode and thus
>>> have a chroot/subcontext in ZK for the job cluster); it would make no
>>> sense to assign a brand new job id?
>>>
>>> On Thu, Feb 7, 2019 at 4:42 AM Till Rohrmann <trohrm...@apache.org> wrote:
>>>
>>>> Hi Sergey,
>>>>
>>>> the rationale for using a K8s Job instead of a Deployment is that a
>>>> Flink job cluster should terminate after it has successfully executed
>>>> the Flink job. This is unlike a session cluster, which should run
>>>> forever and for which a K8s Deployment is better suited.
>>>>
>>>> If a K8s Deployment works better for your use case, then I would
>>>> suggest changing the `job-cluster-job.yaml` accordingly.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Tue, Feb 5, 2019 at 4:12 PM Sergey Belikov <belikov.ser...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> my team is currently experimenting with Flink running in Kubernetes
>>>>> (job cluster setup). We found out that with the JobManager deployed
>>>>> as a "Job", we can't simply update certain values in the job's yaml,
>>>>> e.g. spec.template.spec.containers.image
>>>>> (https://github.com/kubernetes/kubernetes/issues/48388#issuecomment-319493817).
>>>>> This causes trouble in our CI/CD pipelines, so we are thinking about
>>>>> using a "Deployment" instead of a "Job".
>>>>>
>>>>> With that being said, I'm wondering what the motivation was behind
>>>>> using the "Job" resource for deploying the JobManager? And are there
>>>>> any pitfalls related to using a Deployment instead of a Job for the
>>>>> JobManager?
>>>>>
>>>>> Thank you in advance.
>>>>> --
>>>>> Best regards,
>>>>> Sergey Belikov
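For reference, a minimal sketch of how `job-cluster-job.yaml` might look after the conversion Till suggests, using a `Deployment` instead of a `Job`. This is a hypothetical fragment, not the file shipped with Flink; the resource names, labels, image tag, and job class are placeholders:

```yaml
# Hypothetical Deployment variant of job-cluster-job.yaml (placeholders throughout).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-job-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink-job-cluster
  template:
    metadata:
      labels:
        app: flink-job-cluster
    spec:
      restartPolicy: Always            # Deployments require Always; a Job would use OnFailure/Never
      containers:
        - name: flink-job-cluster
          image: my-registry/my-flink-job:1.0   # updating this field triggers a rolling update
          args: ["job-cluster", "--job-classname", "com.example.MyJob"]
```

One caveat, following the rationale in the thread: a Deployment restarts the JobManager pod even after the Flink job finishes successfully, which is exactly the terminate-on-completion behavior the `Job` resource was chosen to provide, so the easier image updates come at that cost.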