Re: Flink Job cluster in HA mode - recovery vs upgrade

Chesnay Schepler Sun, 23 Aug 2020 07:25:43 -0700

If HA is enabled, the cluster will continue from the latest externalized checkpoint.

Without HA it will still start from the savepoint.
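
The rule above can be written down as a small decision function. This is only a toy Python model of the behaviour described in this reply, not Flink code: `choose_restore_point` and its parameters are hypothetical names introduced for illustration, and the fallback to the savepoint when no checkpoint exists yet (the very first start) is an assumption.

```python
from typing import Optional


def choose_restore_point(ha_enabled: bool,
                         savepoint_arg: Optional[str],
                         latest_checkpoint: Optional[str]) -> Optional[str]:
    """Toy model of which snapshot a restarted job cluster resumes from.

    savepoint_arg models the "-s <path>" value in the job spec (hypothetical name).
    latest_checkpoint models the newest externalized checkpoint known to the
    HA services, or None if there is none yet (hypothetical name).
    """
    # With HA enabled, a checkpoint recorded by the HA services wins over "-s".
    if ha_enabled and latest_checkpoint is not None:
        return latest_checkpoint
    # Without HA (or, by assumption, on the first start) only the job spec is known.
    return savepoint_arg


# The scenario from the question below: the pod restarts with the original "-s sp1".
with_ha = choose_restore_point(True, "sp1", "chk-1234")
without_ha = choose_restore_point(False, "sp1", "chk-1234")
```

In this model `with_ha` is the checkpoint and `without_ha` is the savepoint, which matches the two sentences above.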

On 23/08/2020 16:18, Alexey Trenikhun wrote:

Let’s say the job cluster was submitted as a job from savepoint sp1, so the spec includes “-s sp1”. The job runs for days, taking externalized checkpoints every 5 minutes, then suddenly the pod fails. The Kubernetes job controller restarts the job pod using the original job spec, which has “-s sp1”, so the Flink job will start from sp1 rather than from the latest externalized checkpoint. Is my understanding correct?
------------------------------------------------------------------------
*From:* Chesnay Schepler <ches...@apache.org>
*Sent:* Sunday, August 23, 2020 1:46:45 AM
*To:* Alexey Trenikhun <yen...@msn.com>; Piotr Nowojski <pnowoj...@apache.org>
*Cc:* Flink User Mail List <user@flink.apache.org>
*Subject:* Re: Flink Job cluster in HA mode - recovery vs upgrade

A job cluster is submitted as a job, not a deployment.
The built-in Job controller of Kubernetes ensures that this job finishes successfully, and if required starts new pods.
On 23/08/2020 06:43, Alexey Trenikhun wrote:
Since it is necessary to use cancel with savepoint / resume from savepoint, it is not possible to use a Deployment (otherwise the JobManager pod will restart on crash from the same savepoint), so we need to use a Job. But in that case, if the Job pod crashes, who will start a new instance of the Job pod? It sounds like HA with Kubernetes is currently not achievable unless some controller is used to manage the JobManager. Am I right?
------------------------------------------------------------------------
*From:* Chesnay Schepler <ches...@apache.org>
*Sent:* Saturday, August 22, 2020 12:58 AM
*To:* Alexey Trenikhun <yen...@msn.com>; Piotr Nowojski <pnowoj...@apache.org>
*Cc:* Flink User Mail List <user@flink.apache.org>
*Subject:* Re: Flink Job cluster in HA mode - recovery vs upgrade
If, and only if, the cluster-id and JobId are identical then the JobGraph will be recovered from ZooKeeper.
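
One way to picture this condition is a store keyed by both identifiers. The sketch below is a toy Python model, not Flink's JobGraphStore: `ToyJobGraphStore`, `graph_on_startup` and the string "graphs" are hypothetical, and building the graph from the jar when nothing is recovered is an assumption based on the rest of this thread.

```python
from typing import Dict, Optional, Tuple


class ToyJobGraphStore:
    """Toy stand-in for the JobGraph state kept in ZooKeeper (not Flink code)."""

    def __init__(self) -> None:
        # Key: (cluster_id, job_id). Value: an opaque "graph" placeholder.
        self._graphs: Dict[Tuple[str, str], str] = {}

    def put(self, cluster_id: str, job_id: str, graph: str) -> None:
        self._graphs[(cluster_id, job_id)] = graph

    def recover(self, cluster_id: str, job_id: str) -> Optional[str]:
        # A stored graph is found only when BOTH identifiers match.
        return self._graphs.get((cluster_id, job_id))


def graph_on_startup(store: ToyJobGraphStore, cluster_id: str,
                     job_id: str, graph_from_jar: str) -> str:
    """Return the recovered graph if one matches, else the one from the jar."""
    stored = store.recover(cluster_id, job_id)
    return stored if stored is not None else graph_from_jar


store = ToyJobGraphStore()
store.put("cluster-a", "job-1", "graph-v1")

same_ids = graph_on_startup(store, "cluster-a", "job-1", "graph-v2")
other_cluster = graph_on_startup(store, "cluster-b", "job-1", "graph-v2")
```

Here `same_ids` is the previously stored graph, while `other_cluster` falls back to the graph from the new jar; the old entry stays in the store, which corresponds to the stale ZooKeeper data mentioned further down the thread.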
On 22/08/2020 06:12, Alexey Trenikhun wrote:
Not sure that I understand your statement that "the HaServices are only being given the JobGraph". It seems HighAvailabilityServices#getJobGraphStore provides the JobGraphStore, and potentially an implementation of JobGraphStore#recoverJobGraph(JobID jobId) for this store could build a new graph from the jar rather than read the stored graph from ZooKeeper?
Also, if there is a single job with the same job-id (job cluster), will the jobgraph of the failed job be overwritten by the new one, which will have the same job-id?
------------------------------------------------------------------------
*From:* Chesnay Schepler <ches...@apache.org>
*Sent:* Friday, August 21, 2020 12:16 PM
*To:* Alexey Trenikhun <yen...@msn.com>; Piotr Nowojski <pnowoj...@apache.org>
*Cc:* Flink User Mail List <user@flink.apache.org>
*Subject:* Re: Flink Job cluster in HA mode - recovery vs upgrade
The HaServices are only being given the JobGraph, so this is not possible.
Actually, I have to correct myself. For a job cluster the state in HA should be irrelevant when you're submitting another jar. Flink has no way of knowing that this jar is in any way connected to the previous job; they will be treated as separate things.
However, you will likely end up with stale data in ZooKeeper (the jobgraph of the failed job).
On 21/08/2020 17:51, Alexey Trenikhun wrote:
Is it feasible to override ZooKeeperHaServices to recreate the JobGraph from the jar instead of reading it from ZK state? Any hints? I have a feeling that reading the JobGraph from the jar is a more resilient approach, with less chance of mistakes during an upgrade.
Thanks,
Alexey

------------------------------------------------------------------------
*From:* Piotr Nowojski <pnowoj...@apache.org>
*Sent:* Thursday, August 20, 2020 7:04 AM
*To:* Chesnay Schepler <ches...@apache.org>
*Cc:* Alexey Trenikhun <yen...@msn.com>; Flink User Mail List <user@flink.apache.org>
*Subject:* Re: Flink Job cluster in HA mode - recovery vs upgrade
Thank you for the clarification Chesnay, and sorry for the incorrect previous answer.
Piotrek
On Thu, 20 Aug 2020 at 15:59, Chesnay Schepler <ches...@apache.org> wrote:
    This is incorrect; we do store the JobGraph in ZooKeeper. If
    you just delete the deployment the cluster will recover the
    previous JobGraph (assuming you aren't changing the Zookeeper
    configuration).

    If you wish to update the job, then you should cancel it (along
    with creating a savepoint), which will clear the Zookeeper
    state, and then create a new deployment.

    On 20/08/2020 15:43, Piotr Nowojski wrote:
    Hi Alexey,

    I might be wrong (I don't know this side of Flink very well),
    but as far as I know JobGraph is never stored in the ZK. It's
    always recreated from the job's JAR. So you should be able to
    upgrade the job by replacing the JAR with a newer version, as
    long as the operator UIDs are the same before and after the
    upgrade (for operator state to match before and after the
    upgrade).

    Best, Piotrek

    On Thu, 20 Aug 2020 at 06:34, Alexey Trenikhun <yen...@msn.com>
    wrote:

        Hello,

        Let's say I run a Flink Job cluster with persistent storage
        and ZooKeeper HA on k8s with a single JobManager and use
        externalized checkpoints. When the JM crashes, k8s will
        restart the JM pod, and the JM will read the JobId and
        JobGraph from ZK and restore from the latest checkpoint. Now
        let's say I want to upgrade the job binary: I delete the
        deployments and create new deployments referring to a newer
        image. Will the JM still read the JobGraph from ZK, or will
        it create a new one from the new job jar?

        Thanks,
        Alexey
