Re: Checkpoint dir is not cleaned up after cancel the job with monitoring API

Chesnay Schepler Fri, 02 Oct 2020 02:42:46 -0700

Yes, the patch call only triggers the cancellation.

You can check whether it is complete by polling the job status via jobs/<jobid> and checking whether the state is CANCELED.
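
As a rough illustration of the polling described above, here is a minimal Python sketch. It assumes the REST endpoint is reachable at a base URL such as http://localhost:8081 and that the jobs/<jobid> response carries a top-level "state" field. The helper names (fetch_job_state, wait_until_canceled) and the timeout values are made up for this example and are not part of Flink. It also does not handle the case where the REST endpoint disappears once the job has terminated, which may happen when the cluster shuts down together with the job.

```python
import json
import time
import urllib.request


def fetch_job_state(base_url, job_id):
    """Return the "state" field of GET <base_url>/jobs/<job_id>."""
    with urllib.request.urlopen(f"{base_url}/jobs/{job_id}") as resp:
        return json.load(resp)["state"]


def wait_until_canceled(base_url, job_id, timeout_s=30.0, poll_s=0.5,
                        fetch=fetch_job_state, sleep=time.sleep):
    """Poll the job status until it is CANCELED.

    Returns True once the state is CANCELED. Returns False if the job
    ended in another terminal state or the timeout expired first.
    `fetch` and `sleep` are injectable so the loop can be tested
    without a running cluster.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch(base_url, job_id)
        if state == "CANCELED":
            return True
        if state in ("FINISHED", "FAILED"):
            # The job reached a different terminal state.
            return False
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

A caller would issue the PATCH request first, then call something like wait_until_canceled("http://localhost:8081", job_id) and only stop the task manager and job manager pods once it returns True.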


On 9/27/2020 7:02 PM, Eleanore Jin wrote:

I have noticed this: if I add Thread.sleep(1500); after the patch call returns 202, then the directory gets cleaned up. In the meantime, the job-manager pod shows as completed before getting terminated; see screenshot: https://ibb.co/3F8HsvG

So the patch call terminates the job asynchronously? Is there a way to check whether the cancel has completed, so that the TM and JM can be stopped afterwards?


Thanks a lot!
Eleanore

On Sun, Sep 27, 2020 at 9:37 AM Eleanore Jin <eleanore....@gmail.com<mailto:eleanore....@gmail.com>> wrote:


    Hi Congxian,
    I am making a REST call to get the checkpoint config:
    curl -X GET \
      http://localhost:8081/jobs/d2c91a44f23efa2b6a0a89b9f1ca5a3d/checkpoints/config


    and here is the response:
    {
        "mode": "at_least_once",
        "interval": 3000,
        "timeout": 10000,
        "min_pause": 1000,
        "max_concurrent": 1,
        "externalization": {
            "enabled": false,
            "delete_on_cancellation": true
        },
        "state_backend": "FsStateBackend"
    }

    I uploaded a screenshot of what Azure Blob Storage looks like after
    the cancel call: https://ibb.co/vY64pMZ

    Thanks a lot!
    Eleanore

    On Sat, Sep 26, 2020 at 11:23 PM Congxian Qiu
    <qcx978132...@gmail.com <mailto:qcx978132...@gmail.com>> wrote:

        Hi Eleanore,
            Which `CheckpointRetentionPolicy`[1] did you set for
        your job? If
        `ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION` is set,
        then the checkpoint will be kept when canceling a job.

        PS: the image did not show.

        [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#retained-checkpoints

        Best,
        Congxian


        Eleanore Jin <eleanore....@gmail.com
        <mailto:eleanore....@gmail.com>> wrote on Sun, Sep 27, 2020
        at 1:50 PM:

            Hi experts,

            I am running Flink 1.10.2 on Kubernetes as a per-job
            cluster. Checkpointing is enabled, with interval 3s,
            minimumPause 1s, timeout 10s. I'm using FsStateBackend;
            snapshots are persisted to Azure Blob Storage (Microsoft
            cloud storage service).

            The checkpointed state is just the source Kafka topic
            offsets; the Flink job is otherwise stateless, as it only
            does filtering and JSON transformation.

            The way I am trying to stop the Flink job is via the
            monitoring REST API mentioned in the doc
            <https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jobs-jobid-1>

            e.g.
            curl -X PATCH \
              'http://localhost:8081/jobs/3c00535c182a3a00258e2f57bc11fb1a?mode=cancel' \
              -H 'Content-Type: application/json' \
              -d '{}'

            This call returned successfully with statusCode 202, then
            I stopped the task manager pods and job manager pod.

            According to the doc, the checkpoint should be cleaned up
            after the job is stopped/cancelled.
            What I have observed is that the checkpoint dir is not
            cleaned up. Can you please shed some light on what I did wrong?

            Below shows the checkpoint dir for a cancelled flink job.
            image.png

            Thanks!
            Eleanore
