Hi Henry,

I have added my comments inline (in blue) in your original email below.

Thanks, vino.

徐涛 <happydexu...@gmail.com> wrote on Tue, Sep 25, 2018 at 12:56 PM:

> Hi Vino,
> *What is the definition of, and the difference between, a job cancel and
> a job failure?*
> Can I say that if the program is shut down manually, it is a job cancel,
> and if the program is shut down due to some error, it is a job failure?
>
>
This is not entirely true: manually triggering a cancel may also lead to a
failure. Think of it this way: if a person triggers the cancel and every
task instance is canceled correctly, the job's final status is canceled.
If the job ends because of some anomaly, its final status is failed.


> This is important because it is the prerequisite for the following
> question:
>
> In the document of Flink 1.6, it says:
> * "ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION: Retain the
> checkpoint when the job is cancelled. Note that you have to manually clean
> up the checkpoint state after cancellation in this case.   *
> *        ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION: Delete the
> checkpoint when the job is cancelled. The checkpoint state will only be
> available if the job fails."*
> But it does not say whether the checkpoint will be retained on failure.
> If checkpoints are handled on failure the same way as on cancellation,
> then I have to use RETAIN_ON_CANCELLATION, because otherwise the
> checkpoint will be deleted when the job fails.
> If checkpoints are not deleted on failure, then DELETE_ON_CANCELLATION is
> safe when the job fails.
>

The configuration has two enumeration classes, `CheckpointRetentionPolicy`
and `ExternalizedCheckpointCleanup`, so you need to decide which one you
want to use. Your main concern is `ExternalizedCheckpointCleanup`, which
controls the cleanup of the metadata for externalized checkpoints. Are you
sure you want to use it? By default, Flink manages checkpoint cleanup
itself, and those checkpoints are not externalized.
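
As a rough illustration of the documentation you quoted, here is a minimal
Java sketch. It is not Flink code: the class `RetentionSketch`, its enums,
and `isRetained` are hypothetical names, and the rule it encodes is only my
reading of the two documented `ExternalizedCheckpointCleanup` options (both
keep the checkpoint when the job fails; they differ only on cancellation).

```java
/** Simplified model of the documented cleanup options; not Flink's implementation. */
final class RetentionSketch {

    /** Mirrors the names of the two documented ExternalizedCheckpointCleanup options. */
    enum Cleanup { RETAIN_ON_CANCELLATION, DELETE_ON_CANCELLATION }

    /** The two terminal job states discussed in this thread. */
    enum Terminal { CANCELED, FAILED }

    /** Returns true if the externalized checkpoint is kept after the job ends. */
    static boolean isRetained(Cleanup cleanup, Terminal state) {
        if (state == Terminal.FAILED) {
            // Assumption from the quoted docs: both options keep the checkpoint on failure.
            return true;
        }
        // On cancellation, only RETAIN_ON_CANCELLATION keeps the checkpoint.
        return cleanup == Cleanup.RETAIN_ON_CANCELLATION;
    }

    public static void main(String[] args) {
        // Print the retention decision for every option/state combination.
        for (Cleanup c : Cleanup.values()) {
            for (Terminal t : Terminal.values()) {
                System.out.println(c + " + " + t + " -> retained=" + isRetained(c, t));
            }
        }
    }
}
```

Under this reading, DELETE_ON_CANCELLATION is safe when the job fails, and
RETAIN_ON_CANCELLATION only matters if you want the checkpoint to survive a
manual cancel.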


> Best
> Henry
>
>
> On Sep 25, 2018, at 11:16 AM, vino yang <yanghua1...@gmail.com> wrote:
>
> Hi Henry,
>
> Answers to your questions:
>
> What is the definition of, and the difference between, a job cancel and a
> job failure?
>
> > Both cancellation and failure cause the job to enter a terminal state.
> But cancellation is triggered manually and the job terminates normally,
> while failure is usually a passive termination caused by an exception.
>
> If I use the DELETE_ON_CANCELLATION option, in this case, do I have a
> checkpoint to resume the program from?
>
> > No. If you use externalized checkpoints, you cannot resume from them
> after the job has been cancelled.
>
> I mean, suppose I can guarantee that a savepoint is always taken before a
> manual cancellation. If I use the DELETE_ON_CANCELLATION option for
> checkpoints, is there any chance that I will not have a checkpoint to
> recover from?
>
> > According to the latest source code, savepoints are not affected by
> CheckpointRetentionPolicy; they need to be cleaned up manually.
>
> Thanks, vino.
>
> 徐涛 <happydexu...@gmail.com> wrote on Tue, Sep 25, 2018 at 11:06 AM:
>
>> Hi All,
>> I mean, suppose I can guarantee that a savepoint is always taken before a
>> manual cancellation. If I use the DELETE_ON_CANCELLATION option for
>> checkpoints, is there any chance that I will not have a checkpoint to
>> recover from?
>> Thanks a lot.
>>
>> Best
>> Henry
>>
>>
>>
>> On Sep 25, 2018, at 10:41 AM, 徐涛 <happydexu...@gmail.com> wrote:
>>
>> Hi All,
>> The Flink documentation says of DELETE_ON_CANCELLATION: “Delete the
>> checkpoint when the job is cancelled. The checkpoint state will only be
>> available if the job fails.”
>> What is the definition of, and the difference between, a job cancel and a
>> job failure? Suppose I run the program on YARN and, after a few days, the
>> YARN application fails for some reason.
>> If I use the DELETE_ON_CANCELLATION option, in this case, do I have a
>> checkpoint to resume the program from?
>>
>> If checkpoints are *only* deleted when I cancel the program, I can
>> always take a savepoint before cancellation. Then it seems that I
>> *only* ever need to set DELETE_ON_CANCELLATION.
>> I cannot find a case where RETAIN_ON_CANCELLATION should be used.
>>
>> Best
>> Henry
>>
>>
>>
>
