If I understand fine-grained recovery correctly, one would still need to
take checkpoints.

Ashish would like to avoid checkpointing and is willing to lose the state
of the failed task.
However, he would like to avoid losing more state than necessary because
tasks that did not fail are restarted as well.
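
To make the difference concrete, here is a toy sketch in plain Java. It is
not Flink code: the Task class, the record counts, and both restart methods
are made up for illustration, and it ignores dependencies between tasks. It
only shows that, without checkpoints, a full restart drops the in-memory
state of every task, while restarting just the failed task drops only that
task's state.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are not part of Flink.
public class RestartScopeSketch {

    // A task holding transient in-memory state (here just a record count).
    static final class Task {
        final String name;
        long bufferedRecords;

        Task(String name, long bufferedRecords) {
            this.name = name;
            this.bufferedRecords = bufferedRecords;
        }

        // Restarting without a checkpoint discards the in-memory state
        // and returns how many buffered records were lost.
        long restart() {
            long lost = bufferedRecords;
            bufferedRecords = 0;
            return lost;
        }
    }

    // Full restart: every task is restarted, so every task loses its state.
    static long restartAll(List<Task> tasks) {
        long lost = 0;
        for (Task t : tasks) {
            lost += t.restart();
        }
        return lost;
    }

    // Restart limited to the failed task: only that task loses its state.
    static long restartOnlyFailed(List<Task> tasks, int failedIndex) {
        return tasks.get(failedIndex).restart();
    }

    // Made-up sample data: three parallel tasks with some buffered records.
    static List<Task> sampleTasks() {
        List<Task> tasks = new ArrayList<>();
        tasks.add(new Task("map-0", 100));
        tasks.add(new Task("map-1", 250));
        tasks.add(new Task("map-2", 50));
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println("full restart loses "
                + restartAll(sampleTasks()) + " records");
        System.out.println("failed-task restart loses "
                + restartOnlyFailed(sampleTasks(), 1) + " records");
    }
}
```

With the sample numbers, the full restart drops 400 buffered records and the
failed-task restart drops 250.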

Best, Fabian

2018-03-15 1:45 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>:

> Hi,
>
> Have you looked into fine-grained recovery?
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+:+Fine+Grained+Recovery+from+Task+Failures
>
> Stefan (cc'ed) might be able to give you some pointers about configuration.
>
> Best,
> Aljoscha
>
>
> On 6. Mar 2018, at 22:35, Ashish Pokharel <ashish...@yahoo.com> wrote:
>
> Hi Gordon,
>
> The real issue is that we are trying to avoid checkpointing: the datasets
> are very heavy, and in a few of our apps all state is transient (flushed
> within a few seconds). The high volume/velocity and the transient nature
> of the state make those apps good candidates for running without
> checkpoints.
>
> We do have offsets committed to Kafka AND we have “some” tolerance for gaps
> / duplicates. However, we do want to handle “graceful” restarts / shutdowns.
> For shutdowns, we have been taking savepoints (which works great), but for
> restarts, we just can’t find a way.
>
> Bottom line - we are trading off resiliency for resource utilization and
> performance but would like to harden apps for production deployments as
> much as we can.
>
> Hope that makes sense.
>
> Thanks, Ashish
>
> On Mar 6, 2018, at 10:19 PM, Tzu-Li Tai <tzuli...@gmail.com> wrote:
>
> Hi Ashish,
>
> Could you elaborate a bit more on why you think restarting all operators
> leads to data loss?
>
> When a restart occurs, Flink restarts the job from the latest completed
> checkpoint.
> All operator state is reloaded from the state written in that checkpoint,
> and the position of the input stream is rewound as well.
>
> I don't think there is a way to force a checkpoint before a restart occurs,
> but as I mentioned, that should not be required, because the last completed
> checkpoint will be used.
> Am I missing something in your particular setup?
>
> Cheers,
> Gordon
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
>
>
>
