I was on vacation but wanted to thank Biao for summarizing the current
state! Thanks!

On Mon, Jul 15, 2019 at 2:00 AM Biao Liu <mmyy1...@gmail.com> wrote:

> Hi Aaron,
>
> From my understanding, you want shutting down a Task Manager without
> restart the job which has tasks running on this Task Manager?
>
> Based on current implementation, if there is a Task Manager is down, the
> tasks on it would be treated as failed. The behavior of task failure is
> defined via `FailoverStrategy` which is `RestartAllStrategy` by default.
> That's the reason why the whole job restarts when a Task Manager has gone.
> As Paul said, you could try "region restart failover strategy" when 1.9 is
> released. It might be helpful however it depends on your job topology.
>
> The deeper reason of this issue is the consistency semantics of Flink,
> AT_LEAST_ONCE or EXACTLY_ONCE. Flink must respect these semantics. So there
> is no much choice of `FailoverStrategy`.
> It might be improved in the future. There are some discussions in the
> mailing list that providing some weaker consistency semantics to improve
> the `FailoverStrategy`. We are pushing forward this improvement. I hope it
> can be included in 1.10.
>
> Regarding your question, I guess the answer is no for now. A more frequent
> checkpoint or a savepoint manually triggered might be helpful by a quicker
> recovery.
>
>
> Paul Lam <paullin3...@gmail.com> 于2019年7月12日周五 上午10:25写道:
>
>> Hi,
>>
>> Maybe region restart strategy can help. It restarts minimum required
>> tasks. Note that it’s recommended to use only after 1.9 release, see [1],
>> unless you’re running a stateless job.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-10712
>>
>> Best,
>> Paul Lam
>>
>> 在 2019年7月12日,03:38,Aaron Levin <aaronle...@stripe.com> 写道:
>>
>> Hello,
>>
>> Is there a way to gracefully terminate a Task Manager beyond just killing
>> it (this seems to be what `./taskmanager.sh stop` does)? Specifically I'm
>> interested in a way to replace a Task Manager that has currently-running
>> tasks. It would be great if it was possible to terminate a Task Manager
>> without restarting the job, though I'm not sure if this is possible.
>>
>> Context: at my work we regularly cycle our hosts for maintenance and
>> security. Each time we do this we stop the task manager running on the host
>> being cycled. This causes the entire job to restart, resulting in downtime
>> for the job. I'd love to decrease this downtime if at all possible.
>>
>> Thanks! Any insight is appreciated!
>>
>> Best,
>>
>> Aaron Levin
>>
>>
>>

Reply via email to