We use Akka's DeathWatch mechanism to detect dead components. A TaskManager failure shouldn’t prevent the job from recovering from its state (as long as there are enough task slots available).
I’m not sure I understand what you mean by a "source stream thread" crash. If it was an error while performing a checkpoint, so that the checkpoint didn’t complete, Flink will not be able to recover from that incomplete checkpoint; it can only restore from the latest successfully completed one. Could you share the logs showing your issue?

Thanks, Piotrek

> On Sep 29, 2017, at 7:30 AM, yunfan123 <yunfanfight...@foxmail.com> wrote:
>
> In my understanding, Flink just uses the task heartbeat to monitor whether
> the TaskManager is running.
> If the source stream (Time Trigger for XXX) thread crashes, it seems Flink
> can't recover from this state?