Flink failure recovery tooks very long time

2018-09-06 Thread trung kien
Hi all, I am trying to test failure recovery of a Flink job when a JM or TM goes down. Our goal is for the job to restart automatically and return to normal in every case. However, what I am seeing is very strange, and I hope someone here can help me understand it. When the JM or TM went down, I see the jo

Re: Flink failure recovery tooks very long time

2018-09-06 Thread Yun Tang
t there is a switch to ... ci.apache.org From: trung kien Sent: Thursday, September 6, 2018 18:50 To: user@flink.apache.org Subject: Flink failure recovery tooks very long time Hi all, I am trying to test failure recovery of a Flink job when a JM or TM goes down. Our target is having job auto

Re: Flink failure recovery tooks very long time

2018-09-06 Thread vino yang
ci.apache.org From: trung kien Sent: Thursday, September 6, 2018 18:50 To: user@flink.apache.org Subject: Flink failure recovery tooks very long time Hi all, I am trying to test failure recovery of a Flink job when a

Re: Flink failure recovery tooks very long time

2018-09-06 Thread trung kien
st-once> Apache Flink offers a fault tolerance mechanism to consistently recover the state of data streaming applications. The mechanism ensures that even in the presence of failures, the program’s state will eventually reflect every record from the data stream exa

Re: Flink failure recovery tooks very long time

2018-09-06 Thread trung kien
rg/projects/flink/flink-docs-stable/internals/stream_checkpointing.html#exactly-once-vs-at-least-once> Apache Flink offers a fault tolerance mechanism to consistently recover the state of data streaming applications. The mechanism ensures that even in the pres

Re: Flink failure recovery tooks very long time

2018-09-06 Thread Yun Tang
Sent: Thursday, September 6, 2018 18:50 To: user@flink.apache.org Subject: Flink failure recovery tooks very long time Hi all, I am trying to test failure recovery of a Flink job when a JM or TM goes down. Our target is having job auto restart and back to normal con