Ok, thanks for the clarification.
On Thu, Dec 28, 2017 at 1:05 AM Ufuk Celebi wrote:
> On Thu, Dec 28, 2017 at 12:11 AM, Hao Sun wrote:
> > Thanks! Great to know I do not have to worry about duplicates inside Flink.
> >
> > One more question: why does this happen? Is it because the TM and JM both check
> leadershi
On Thu, Dec 28, 2017 at 12:11 AM, Hao Sun wrote:
> Thanks! Great to know I do not have to worry about duplicates inside Flink.
>
> One more question: why does this happen? Is it because the TM and JM both check leadership
> at different intervals?
Yes, it's not deterministic how this happens. There will also be cases
Thanks! Great to know I do not have to worry about duplicates inside Flink.
One more question: why does this happen? Is it because the TM and JM both check
leadership at different intervals?
> The TM noticed the loss of leadership before the JM did.
On Wed, Dec 27, 2017, 13:52 Ufuk Celebi wrote:
> On Wed, Dec 27, 2
On Wed, Dec 27, 2017 at 4:41 PM, Hao Sun wrote:
> Somehow the TM detected the JM's loss of leadership from ZK and disconnected itself?
> And a couple of seconds later, the JM failed to connect to ZK?
>
Yes, exactly as you describe. The TM noticed the loss of leadership before
the JM did.
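
To illustrate why the order can differ, here is a toy sketch. It is not Flink or ZooKeeper code. It assumes each process only tracks when it last heard from the server and gives up after its own timeout; the `Observer` class, its fields, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observer:
    """A process watching the server (hypothetical model, not a real API)."""
    name: str
    last_heard_ms: int  # when this process last heard from the server
    timeout_ms: int     # assumed per-process timeout

    def detects_loss_at(self) -> int:
        # Each process decides on its own clock, independently of the other.
        return self.last_heard_ms + self.timeout_ms


def detection_order(observers):
    """Return process names in the order they would notice the loss."""
    return [o.name for o in sorted(observers, key=lambda o: o.detects_loss_at())]


# The TM happened to hear from the server slightly earlier than the JM,
# so with equal timeouts it also gives up earlier.
tm = Observer("TM", last_heard_ms=1_000, timeout_ms=40_000)
jm = Observer("JM", last_heard_ms=4_000, timeout_ms=40_000)
order = detection_order([tm, jm])
```

Swapping the two `last_heard_ms` values flips the order, which is the sense in which it is not deterministic.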
> After all the cluster re
out what is the root
cause this time.
From JM.log
*2017-12-26 14:57:08,624* INFO org.apache.zookeeper.ClientCnxn - Client
session timed out, have not heard from server in 85001ms for sessionid
0x25ddcdec0ef77af, closing socket connection and attempting reconnect
2017-12-26 14:57:23,621 W