Thanks! Great to know I do not have to worry about duplicates inside Flink.

One more question: why does this happen? Is it because the TM and the JM check leadership at different intervals?

> The TM noticed the loss of leadership before the JM did.

On Wed, Dec 27, 2017, 13:52 Ufuk Celebi <u...@apache.org> wrote:

> On Wed, Dec 27, 2017 at 4:41 PM, Hao Sun <ha...@zendesk.com> wrote:
>
>> Somehow TM detected JM leadership loss from ZK and self disconnected?
>> And couple of seconds later, JM failed to connect to ZK?
>
> Yes, exactly as you describe. The TM noticed the loss of leadership before
> the JM did.
>
>> After all the cluster recovered nicely by its own, but I am wondering
>> does this break the exactly-once semantics? If yes, what should I take care?
>
> Great :-) It does not break exactly-once guarantees *within* the Flink
> pipeline as the state of the latest completed checkpoint will be restored
> after recovery. This rewinds your job and might result in duplicate or
> changed output if you don't use an exactly once or idempotent sink.
>
> – Ufuk
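
To make the point about sinks in the quoted reply concrete, here is a minimal Python sketch. It is a simulation only and does not use any Flink API. The names `run_with_recovery`, `AppendSink`, and `UpsertSink` are hypothetical, and the checkpoint and failure positions are made up for illustration. It assumes each record carries a stable key, which is what lets the second sink behave idempotently.

```python
def run_with_recovery(events, sink, checkpoint_at, fail_at):
    """Emit events, fail once at index fail_at, then replay from the checkpoint.

    Hypothetical model of a job rewinding to its last completed checkpoint.
    """
    # First attempt: records are written to the sink until the simulated failure.
    for i in range(fail_at):
        sink.write(*events[i])
    # Recovery: rewind to the last completed checkpoint and reprocess from there.
    for i in range(checkpoint_at, len(events)):
        sink.write(*events[i])


class AppendSink:
    """Non-idempotent sink: every write adds a new row."""

    def __init__(self):
        self.rows = []

    def write(self, key, value):
        self.rows.append((key, value))


class UpsertSink:
    """Idempotent sink: writing the same key again overwrites the old row."""

    def __init__(self):
        self.rows = {}

    def write(self, key, value):
        self.rows[key] = value


events = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
append_sink, upsert_sink = AppendSink(), UpsertSink()

# The checkpoint completed after the first record; the failure hit after the third.
run_with_recovery(events, append_sink, checkpoint_at=1, fail_at=3)
run_with_recovery(events, upsert_sink, checkpoint_at=1, fail_at=3)

print(len(append_sink.rows))  # 6: "b" and "c" were written twice
print(len(upsert_sink.rows))  # 4: the replayed writes overwrote the earlier ones
```

In this model the replay after recovery re-emits "b" and "c". The append-only sink ends up with duplicates, while the keyed upsert sink ends in the same state as a run without any failure.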