Re: Leader election and leader operation based on zookeeper

2019-10-01 Thread Jordan Zimmerman
Yes, I think this is a hole. As I've thought more about it, I think the method you described using the lock node in the transaction is actually the best. -JZ > On Sep 29, 2019, at 11:41 PM, Zili Chen wrote: > > Hi Jordan, > > Here is a possible edge case of coordination node way. > > When an

Re: Leader election and leader operation based on zookeeper

2019-09-29 Thread Zili Chen
Hi Jordan, Here is a possible edge case of the coordination node approach. - When an instance becomes leader it: - Gets the version of the coordination ZNode - Sets the data for that ZNode (the contents don't matter) using the retrieved version number - If the set succeeds you

Re: Leader election and leader operation based on zookeeper

2019-09-21 Thread Zili Chen
>the Curator recipes delete and recreate their paths However, as mentioned above, we do a one-shot election (it doesn't reuse the Curator recipe), so the latch path we check is always the path from the epoch in which the contender became leader. You can check out an implementation of the design here[1]. Eve

Re: Leader election and leader operation based on zookeeper

2019-09-21 Thread Jordan Zimmerman
Thinking more about this... I imagine this works if the current leader path is always used. I need to think about this some more. -JZ > On Sep 21, 2019, at 1:31 PM, Jordan Zimmerman > wrote: > > The issue is that the leader path doesn't stay constant. Every time there is > a network partiti

Re: Leader election and leader operation based on zookeeper

2019-09-21 Thread Jordan Zimmerman
The issue is that the leader path doesn't stay constant. Every time there is a network partition, etc. the Curator recipes delete and recreate their paths. So, I'm concerned that client code trying to keep track of the leader path would be error prone (it's one reason that they aren't public - i

Re: Leader election and leader operation based on zookeeper

2019-09-21 Thread Zili Chen
Hi Jordan, >I think using the leader path may not work Could you share a situation where this strategy does not work? In our design, leader contention is one-shot, and when performing a transaction we check that the latch path exists && that we are in state LEADING. Given the election algorithm works, stat

Re: Leader election and leader operation based on zookeeper

2019-09-21 Thread Jordan Zimmerman
Yeah, Ted - I think this is basically the same thing. We should all try to poke holes in this. -JZ > On Sep 21, 2019, at 11:54 AM, Ted Dunning wrote: > > > I would suggest that using an epoch number stored in ZK might be helpful. > Every operation that the master takes could be made conditio

Re: Leader election and leader operation based on zookeeper

2019-09-21 Thread Ted Dunning
What I suggested is almost exactly what Jordan suggested. I should have read the rest of the thread before posting. On Sat, Sep 21, 2019 at 9:54 AM Ted Dunning wrote: > > I would suggest that using an epoch number stored in ZK might be helpful. > Every operation that the master takes could be

Re: Leader election and leader operation based on zookeeper

2019-09-21 Thread Ted Dunning
I would suggest that using an epoch number stored in ZK might be helpful. Every operation that the master takes could be made conditional on the epoch number using a multi-transaction. Unfortunately, as you say, you have to have the update of the epoch be atomic with becoming leader. The natural

Re: Leader election and leader operation based on zookeeper

2019-09-21 Thread Jordan Zimmerman
Here's a first pass Curator recipe to handle the transaction part (note you add a check(path, version) on the coordination node, not a setData). https://gist.github.com/Randgalt/1a19dcd215e202936e5b92c121fc73de When you
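
For readers following along, here is a minimal sketch of the idea in this message: a leader operation is guarded by a check on the coordination node's version rather than by a setData. It is an in-memory model using only the Java standard library, not the linked gist and not the Curator or ZooKeeper API. The class and method names (CoordinationNodeSketch, setCoordinationData, leaderWrite) are hypothetical and exist only for illustration. Note that later replies in this thread raise an edge case with the coordination node approach, so treat this as an illustration of the mechanism, not a vetted design.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a ZooKeeper ensemble. NOT the Curator/ZooKeeper API. */
public class CoordinationNodeSketch {
    // Version of the (hypothetical) coordination ZNode; bumped on every successful set.
    private int coordinationVersion = 0;
    private final Map<String, String> data = new HashMap<>();

    /** Models reading the coordination node's current version. */
    public synchronized int getVersion() {
        return coordinationVersion;
    }

    /**
     * Models a versioned setData on the coordination node, done once when an
     * instance becomes leader. Returns the new version, or -1 if the expected
     * version is stale.
     */
    public synchronized int setCoordinationData(int expectedVersion) {
        if (expectedVersion != coordinationVersion) {
            return -1;
        }
        return ++coordinationVersion;
    }

    /**
     * Models a transaction of check(coordinationNode, leaderVersion) plus one write.
     * The check does not modify the coordination node; if it fails, nothing is written.
     */
    public synchronized boolean leaderWrite(int leaderVersion, String key, String value) {
        if (leaderVersion != coordinationVersion) {
            return false; // a newer leader has bumped the version
        }
        data.put(key, value);
        return true;
    }

    public synchronized String read(String key) {
        return data.get(key);
    }

    public static void main(String[] args) {
        CoordinationNodeSketch zk = new CoordinationNodeSketch();

        // contender-1 becomes leader and remembers the version its set produced
        int v1 = zk.setCoordinationData(zk.getVersion());
        System.out.println(zk.leaderWrite(v1, "/job", "from-1")); // true

        // contender-2 takes over, which bumps the version again
        int v2 = zk.setCoordinationData(zk.getVersion());

        // contender-1 has not noticed yet; its guarded write is rejected
        System.out.println(zk.leaderWrite(v1, "/job", "stale"));  // false
        System.out.println(zk.leaderWrite(v2, "/job", "from-2")); // true
        System.out.println(zk.read("/job"));                      // from-2
    }
}
```

The point of using a check in the transaction is that ordinary leader operations leave the coordination node's version untouched, so only a change of leadership invalidates the version a leader is holding.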

Re: Leader election and leader operation based on zookeeper

2019-09-21 Thread Jordan Zimmerman
I took a quick look at "Rethink High-Availability Stores" and I think using the leader path may not work. I think the best solution will be something akin to combining a leader election with a common ZNode versioning scheme. i.e. Create a single ZNode to be used for coordination Elect a leader i

Re: Leader election and leader operation based on zookeeper

2019-09-20 Thread Zili Chen
Hi Jordan, Thanks for pointing that out. However, I'm not clear about the locking strategy of Curator. Is it possible that getZookeeperClient().getZooKeeper() runs concurrently with a session expiration and re-instantiation of the ZK client (so that I get the wrong session id)? Furthermore, even if I get the session id, check it

Re: Leader election and leader operation based on zookeeper

2019-09-20 Thread Jordan Zimmerman
> It seems Curator does not expose session id You can always access the ZooKeeper handle directly to get the session ID: CuratorFramework curator = ... curator.getZookeeperClient().getZooKeeper() -JZ > On Sep 20, 2019, at 10:21 PM, Zili Chen wrote: > > >>I am assuming the "write operation" he

Re: Leader election and leader operation based on zookeeper

2019-09-20 Thread Zili Chen
>>I am assuming the "write operation" here is write to ZooKeeper Yes. >>Looks like contender-1 was not reusing same ZooKeeper client object, so this explains how the previous supposed to be fail operation succeeds? Yes. Our communication to ZK is based on Curator, which will re-instantiate a client

Re: Leader election and leader operation based on zookeeper

2019-09-20 Thread Michael Han
>> thus contender-1 commit a write operation even if it is no longer the leader I am assuming the "write operation" here is write to ZooKeeper (as opposed to write to an external storage system)? If so: >> contender-1 recovers from full gc, before it reacts to revoke leadership event, txn-1 retri

Leader election and leader operation based on zookeeper

2019-09-20 Thread Zili Chen
Hi ZooKeepers, There is an ongoing refactor[1] in the Flink community aimed at overcoming several inconsistent-state issues on ZK that we have encountered. I am here to share our design for leader election and leader operation. A leader operation is an operation that should be committed only if the co