Jordan, as Camille suggested, please read Sec 2.4 in the Chubby paper:
<http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf>
It suggests 2 ways in which the storage can support lock generations and
proposes an alternative for the case where the storage can't be made aware
of lock generations.

On Wed, Jul 15, 2015 at 1:08 PM, Jordan Zimmerman <[email protected]> wrote:

> Ivan, I just read the blog and I still don’t see how this can happen.
> Sorry if I’m being dense. I’d appreciate a discussion on this. In your blog
> you state: “when ZooKeeper tells you that you are leader, there’s no
> guarantee that there isn’t another node that 'thinks' its the leader.”
> However, given a long enough session time (I usually recommend 30–60
> seconds), I don’t see how this can happen. The client itself determines that
> there is a network partition when there is no heartbeat success. The
> heartbeat is a fraction of the session timeout. Once the heartbeat fails,
> the client must assume it no longer has the lock. Another client cannot
> take over the lock until, at minimum, session timeout. So, how then can
> there be two leaders?
>
> -Jordan
>
> On July 15, 2015 at 2:23:12 PM, Ivan Kelly ([email protected]) wrote:
>
> I blogged about this exact problem a couple of weeks ago [1]. I give an
> example of how split brain can happen in a resource under a zk lock (Hbase
> in this case). As Camille says, sequence numbers ftw. I'll add that the
> data store has to support them though, which not all do (in fact I've yet
> to see one in the wild that does). I've implemented a prototype that works
> with hbase[2] if you want to see what it looks like.
>
> -Ivan
>
> [1] https://medium.com/@ivankelly/reliable-table-writer-locks-for-hbase-731024295215
> [2] https://github.com/ivankelly/hbase-exclusive-writer
>
> On Wed, Jul 15, 2015 at 9:16 PM Vikas Mehta <[email protected]> wrote:
>
> > Jordan, I mean the client gives up the lock and stops working on the
> > shared resource. So when zookeeper is unavailable, no one is working on
> > any shared resource (because they cannot distinguish network partition
> > from zookeeper DEAD scenario).
> >
> > --
> > View this message in context:
> > http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581293.html
> > Sent from the zookeeper-user mailing list archive at Nabble.com.
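
To make the "lock generation" (sequence number) idea discussed in this thread concrete, here is a minimal sketch of a store that is aware of lock generations. It is an illustration only, not the Chubby or HBase implementation: `FencedStore`, `StaleGenerationError`, and `demo` are hypothetical names, and the generation is assumed to be a number that grows each time the lock changes hands (how it is obtained from ZooKeeper is left out). The store remembers the highest generation it has seen and rejects any write carrying a lower one, so a former leader that still "thinks" it holds the lock cannot overwrite the new leader's data.

```python
class StaleGenerationError(Exception):
    """Raised when a write carries an older lock generation than one already seen."""


class FencedStore:
    """Toy in-memory store (hypothetical) that fences out superseded lock holders."""

    def __init__(self):
        self._data = {}
        # Highest lock generation observed so far; -1 means "none yet".
        self._highest_generation = -1

    def write(self, key, value, generation):
        # Reject writes from a holder whose lock generation has been superseded.
        if generation < self._highest_generation:
            raise StaleGenerationError(
                f"generation {generation} < {self._highest_generation}"
            )
        self._highest_generation = generation
        self._data[key] = value

    def read(self, key):
        return self._data.get(key)


def demo():
    store = FencedStore()
    # Leader A holds the lock at generation 1 and writes.
    store.write("row", "from-leader-A", generation=1)
    # A's session expires; leader B acquires the lock at generation 2 and writes.
    store.write("row", "from-leader-B", generation=2)
    # A wakes up (e.g. after a long pause) and tries a late write with its old generation.
    try:
        store.write("row", "late-write-from-A", generation=1)
        rejected = False
    except StaleGenerationError:
        rejected = True
    return store.read("row"), rejected
```

In this sketch the safety check lives in the store rather than in the client, so it does not depend on the old leader noticing in time that its session is gone. That is also why it only helps when the data store can be made aware of generations.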
