To expand on my point: if you want to keep making progress while ZK is down, acquiring the lock should also give the lock owner a sequence number that identifies the period of operation it is in. Say you get sequence number 1 and tag all of your requests with 1. If for any reason you lose the lock without knowing it and server #2 gets the lock, it should get sequence number 2. The resource should then reject all requests with a sequence number below 2, so any leftover requests tagged 1 that are still in flight get rejected by the resource. And there you have it: you can continue to make safe forward progress while the ZK side is in an uncertain state, so long as the original lock holder is available and the resource validates these sequence numbers. If both ZK and the original lock holder go down, you're still out of luck, presumably.
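
Here is a rough sketch of the resource-side check, in case it helps. This is only an illustration, not anything ZooKeeper ships: `FencedResource`, `StaleTokenError`, and `write` are names I made up, and I am assuming the sequence number handed out with the lock only ever increases from one holder to the next.

```python
class StaleTokenError(Exception):
    """Raised when a request carries a sequence number older than one already seen."""


class FencedResource:
    """Hypothetical resource that validates the sequence number on every request."""

    def __init__(self):
        self.highest_seen = 0  # largest sequence number accepted so far
        self.log = []          # accepted (sequence, payload) pairs

    def write(self, sequence, payload):
        # Reject anything tagged with a sequence below the newest one seen:
        # it must come from a holder that has since lost the lock.
        if sequence < self.highest_seen:
            raise StaleTokenError(
                f"sequence {sequence} is older than {self.highest_seen}"
            )
        self.highest_seen = sequence
        self.log.append((sequence, payload))


resource = FencedResource()
resource.write(1, "from original holder")  # holder with sequence 1
resource.write(2, "from new holder")       # server #2 took over with sequence 2

try:
    # A leftover request from the old holder arrives late.
    resource.write(1, "late request from original holder")
    rejected = False
except StaleTokenError:
    rejected = True
```

Note that the resource only starts rejecting requests tagged 1 after it has seen a request tagged 2, which is enough here: until the new holder touches the resource, the old holder's writes cannot conflict with anything.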
C

On Wed, Jul 15, 2015 at 2:24 PM, Camille Fournier <[email protected]> wrote:

> I thought that the client itself had a notion of the session timeout
> internally that would conservatively let the client know that it was dead?
> If not, then that's my faulty memory.
>
> That being said, if you really care about the client not sending messages
> when it does not have the lock, the resource under contention needs to
> validate the messages it is receiving, though. You cannot guarantee that
> just because a client believes it is connected and sends a message to
> locked resource that the message will be received while the sender still
> has the lock. If you don't care about this possibility then just assuming
> you lose the lock when you are in any state other than connected is
> adequate but just be aware that events such as long GC pauses and network
> issues can cause you to access the resource improperly.
>
> C
>
> On Wed, Jul 15, 2015 at 2:19 PM, Jordan Zimmerman <
> [email protected]> wrote:
>
>> Once client A loses connection it must assume that it no longer has the
>> lock (you could try to time the session but I think that’s a bad idea).
>> Once you reconnect, you will know if your session is still active or not.
>> When done correctly, there’s no chance that both A and B will think they
>> own the lock at the same time.
>>
>> -Jordan
>>
>>
>>
>> On July 15, 2015 at 1:17:10 PM, Vikas Mehta ([email protected]) wrote:
>>
>> Thanks for the quick response Camille. If client A owns the lock, gets
>> disconnected due to network partition, it will not see the SESSION_EXPIRED
>> event until it is too late, i.e. client B has acquired the lock and done
>> the damage. Problem here is that client cannot distinguish network partition
>> from zookeeper ensemble in leader election state.
>>
>> --
>> View this message in context:
>> http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581279.html
>> Sent from the zookeeper-user mailing list archive at Nabble.com.
>>
>
