2011/10/13 Frédéric Jolliton <frede...@jolliton.com>

> > ZK and the client will realize that the connection is interrupted very
> > quickly.  You will get a disconnection event at that time.  The ZK client
> > software will automatically try to reconnect.  When it succeeds, you will
> > be notified of the reconnection or of a session expiration.
> >
> > Note that you will be notified of the connection loss *before* ZK deletes
> > your ephemeral file (if the clock on the ZK server is stable).
> >
> > Any method you use will have the problem that the connection loss is not
> > detected immediately.
>
> Thanks for the reply.
>
> However, the way you're wording it seems to imply that it depends on
> timing assumptions (that the machine will be responsive enough). These
> assumptions can break if the machine is under heavy load, for example,
> and thus do not give strong guarantees. Is that right?
>

It is as you say.  But this limitation is inherent in distributed locks
(and it is a key problem with them).


> I can't figure out how that would work 100% of the time.
>

Well, ultimately you can't guarantee that.

The problem is that there are scenarios where a machine has to recognize
either split-brain or loss of lock.  If that recognition can be arbitrarily
slowed down, then you have a serious problem because you typically have a
hard real-time deadline for that recognition for correct performance.  If
you include arbitrary time-travel in the form of clock shifts, then you have
an even harder problem.
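
As an illustration of what that recognition looks like on the client side,
here is a minimal sketch in Python.  The state names mirror the Java client's
KeeperState values (SyncConnected, Disconnected, Expired); the LockHolder
class and its fields are hypothetical.  The conservative policy assumed here
is: stop acting as lock holder as soon as Disconnected arrives, resume only if
the same session reconnects, and give up the lock on expiration.

```python
from enum import Enum, auto


class KeeperState(Enum):
    # Names mirror the Java client's Watcher.Event.KeeperState values.
    SYNC_CONNECTED = auto()
    DISCONNECTED = auto()
    EXPIRED = auto()


class LockHolder:
    """Hypothetical lock holder reacting conservatively to session events."""

    def __init__(self) -> None:
        self.holds_lock = True  # assume the lock (an ephemeral node) was acquired earlier
        self.may_act = True     # whether protected work is allowed right now

    def on_session_event(self, state: KeeperState) -> None:
        if state is KeeperState.DISCONNECTED:
            # We cannot yet tell whether the session will survive, so stop
            # acting now, before the server could expire the session and
            # let another process take the lock.
            self.may_act = False
        elif state is KeeperState.SYNC_CONNECTED:
            # Reconnected within the same session: the ephemeral node still
            # exists, so resume only if we still hold the lock.
            self.may_act = self.holds_lock
        elif state is KeeperState.EXPIRED:
            # The session is gone and its ephemeral node is deleted.
            self.holds_lock = False
            self.may_act = False
```

This does not escape the problem above: if the process is stalled between
losing the connection and handling the Disconnected event, it can still act
on a lock it no longer holds.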

With ZooKeeper at least you can specify what the deadlines are.  You get to
specify the heartbeat interval and the session expiration time.  These
control how quickly connection loss is detected and how quickly a process
must relinquish control on loss of connection.  Just being able to specify
these is very refreshing, frankly speaking.
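
To make those deadlines concrete, here is a rough worked example in Python.
It assumes the Java client's behaviour (the connection is declared lost after
about two thirds of the negotiated session timeout without hearing from the
server, and an idle client pings at half that interval) and the server's
default bounds on the session timeout of 2 and 20 times tickTime.  The names
negotiate_timeout and SessionBudget are hypothetical.

```python
from dataclasses import dataclass


def negotiate_timeout(requested_ms: int, tick_time_ms: int) -> int:
    """Clamp a requested session timeout to the assumed default server bounds
    (minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime)."""
    return max(2 * tick_time_ms, min(requested_ms, 20 * tick_time_ms))


@dataclass(frozen=True)
class SessionBudget:
    session_timeout_ms: int  # negotiated session timeout, in milliseconds

    @property
    def detect_ms(self) -> int:
        # Assumed: the client declares the connection lost after two thirds
        # of the session timeout without hearing from the server.
        return self.session_timeout_ms * 2 // 3

    @property
    def ping_ms(self) -> int:
        # Assumed: an idle client pings after half the detection window.
        return self.detect_ms // 2

    @property
    def relinquish_ms(self) -> int:
        # Roughly the time between the client noticing the loss and the
        # server expiring the session, *if* both clocks advance at the same
        # rate and the client process is not stalled.
        return self.session_timeout_ms - self.detect_ms


budget = SessionBudget(negotiate_timeout(30_000, 2_000))
print(budget.ping_ms, budget.detect_ms, budget.relinquish_ms)  # 10000 20000 10000
```

With a 30 second session, the lock holder has about 10 seconds after noticing
the disconnection to stop acting; a pause longer than that breaks the
guarantee, which is exactly the timing assumption discussed above.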
